* [PATCH v3 00/41] Optimize KVM/ARM for VHE systems
@ 2018-01-12 12:07 ` Christoffer Dall
  0 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-01-12 12:07 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel
  Cc: Marc Zyngier, Andrew Jones, Christoffer Dall, Shih-Wei Li, kvm

This series redesigns parts of KVM/ARM to optimize the performance on
VHE systems.  The general approach is to try to do as little work as
possible when transitioning between the VM and the hypervisor.  This has
the benefit of lower latency when waiting for interrupts and delivering
virtual interrupts, and reduces the overhead of emulating behavior and
I/O in the host kernel.

Patches 01 through 06 are not VHE specific, but rework parts of KVM/ARM
that can be generally improved.  We then add infrastructure to move more
logic into vcpu_load and vcpu_put, and we improve the handling of VFP
and debug registers.

We then introduce a new world-switch function for VHE systems, which we
can tweak and optimize specifically for VHE.  To do that, we rework a
lot of the system register save/restore handling and the emulation code
that may need access to system registers, so that we can defer as many
system register save/restore operations as possible to vcpu_load and
vcpu_put, and move this logic out of the VHE world switch function.
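
As a rough illustration of the idea (a sketch only, not the code in
these patches; the save/restore helpers below are hypothetical
stand-ins), the point is to switch the EL1 system register state once
per ioctl instead of on every world switch:

  /*
   * Sketch: on VHE, defer EL1 sysreg switching to vcpu_load/vcpu_put.
   * save_host_el1_sysregs()/restore_guest_el1_sysregs() (and their
   * mirrors) are hypothetical helpers, not functions from this series.
   */
  void kvm_vcpu_load_sysregs(struct kvm_vcpu *vcpu)
  {
  	if (!has_vhe())
  		return;		/* non-VHE still switches in the world switch */

  	save_host_el1_sysregs(vcpu->arch.host_cpu_context);
  	restore_guest_el1_sysregs(&vcpu->arch.ctxt);
  }

  void kvm_vcpu_put_sysregs(struct kvm_vcpu *vcpu)
  {
  	if (!has_vhe())
  		return;

  	save_guest_el1_sysregs(&vcpu->arch.ctxt);
  	restore_host_el1_sysregs(vcpu->arch.host_cpu_context);
  }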

We then optimize the configuration of traps.  On non-VHE systems, both
the host and VM kernels run in EL1, but the host kernel should have full
access to the underlying hardware while the VM kernel should not.  We
therefore essentially make the host kernel more privileged than the VM
kernel, despite both running at the same privilege level, by enabling
traps to EL2 when entering the VM and disabling those traps when exiting
the VM.  On VHE systems, the host kernel runs in EL2, has full access to
the hardware (as much as allowed by secure side software), and is
unaffected by the trap configuration.  That means we can configure the
traps for VMs running in EL1 once, and don't have to switch them on and
off on every entry/exit to/from the VM.
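
Conceptually, the difference looks something like the sketch below
(illustrative only; activate_vm_traps(), deactivate_vm_traps() and
enter_guest() are hypothetical stand-ins for the real world-switch
helpers in these patches):

  /* Sketch: per-switch trap toggling on non-VHE vs. one-time setup on VHE */
  static void world_switch_sketch(struct kvm_vcpu *vcpu)
  {
  	if (!has_vhe())
  		activate_vm_traps(vcpu);	/* on every entry on non-VHE */

  	enter_guest(vcpu);

  	if (!has_vhe())
  		deactivate_vm_traps(vcpu);	/* ...and off on every exit */

  	/* On VHE, the EL1 trap configuration was set up once at vcpu_load. */
  }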

Finally, we improve our VGIC handling by moving all save/restore logic
out of the VHE world-switch.  When the AP list is empty, we now truly do
no VGIC work beyond checking that it is empty, and when virtual
interrupts are in flight, we only do the minimal amount of work required
for the VGIC processing.
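
The fast path then amounts to something like this (a sketch, not the
exact code; vgic_flush_sketch() and vgic_flush_lr_state_sketch() are
illustrative names):

  /* Sketch: skip all VGIC work when no virtual interrupts are in flight */
  static void vgic_flush_sketch(struct kvm_vcpu *vcpu)
  {
  	if (list_empty(&vcpu->arch.vgic_cpu.ap_list_head))
  		return;		/* empty AP list: touch no GIC state at all */

  	/* Otherwise do only the minimal LR programming and save/restore. */
  	vgic_flush_lr_state_sketch(vcpu);
  }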

The patches are based on v4.15-rc3, v9 of the level-triggered mapped
interrupts support series [1], and the first five patches of James' SDEI
series [2].

I've given the patches a fair amount of non-VHE testing on Thunder-X,
Mustang, Seattle, and TC2 (32-bit), and tested VHE functionality on the
Foundation model, running both 64-bit VMs and 32-bit VMs side-by-side
and using both GICv3-on-GICv3 and GICv2-on-GICv3.

The patches are also available in the vhe-optimize-v3 branch on my
kernel.org repository [3].  The vhe-optimize-v3-base branch contains
prerequisites of this series.

Changes since v2:
 - Rebased on v4.15-rc3.
 - Includes two additional patches that do vcpu_load only after
   kvm_vcpu_first_run_init and only for KVM_RUN.
 - Addressed review comments from v2 (detailed changelogs are in the
   individual patches).

Thanks,
-Christoffer

[1]: git://git.kernel.org/pub/scm/linux/kernel/git/cdall/linux.git level-mapped-v9
[2]: git://linux-arm.org/linux-jm.git sdei/v5/base
[3]: git://git.kernel.org/pub/scm/linux/kernel/git/cdall/linux.git vhe-optimize-v3

Christoffer Dall (40):
  KVM: arm/arm64: Avoid vcpu_load for other vcpu ioctls than KVM_RUN
  KVM: arm/arm64: Move vcpu_load call after kvm_vcpu_first_run_init
  KVM: arm64: Avoid storing the vcpu pointer on the stack
  KVM: arm64: Rework hyp_panic for VHE and non-VHE
  KVM: arm/arm64: Get rid of vcpu->arch.irq_lines
  KVM: arm/arm64: Add kvm_vcpu_load_sysregs and kvm_vcpu_put_sysregs
  KVM: arm/arm64: Introduce vcpu_el1_is_32bit
  KVM: arm64: Defer restoring host VFP state to vcpu_put
  KVM: arm64: Move debug dirty flag calculation out of world switch
  KVM: arm64: Slightly improve debug save/restore functions
  KVM: arm64: Improve debug register save/restore flow
  KVM: arm64: Factor out fault info population and gic workarounds
  KVM: arm64: Introduce VHE-specific kvm_vcpu_run
  KVM: arm64: Remove kern_hyp_va() use in VHE switch function
  KVM: arm64: Don't deactivate VM on VHE systems
  KVM: arm64: Remove noop calls to timer save/restore from VHE switch
  KVM: arm64: Move userspace system registers into separate function
  KVM: arm64: Rewrite sysreg alternatives to static keys
  KVM: arm64: Introduce separate VHE/non-VHE sysreg save/restore
    functions
  KVM: arm/arm64: Remove leftover comment from kvm_vcpu_run_vhe
  KVM: arm64: Unify non-VHE host/guest sysreg save and restore functions
  KVM: arm64: Don't save the host ELR_EL2 and SPSR_EL2 on VHE systems
  KVM: arm64: Change 32-bit handling of VM system registers
  KVM: arm64: Rewrite system register accessors to read/write functions
  KVM: arm64: Introduce framework for accessing deferred sysregs
  KVM: arm/arm64: Prepare to handle deferred save/restore of SPSR_EL1
  KVM: arm64: Prepare to handle deferred save/restore of ELR_EL1
  KVM: arm64: Defer saving/restoring 64-bit sysregs to vcpu load/put on
    VHE
  KVM: arm64: Prepare to handle deferred save/restore of 32-bit
    registers
  KVM: arm64: Defer saving/restoring 32-bit sysregs to vcpu load/put
  KVM: arm64: Move common VHE/non-VHE trap config in separate functions
  KVM: arm64: Configure FPSIMD traps on vcpu load/put
  KVM: arm64: Configure c15, PMU, and debug register traps on cpu
    load/put for VHE
  KVM: arm64: Separate activate_traps and deactive_traps for VHE and
    non-VHE
  KVM: arm/arm64: Get rid of vgic_elrsr
  KVM: arm/arm64: Handle VGICv2 save/restore from the main VGIC code
  KVM: arm/arm64: Move arm64-only vgic-v2-sr.c file to arm64
  KVM: arm/arm64: Handle VGICv3 save/restore from the main VGIC code on
    VHE
  KVM: arm/arm64: Move VGIC APR save/restore to vgic put/load
  KVM: arm/arm64: Avoid VGICv3 save/restore on VHE with no IRQs

Shih-Wei Li (1):
  KVM: arm64: Move HCR_INT_OVERRIDE to default HCR_EL2 guest flag

 arch/arm/include/asm/kvm_asm.h                    |   5 +-
 arch/arm/include/asm/kvm_emulate.h                |  21 +-
 arch/arm/include/asm/kvm_host.h                   |   6 +-
 arch/arm/include/asm/kvm_hyp.h                    |   4 +
 arch/arm/kvm/emulate.c                            |   4 +-
 arch/arm/kvm/hyp/Makefile                         |   1 -
 arch/arm/kvm/hyp/switch.c                         |  16 +-
 arch/arm64/include/asm/kvm_arm.h                  |   4 +-
 arch/arm64/include/asm/kvm_asm.h                  |  18 +-
 arch/arm64/include/asm/kvm_emulate.h              |  74 +++-
 arch/arm64/include/asm/kvm_host.h                 |  49 ++-
 arch/arm64/include/asm/kvm_hyp.h                  |  32 +-
 arch/arm64/include/asm/kvm_mmu.h                  |   2 +-
 arch/arm64/kernel/asm-offsets.c                   |   2 +
 arch/arm64/kvm/debug.c                            |  28 +-
 arch/arm64/kvm/guest.c                            |   3 -
 arch/arm64/kvm/hyp/Makefile                       |   2 +-
 arch/arm64/kvm/hyp/debug-sr.c                     |  88 +++--
 arch/arm64/kvm/hyp/entry.S                        |   9 +-
 arch/arm64/kvm/hyp/hyp-entry.S                    |  41 +--
 arch/arm64/kvm/hyp/switch.c                       | 404 +++++++++++++---------
 arch/arm64/kvm/hyp/sysreg-sr.c                    | 192 ++++++++--
 {virt/kvm/arm => arch/arm64/kvm}/hyp/vgic-v2-sr.c |  81 -----
 arch/arm64/kvm/inject_fault.c                     |  24 +-
 arch/arm64/kvm/regmap.c                           |  65 +++-
 arch/arm64/kvm/sys_regs.c                         | 247 +++++++++++--
 arch/arm64/kvm/sys_regs.h                         |   4 +-
 arch/arm64/kvm/sys_regs_generic_v8.c              |   4 +-
 include/kvm/arm_vgic.h                            |   2 -
 virt/kvm/arm/aarch32.c                            |   2 +-
 virt/kvm/arm/arch_timer.c                         |   7 -
 virt/kvm/arm/arm.c                                |  50 ++-
 virt/kvm/arm/hyp/timer-sr.c                       |  44 +--
 virt/kvm/arm/hyp/vgic-v3-sr.c                     | 244 +++++++------
 virt/kvm/arm/mmu.c                                |   6 +-
 virt/kvm/arm/pmu.c                                |  37 +-
 virt/kvm/arm/vgic/vgic-init.c                     |  11 -
 virt/kvm/arm/vgic/vgic-v2.c                       |  61 +++-
 virt/kvm/arm/vgic/vgic-v3.c                       |  12 +-
 virt/kvm/arm/vgic/vgic.c                          |  21 ++
 virt/kvm/arm/vgic/vgic.h                          |   3 +
 41 files changed, 1229 insertions(+), 701 deletions(-)
 rename {virt/kvm/arm => arch/arm64/kvm}/hyp/vgic-v2-sr.c (50%)

-- 
2.14.2

^ permalink raw reply	[flat|nested] 223+ messages in thread

* [PATCH v3 01/41] KVM: arm/arm64: Avoid vcpu_load for other vcpu ioctls than KVM_RUN
  2018-01-12 12:07 ` Christoffer Dall
@ 2018-01-12 12:07   ` Christoffer Dall
  -1 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-01-12 12:07 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel
  Cc: kvm, Marc Zyngier, Shih-Wei Li, Andrew Jones, Christoffer Dall

Calling vcpu_load() registers preempt notifiers for this vcpu and calls
kvm_arch_vcpu_load().  The latter will soon be doing a lot of heavy
lifting on arm/arm64 and will try to do things such as enabling the
virtual timer and setting us up to handle interrupts from the timer
hardware.

Loading state onto hardware registers and enabling hardware to signal
interrupts can be problematic when we're not actually about to run the
VCPU, because it makes it difficult to establish the right context when
handling interrupts from the timer, and it makes the register access
code difficult to reason about.

Luckily, now that we call vcpu_load in each ioctl implementation, we can
simply remove the call from the non-KVM_RUN vcpu ioctls, and our
kvm_arch_vcpu_load() is then only used for loading vcpu content onto the
physical CPU when we're actually going to run the vcpu.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 arch/arm64/kvm/guest.c | 3 ---
 virt/kvm/arm/arm.c     | 9 ---------
 2 files changed, 12 deletions(-)

diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index d7e3299a7734..959e50d2588c 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -363,8 +363,6 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
 {
 	int ret = 0;
 
-	vcpu_load(vcpu);
-
 	trace_kvm_set_guest_debug(vcpu, dbg->control);
 
 	if (dbg->control & ~KVM_GUESTDBG_VALID_MASK) {
@@ -386,7 +384,6 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
 	}
 
 out:
-	vcpu_put(vcpu);
 	return ret;
 }
 
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 7a6ce4830cc5..5e3c149a6e28 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -383,14 +383,11 @@ static void vcpu_power_off(struct kvm_vcpu *vcpu)
 int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
 				    struct kvm_mp_state *mp_state)
 {
-	vcpu_load(vcpu);
-
 	if (vcpu->arch.power_off)
 		mp_state->mp_state = KVM_MP_STATE_STOPPED;
 	else
 		mp_state->mp_state = KVM_MP_STATE_RUNNABLE;
 
-	vcpu_put(vcpu);
 	return 0;
 }
 
@@ -399,8 +396,6 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
 {
 	int ret = 0;
 
-	vcpu_load(vcpu);
-
 	switch (mp_state->mp_state) {
 	case KVM_MP_STATE_RUNNABLE:
 		vcpu->arch.power_off = false;
@@ -412,7 +407,6 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
 		ret = -EINVAL;
 	}
 
-	vcpu_put(vcpu);
 	return ret;
 }
 
@@ -1028,8 +1022,6 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 	struct kvm_device_attr attr;
 	long r;
 
-	vcpu_load(vcpu);
-
 	switch (ioctl) {
 	case KVM_ARM_VCPU_INIT: {
 		struct kvm_vcpu_init init;
@@ -1106,7 +1098,6 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 		r = -EINVAL;
 	}
 
-	vcpu_put(vcpu);
 	return r;
 }
 
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 223+ messages in thread

* [PATCH v3 02/41] KVM: arm/arm64: Move vcpu_load call after kvm_vcpu_first_run_init
  2018-01-12 12:07 ` Christoffer Dall
@ 2018-01-12 12:07   ` Christoffer Dall
  -1 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-01-12 12:07 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel; +Cc: Marc Zyngier, Shih-Wei Li, kvm

Moving the call to vcpu_load() in kvm_arch_vcpu_ioctl_run() to after
we've called kvm_vcpu_first_run_init() simplifies some of the vgic and
timer initialization code, and there is also no need to do vcpu_load()
for things such as handling the immediate_exit flag.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 virt/kvm/arm/arch_timer.c     |  7 -------
 virt/kvm/arm/arm.c            | 22 ++++++++--------------
 virt/kvm/arm/vgic/vgic-init.c | 11 -----------
 3 files changed, 8 insertions(+), 32 deletions(-)

diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index cfcd0323deab..c09c701fd68e 100644
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -834,14 +834,7 @@ int kvm_timer_enable(struct kvm_vcpu *vcpu)
 		return ret;
 
 no_vgic:
-	preempt_disable();
 	timer->enabled = 1;
-	if (!irqchip_in_kernel(vcpu->kvm))
-		kvm_timer_vcpu_load_user(vcpu);
-	else
-		kvm_timer_vcpu_load_vgic(vcpu);
-	preempt_enable();
-
 	return 0;
 }
 
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 5e3c149a6e28..360df72692ee 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -631,27 +631,22 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 	if (unlikely(!kvm_vcpu_initialized(vcpu)))
 		return -ENOEXEC;
 
-	vcpu_load(vcpu);
-
 	ret = kvm_vcpu_first_run_init(vcpu);
 	if (ret)
-		goto out;
+		return ret;
 
 	if (run->exit_reason == KVM_EXIT_MMIO) {
 		ret = kvm_handle_mmio_return(vcpu, vcpu->run);
 		if (ret)
-			goto out;
-		if (kvm_arm_handle_step_debug(vcpu, vcpu->run)) {
-			ret = 0;
-			goto out;
-		}
-
+			return ret;
+		if (kvm_arm_handle_step_debug(vcpu, vcpu->run))
+			return 0;
 	}
 
-	if (run->immediate_exit) {
-		ret = -EINTR;
-		goto out;
-	}
+	if (run->immediate_exit)
+		return -EINTR;
+
+	vcpu_load(vcpu);
 
 	kvm_sigset_activate(vcpu);
 
@@ -803,7 +798,6 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 
 	kvm_sigset_deactivate(vcpu);
 
-out:
 	vcpu_put(vcpu);
 	return ret;
 }
diff --git a/virt/kvm/arm/vgic/vgic-init.c b/virt/kvm/arm/vgic/vgic-init.c
index 62310122ee78..a0688ef52ad7 100644
--- a/virt/kvm/arm/vgic/vgic-init.c
+++ b/virt/kvm/arm/vgic/vgic-init.c
@@ -300,17 +300,6 @@ int vgic_init(struct kvm *kvm)
 
 	dist->initialized = true;
 
-	/*
-	 * If we're initializing GICv2 on-demand when first running the VCPU
-	 * then we need to load the VGIC state onto the CPU.  We can detect
-	 * this easily by checking if we are in between vcpu_load and vcpu_put
-	 * when we just initialized the VGIC.
-	 */
-	preempt_disable();
-	vcpu = kvm_arm_get_running_vcpu();
-	if (vcpu)
-		kvm_vgic_load(vcpu);
-	preempt_enable();
 out:
 	return ret;
 }
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 223+ messages in thread

* [PATCH v3 03/41] KVM: arm64: Avoid storing the vcpu pointer on the stack
  2018-01-12 12:07 ` Christoffer Dall
@ 2018-01-12 12:07   ` Christoffer Dall
  -1 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-01-12 12:07 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel
  Cc: kvm, Marc Zyngier, Shih-Wei Li, Andrew Jones, Christoffer Dall,
	Ard Biesheuvel

We already have the percpu area for the host cpu state, which points to
the VCPU, so there's no need to store the VCPU pointer on the stack on
every context switch.  We can be a little more clever and just use
tpidr_el2 for the percpu offset and load the VCPU pointer from the host
context.

This does require us to calculate the percpu offset without including
the offset from the kernel mapping of the percpu array to the linear
mapping of the array (which is what we store in tpidr_el1), because a
PC-relative generated address in EL2 is already giving us the hyp alias
of the linear mapping of a kernel address.  We do this in
__cpu_init_hyp_mode() by using kvm_ksym_ref().

This change also requires us to have a scratch register, so we take the
chance to rearrange some of the el1_sync code to only look at the
vttbr_el2 to determine if this is a trap from the guest or an HVC from
the host.  We do add an extra check to call the panic code if the kernel
is configured with debugging enabled and we saw a trap from the host
which wasn't an HVC, indicating that we left some EL2 trap configured by
mistake.

The code that accesses ESR_EL2 was previously using an alternative to
use the _EL1 accessor on VHE systems, but this was actually unnecessary
as the _EL1 accessor aliases the ESR_EL2 register on VHE, and the _EL2
accessor does the same thing on both systems.

Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 arch/arm64/include/asm/kvm_asm.h  | 14 +++++++++++++
 arch/arm64/include/asm/kvm_host.h | 15 ++++++++++++++
 arch/arm64/kernel/asm-offsets.c   |  1 +
 arch/arm64/kvm/hyp/entry.S        |  6 +-----
 arch/arm64/kvm/hyp/hyp-entry.S    | 41 ++++++++++++++++++---------------------
 arch/arm64/kvm/hyp/switch.c       |  5 +----
 arch/arm64/kvm/hyp/sysreg-sr.c    |  5 +++++
 7 files changed, 56 insertions(+), 31 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index ab4d0a926043..6c7599b5cb40 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -33,6 +33,7 @@
 #define KVM_ARM64_DEBUG_DIRTY_SHIFT	0
 #define KVM_ARM64_DEBUG_DIRTY		(1 << KVM_ARM64_DEBUG_DIRTY_SHIFT)
 
+/* Translate a kernel address of @sym into its equivalent linear mapping */
 #define kvm_ksym_ref(sym)						\
 	({								\
 		void *val = &sym;					\
@@ -68,6 +69,19 @@ extern u32 __kvm_get_mdcr_el2(void);
 
 extern u32 __init_stage2_translation(void);
 
+#else /* __ASSEMBLY__ */
+
+.macro get_host_ctxt reg, tmp
+	adr_l	\reg, kvm_host_cpu_state
+	mrs	\tmp, tpidr_el2
+	add	\reg, \reg, \tmp
+.endm
+
+.macro get_vcpu vcpu, ctxt
+	ldr	\vcpu, [\ctxt, #HOST_CONTEXT_VCPU]
+	kern_hyp_va	\vcpu
+.endm
+
 #endif
 
 #endif /* __ARM_KVM_ASM_H__ */
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 048f5db120f3..6ce0b428a4db 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -350,10 +350,15 @@ int kvm_perf_teardown(void);
 
 struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
 
+extern void __kvm_set_tpidr_el2(u64 tpidr_el2);
+DECLARE_PER_CPU(kvm_cpu_context_t, kvm_host_cpu_state);
+
 static inline void __cpu_init_hyp_mode(phys_addr_t pgd_ptr,
 				       unsigned long hyp_stack_ptr,
 				       unsigned long vector_ptr)
 {
+	u64 tpidr_el2;
+
 	/*
 	 * Call initialization code, and switch to the full blown HYP code.
 	 * If the cpucaps haven't been finalized yet, something has gone very
@@ -362,6 +367,16 @@ static inline void __cpu_init_hyp_mode(phys_addr_t pgd_ptr,
 	 */
 	BUG_ON(!static_branch_likely(&arm64_const_caps_ready));
 	__kvm_call_hyp((void *)pgd_ptr, hyp_stack_ptr, vector_ptr);
+
+	/*
+	 * Calculate the raw per-cpu offset without a translation from the
+	 * kernel's mapping to the linear mapping, and store it in tpidr_el2
+	 * so that we can use adr_l to access per-cpu variables in EL2.
+	 */
+	tpidr_el2 = (u64)this_cpu_ptr(&kvm_host_cpu_state)
+		- (u64)kvm_ksym_ref(kvm_host_cpu_state);
+
+	kvm_call_hyp(__kvm_set_tpidr_el2, tpidr_el2);
 }
 
 static inline void kvm_arch_hardware_unsetup(void) {}
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index 71bf088f1e4b..612021dce84f 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -135,6 +135,7 @@ int main(void)
   DEFINE(CPU_FP_REGS,		offsetof(struct kvm_regs, fp_regs));
   DEFINE(VCPU_FPEXC32_EL2,	offsetof(struct kvm_vcpu, arch.ctxt.sys_regs[FPEXC32_EL2]));
   DEFINE(VCPU_HOST_CONTEXT,	offsetof(struct kvm_vcpu, arch.host_cpu_context));
+  DEFINE(HOST_CONTEXT_VCPU,	offsetof(struct kvm_cpu_context, __hyp_running_vcpu));
 #endif
 #ifdef CONFIG_CPU_PM
   DEFINE(CPU_SUSPEND_SZ,	sizeof(struct cpu_suspend_ctx));
diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
index 9a8ab5dddd9e..a360ac6e89e9 100644
--- a/arch/arm64/kvm/hyp/entry.S
+++ b/arch/arm64/kvm/hyp/entry.S
@@ -62,9 +62,6 @@ ENTRY(__guest_enter)
 	// Store the host regs
 	save_callee_saved_regs x1
 
-	// Store host_ctxt and vcpu for use at exit time
-	stp	x1, x0, [sp, #-16]!
-
 	add	x18, x0, #VCPU_CONTEXT
 
 	// Restore guest regs x0-x17
@@ -118,8 +115,7 @@ ENTRY(__guest_exit)
 	// Store the guest regs x19-x29, lr
 	save_callee_saved_regs x1
 
-	// Restore the host_ctxt from the stack
-	ldr	x2, [sp], #16
+	get_host_ctxt	x2, x3
 
 	// Now restore the host regs
 	restore_callee_saved_regs x2
diff --git a/arch/arm64/kvm/hyp/hyp-entry.S b/arch/arm64/kvm/hyp/hyp-entry.S
index e4f37b9dd47c..71b4cc92895e 100644
--- a/arch/arm64/kvm/hyp/hyp-entry.S
+++ b/arch/arm64/kvm/hyp/hyp-entry.S
@@ -56,18 +56,15 @@ ENDPROC(__vhe_hyp_call)
 el1_sync:				// Guest trapped into EL2
 	stp	x0, x1, [sp, #-16]!
 
-alternative_if_not ARM64_HAS_VIRT_HOST_EXTN
-	mrs	x1, esr_el2
-alternative_else
-	mrs	x1, esr_el1
-alternative_endif
-	lsr	x0, x1, #ESR_ELx_EC_SHIFT
+	mrs	x1, vttbr_el2		// If vttbr is valid, this is a trap
+	cbnz	x1, el1_trap		// from the guest
 
-	cmp	x0, #ESR_ELx_EC_HVC64
-	b.ne	el1_trap
-
-	mrs	x1, vttbr_el2		// If vttbr is valid, the 64bit guest
-	cbnz	x1, el1_trap		// called HVC
+#ifdef CONFIG_DEBUG
+	mrs	x0, esr_el2
+	lsr	x0, x0, #ESR_ELx_EC_SHIFT
+	cmp     x0, #ESR_ELx_EC_HVC64
+	b.ne    __hyp_panic
+#endif
 
 	/* Here, we're pretty sure the host called HVC. */
 	ldp	x0, x1, [sp], #16
@@ -101,10 +98,15 @@ alternative_endif
 	eret
 
 el1_trap:
+	get_host_ctxt	x0, x1
+	get_vcpu	x1, x0
+
+	mrs		x0, esr_el2
+	lsr		x0, x0, #ESR_ELx_EC_SHIFT
 	/*
 	 * x0: ESR_EC
+	 * x1: vcpu pointer
 	 */
-	ldr	x1, [sp, #16 + 8]	// vcpu stored by __guest_enter
 
 	/*
 	 * We trap the first access to the FP/SIMD to save the host context
@@ -122,13 +124,15 @@ alternative_else_nop_endif
 
 el1_irq:
 	stp     x0, x1, [sp, #-16]!
-	ldr	x1, [sp, #16 + 8]
+	get_host_ctxt	x0, x1
+	get_vcpu	x1, x0
 	mov	x0, #ARM_EXCEPTION_IRQ
 	b	__guest_exit
 
 el1_error:
 	stp     x0, x1, [sp, #-16]!
-	ldr	x1, [sp, #16 + 8]
+	get_host_ctxt	x0, x1
+	get_vcpu	x1, x0
 	mov	x0, #ARM_EXCEPTION_EL1_SERROR
 	b	__guest_exit
 
@@ -164,14 +168,7 @@ ENTRY(__hyp_do_panic)
 ENDPROC(__hyp_do_panic)
 
 ENTRY(__hyp_panic)
-	/*
-	 * '=kvm_host_cpu_state' is a host VA from the constant pool, it may
-	 * not be accessible by this address from EL2, hyp_panic() converts
-	 * it with kern_hyp_va() before use.
-	 */
-	ldr	x0, =kvm_host_cpu_state
-	mrs	x1, tpidr_el2
-	add	x0, x0, x1
+	get_host_ctxt x0, x1
 	b	hyp_panic
 ENDPROC(__hyp_panic)
 
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index f7307b6b42f0..6fcb37e220b5 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -449,7 +449,7 @@ static hyp_alternate_select(__hyp_call_panic,
 			    __hyp_call_panic_nvhe, __hyp_call_panic_vhe,
 			    ARM64_HAS_VIRT_HOST_EXTN);
 
-void __hyp_text __noreturn hyp_panic(struct kvm_cpu_context *__host_ctxt)
+void __hyp_text __noreturn hyp_panic(struct kvm_cpu_context *host_ctxt)
 {
 	struct kvm_vcpu *vcpu = NULL;
 
@@ -458,9 +458,6 @@ void __hyp_text __noreturn hyp_panic(struct kvm_cpu_context *__host_ctxt)
 	u64 par = read_sysreg(par_el1);
 
 	if (read_sysreg(vttbr_el2)) {
-		struct kvm_cpu_context *host_ctxt;
-
-		host_ctxt = kern_hyp_va(__host_ctxt);
 		vcpu = host_ctxt->__hyp_running_vcpu;
 		__timer_disable_traps(vcpu);
 		__deactivate_traps(vcpu);
diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
index c54cc2afb92b..e19d89cabf2a 100644
--- a/arch/arm64/kvm/hyp/sysreg-sr.c
+++ b/arch/arm64/kvm/hyp/sysreg-sr.c
@@ -183,3 +183,8 @@ void __hyp_text __sysreg32_restore_state(struct kvm_vcpu *vcpu)
 	if (vcpu->arch.debug_flags & KVM_ARM64_DEBUG_DIRTY)
 		write_sysreg(sysreg[DBGVCR32_EL2], dbgvcr32_el2);
 }
+
+void __hyp_text __kvm_set_tpidr_el2(u64 tpidr_el2)
+{
+	asm("msr tpidr_el2, %0": : "r" (tpidr_el2));
+}
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 223+ messages in thread

* [PATCH v3 04/41] KVM: arm64: Rework hyp_panic for VHE and non-VHE
  2018-01-12 12:07 ` Christoffer Dall
@ 2018-01-12 12:07   ` Christoffer Dall
  -1 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-01-12 12:07 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel; +Cc: Marc Zyngier, Shih-Wei Li, kvm

VHE actually doesn't rely on clearing the VTTBR when returning to the
host kernel, yet checking the VTTBR is currently the key mechanism
hyp_panic uses to figure out how to get back to a state good enough to
print a panic statement.

Therefore, we split hyp_panic into two functions, a VHE and a non-VHE
version, keeping the non-VHE version intact but changing the VHE
behavior.

The vttbr_el2 check on VHE doesn't really make that much sense, because
the only situation where we can get here on VHE is when the hypervisor
assembly code actually called into hyp_panic, which only happens when
VBAR_EL2 has been set to the KVM exception vectors.  On VHE, we can
always safely disable the traps and restore the host registers at this
point, so we simply do that unconditionally and call into the panic
function directly.

Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 arch/arm64/kvm/hyp/switch.c | 42 +++++++++++++++++++++++-------------------
 1 file changed, 23 insertions(+), 19 deletions(-)

diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 6fcb37e220b5..71700ecee308 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -419,10 +419,20 @@ int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
 static const char __hyp_panic_string[] = "HYP panic:\nPS:%08llx PC:%016llx ESR:%08llx\nFAR:%016llx HPFAR:%016llx PAR:%016llx\nVCPU:%p\n";
 
 static void __hyp_text __hyp_call_panic_nvhe(u64 spsr, u64 elr, u64 par,
-					     struct kvm_vcpu *vcpu)
+					     struct kvm_cpu_context *__host_ctxt)
 {
+	struct kvm_vcpu *vcpu;
 	unsigned long str_va;
 
+	vcpu = __host_ctxt->__hyp_running_vcpu;
+
+	if (read_sysreg(vttbr_el2)) {
+		__timer_disable_traps(vcpu);
+		__deactivate_traps(vcpu);
+		__deactivate_vm(vcpu);
+		__sysreg_restore_host_state(__host_ctxt);
+	}
+
 	/*
 	 * Force the panic string to be loaded from the literal pool,
 	 * making sure it is a kernel address and not a PC-relative
@@ -436,37 +446,31 @@ static void __hyp_text __hyp_call_panic_nvhe(u64 spsr, u64 elr, u64 par,
 		       read_sysreg(hpfar_el2), par, vcpu);
 }
 
-static void __hyp_text __hyp_call_panic_vhe(u64 spsr, u64 elr, u64 par,
-					    struct kvm_vcpu *vcpu)
+static void __hyp_call_panic_vhe(u64 spsr, u64 elr, u64 par,
+				 struct kvm_cpu_context *host_ctxt)
 {
+	struct kvm_vcpu *vcpu;
+	vcpu = host_ctxt->__hyp_running_vcpu;
+
+	__deactivate_traps(vcpu);
+	__sysreg_restore_host_state(host_ctxt);
+
 	panic(__hyp_panic_string,
 	      spsr,  elr,
 	      read_sysreg_el2(esr),   read_sysreg_el2(far),
 	      read_sysreg(hpfar_el2), par, vcpu);
 }
 
-static hyp_alternate_select(__hyp_call_panic,
-			    __hyp_call_panic_nvhe, __hyp_call_panic_vhe,
-			    ARM64_HAS_VIRT_HOST_EXTN);
-
 void __hyp_text __noreturn hyp_panic(struct kvm_cpu_context *host_ctxt)
 {
-	struct kvm_vcpu *vcpu = NULL;
-
 	u64 spsr = read_sysreg_el2(spsr);
 	u64 elr = read_sysreg_el2(elr);
 	u64 par = read_sysreg(par_el1);
 
-	if (read_sysreg(vttbr_el2)) {
-		vcpu = host_ctxt->__hyp_running_vcpu;
-		__timer_disable_traps(vcpu);
-		__deactivate_traps(vcpu);
-		__deactivate_vm(vcpu);
-		__sysreg_restore_host_state(host_ctxt);
-	}
-
-	/* Call panic for real */
-	__hyp_call_panic()(spsr, elr, par, vcpu);
+	if (!has_vhe())
+		__hyp_call_panic_nvhe(spsr, elr, par, host_ctxt);
+	else
+		__hyp_call_panic_vhe(spsr, elr, par, host_ctxt);
 
 	unreachable();
 }
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 223+ messages in thread

* [PATCH v3 05/41] KVM: arm64: Move HCR_INT_OVERRIDE to default HCR_EL2 guest flag
  2018-01-12 12:07 ` Christoffer Dall
@ 2018-01-12 12:07   ` Christoffer Dall
  -1 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-01-12 12:07 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel; +Cc: Marc Zyngier, Shih-Wei Li, kvm

From: Shih-Wei Li <shihwei@cs.columbia.edu>

We always set the IMO and FMO bits in HCR_EL2 when running the
guest, regardless of whether we use the vgic or not.  By moving these
flags to HCR_GUEST_FLAGS we can avoid one of the extra save/restore
operations of HCR_EL2 in the world switch code, and we can also soon get
rid of the other one.

This is safe because, even though the IMO and FMO bits control both
taking interrupts to EL2 and remapping ICC_*_EL1 accesses executed at
EL1 to ICV_*_EL1, we're OK as long as we ensure that these bits are
clear when running the EL1 host, as defined in HCR_HOST_[VHE_]FLAGS.
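
For reference, the VHE host flags in the hunk below stay free of these
bits:

	#define HCR_HOST_VHE_FLAGS (HCR_RW | HCR_TGE | HCR_E2H)

so the remapping is only in effect while the guest's HCR_EL2 value is
loaded.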

Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Shih-Wei Li <shihwei@cs.columbia.edu>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 arch/arm64/include/asm/kvm_arm.h | 4 ++--
 arch/arm64/kvm/hyp/switch.c      | 3 ---
 2 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 715d395ef45b..656deeb17bf2 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -79,9 +79,9 @@
  */
 #define HCR_GUEST_FLAGS (HCR_TSC | HCR_TSW | HCR_TWE | HCR_TWI | HCR_VM | \
 			 HCR_TVM | HCR_BSU_IS | HCR_FB | HCR_TAC | \
-			 HCR_AMO | HCR_SWIO | HCR_TIDCP | HCR_RW)
+			 HCR_AMO | HCR_SWIO | HCR_TIDCP | HCR_RW | \
+			 HCR_FMO | HCR_IMO)
 #define HCR_VIRT_EXCP_MASK (HCR_VSE | HCR_VI | HCR_VF)
-#define HCR_INT_OVERRIDE   (HCR_FMO | HCR_IMO)
 #define HCR_HOST_VHE_FLAGS (HCR_RW | HCR_TGE | HCR_E2H)
 
 /* TCR_EL2 Registers bits */
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 71700ecee308..f6189d08753e 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -167,8 +167,6 @@ static void __hyp_text __vgic_save_state(struct kvm_vcpu *vcpu)
 		__vgic_v3_save_state(vcpu);
 	else
 		__vgic_v2_save_state(vcpu);
-
-	write_sysreg(read_sysreg(hcr_el2) & ~HCR_INT_OVERRIDE, hcr_el2);
 }
 
 static void __hyp_text __vgic_restore_state(struct kvm_vcpu *vcpu)
@@ -176,7 +174,6 @@ static void __hyp_text __vgic_restore_state(struct kvm_vcpu *vcpu)
 	u64 val;
 
 	val = read_sysreg(hcr_el2);
-	val |= 	HCR_INT_OVERRIDE;
 	val |= vcpu->arch.irq_lines;
 	write_sysreg(val, hcr_el2);
 
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 223+ messages in thread

* [PATCH v3 06/41] KVM: arm/arm64: Get rid of vcpu->arch.irq_lines
  2018-01-12 12:07 ` Christoffer Dall
@ 2018-01-12 12:07   ` Christoffer Dall
  -1 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-01-12 12:07 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel; +Cc: Marc Zyngier, Shih-Wei Li, kvm

We currently have a separate read-modify-write of HCR_EL2 on entry to
the guest for the sole purpose of setting the VF and VI bits when they
are pending.  Since this is rarely the case (only when using the
userspace IRQ chip and interrupts are in flight), let's get rid of this
operation and instead modify the bits in vcpu->arch.hcr[_el2] directly
when needed.
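
As a rough sketch (not a literal quote of the hunks below), a
userspace-injected interrupt now flows like this:

	/* KVM_IRQ_LINE: toggle the virtual line directly in the vcpu's HCR copy */
	set = test_and_set_bit(__ffs(HCR_VI), vcpu_hcr(vcpu));

	/* world-switch entry: the existing single HCR write carries VI/VF along */
	write_sysreg(vcpu->arch.hcr_el2, hcr_el2);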

Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 arch/arm/include/asm/kvm_emulate.h   |  9 ++-------
 arch/arm/include/asm/kvm_host.h      |  3 ---
 arch/arm/kvm/emulate.c               |  2 +-
 arch/arm/kvm/hyp/switch.c            |  2 +-
 arch/arm64/include/asm/kvm_emulate.h |  9 ++-------
 arch/arm64/include/asm/kvm_host.h    |  3 ---
 arch/arm64/kvm/hyp/switch.c          |  6 ------
 arch/arm64/kvm/inject_fault.c        |  2 +-
 virt/kvm/arm/arm.c                   | 11 ++++++-----
 virt/kvm/arm/mmu.c                   |  6 +++---
 10 files changed, 16 insertions(+), 37 deletions(-)

diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
index 3d22eb87f919..d5e1b8bf6422 100644
--- a/arch/arm/include/asm/kvm_emulate.h
+++ b/arch/arm/include/asm/kvm_emulate.h
@@ -92,14 +92,9 @@ static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
 	vcpu->arch.hcr = HCR_GUEST_MASK;
 }
 
-static inline unsigned long vcpu_get_hcr(const struct kvm_vcpu *vcpu)
+static inline unsigned long *vcpu_hcr(const struct kvm_vcpu *vcpu)
 {
-	return vcpu->arch.hcr;
-}
-
-static inline void vcpu_set_hcr(struct kvm_vcpu *vcpu, unsigned long hcr)
-{
-	vcpu->arch.hcr = hcr;
+	return (unsigned long *)&vcpu->arch.hcr;
 }
 
 static inline bool vcpu_mode_is_32bit(const struct kvm_vcpu *vcpu)
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 6394fb99da7f..7f96b3541939 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -155,9 +155,6 @@ struct kvm_vcpu_arch {
 	/* HYP trapping configuration */
 	u32 hcr;
 
-	/* Interrupt related fields */
-	u32 irq_lines;		/* IRQ and FIQ levels */
-
 	/* Exception Information */
 	struct kvm_vcpu_fault_info fault;
 
diff --git a/arch/arm/kvm/emulate.c b/arch/arm/kvm/emulate.c
index cdff963f133a..fa501bf437f3 100644
--- a/arch/arm/kvm/emulate.c
+++ b/arch/arm/kvm/emulate.c
@@ -174,5 +174,5 @@ unsigned long *vcpu_spsr(struct kvm_vcpu *vcpu)
  */
 void kvm_inject_vabt(struct kvm_vcpu *vcpu)
 {
-	vcpu_set_hcr(vcpu, vcpu_get_hcr(vcpu) | HCR_VA);
+	*vcpu_hcr(vcpu) |= HCR_VA;
 }
diff --git a/arch/arm/kvm/hyp/switch.c b/arch/arm/kvm/hyp/switch.c
index 330c9ce34ba5..c3b9799e2e13 100644
--- a/arch/arm/kvm/hyp/switch.c
+++ b/arch/arm/kvm/hyp/switch.c
@@ -43,7 +43,7 @@ static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu, u32 *fpexc_host)
 		isb();
 	}
 
-	write_sysreg(vcpu->arch.hcr | vcpu->arch.irq_lines, HCR);
+	write_sysreg(vcpu->arch.hcr, HCR);
 	/* Trap on AArch32 cp15 c15 accesses (EL1 or EL0) */
 	write_sysreg(HSTR_T(15), HSTR);
 	write_sysreg(HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11), HCPTR);
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 8ff5aef44656..b36aaa1fe332 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -62,14 +62,9 @@ static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
 		vcpu->arch.hcr_el2 |= HCR_TID3;
 }
 
-static inline unsigned long vcpu_get_hcr(struct kvm_vcpu *vcpu)
+static inline unsigned long *vcpu_hcr(struct kvm_vcpu *vcpu)
 {
-	return vcpu->arch.hcr_el2;
-}
-
-static inline void vcpu_set_hcr(struct kvm_vcpu *vcpu, unsigned long hcr)
-{
-	vcpu->arch.hcr_el2 = hcr;
+	return (unsigned long *)&vcpu->arch.hcr_el2;
 }
 
 static inline unsigned long *vcpu_pc(const struct kvm_vcpu *vcpu)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 6ce0b428a4db..59150c75f9a5 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -269,9 +269,6 @@ struct kvm_vcpu_arch {
 	/* IO related fields */
 	struct kvm_decode mmio_decode;
 
-	/* Interrupt related fields */
-	u64 irq_lines;		/* IRQ and FIQ levels */
-
 	/* Cache some mmu pages needed inside spinlock regions */
 	struct kvm_mmu_memory_cache mmu_page_cache;
 
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index f6189d08753e..11ec1c6f3b84 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -171,12 +171,6 @@ static void __hyp_text __vgic_save_state(struct kvm_vcpu *vcpu)
 
 static void __hyp_text __vgic_restore_state(struct kvm_vcpu *vcpu)
 {
-	u64 val;
-
-	val = read_sysreg(hcr_el2);
-	val |= vcpu->arch.irq_lines;
-	write_sysreg(val, hcr_el2);
-
 	if (static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif))
 		__vgic_v3_restore_state(vcpu);
 	else
diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
index 8ecbcb40e317..2d38ede2eff0 100644
--- a/arch/arm64/kvm/inject_fault.c
+++ b/arch/arm64/kvm/inject_fault.c
@@ -173,5 +173,5 @@ void kvm_inject_undefined(struct kvm_vcpu *vcpu)
  */
 void kvm_inject_vabt(struct kvm_vcpu *vcpu)
 {
-	vcpu_set_hcr(vcpu, vcpu_get_hcr(vcpu) | HCR_VSE);
+	*vcpu_hcr(vcpu) |= HCR_VSE;
 }
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 360df72692ee..46c125f1da56 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -419,7 +419,8 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
  */
 int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
 {
-	return ((!!v->arch.irq_lines || kvm_vgic_vcpu_pending_irq(v))
+	bool irq_lines = *vcpu_hcr(v) & (HCR_VI | HCR_VF);
+	return ((irq_lines || kvm_vgic_vcpu_pending_irq(v))
 		&& !v->arch.power_off && !v->arch.pause);
 }
 
@@ -806,18 +807,18 @@ static int vcpu_interrupt_line(struct kvm_vcpu *vcpu, int number, bool level)
 {
 	int bit_index;
 	bool set;
-	unsigned long *ptr;
+	unsigned long *hcr;
 
 	if (number == KVM_ARM_IRQ_CPU_IRQ)
 		bit_index = __ffs(HCR_VI);
 	else /* KVM_ARM_IRQ_CPU_FIQ */
 		bit_index = __ffs(HCR_VF);
 
-	ptr = (unsigned long *)&vcpu->arch.irq_lines;
+	hcr = vcpu_hcr(vcpu);
 	if (level)
-		set = test_and_set_bit(bit_index, ptr);
+		set = test_and_set_bit(bit_index, hcr);
 	else
-		set = test_and_clear_bit(bit_index, ptr);
+		set = test_and_clear_bit(bit_index, hcr);
 
 	/*
 	 * If we didn't change anything, no need to wake up or kick other CPUs
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index b36945d49986..d93d56d4cc5b 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1987,7 +1987,7 @@ void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
  */
 void kvm_set_way_flush(struct kvm_vcpu *vcpu)
 {
-	unsigned long hcr = vcpu_get_hcr(vcpu);
+	unsigned long hcr = *vcpu_hcr(vcpu);
 
 	/*
 	 * If this is the first time we do a S/W operation
@@ -2002,7 +2002,7 @@ void kvm_set_way_flush(struct kvm_vcpu *vcpu)
 		trace_kvm_set_way_flush(*vcpu_pc(vcpu),
 					vcpu_has_cache_enabled(vcpu));
 		stage2_flush_vm(vcpu->kvm);
-		vcpu_set_hcr(vcpu, hcr | HCR_TVM);
+		*vcpu_hcr(vcpu) = hcr | HCR_TVM;
 	}
 }
 
@@ -2020,7 +2020,7 @@ void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled)
 
 	/* Caches are now on, stop trapping VM ops (until a S/W op) */
 	if (now_enabled)
-		vcpu_set_hcr(vcpu, vcpu_get_hcr(vcpu) & ~HCR_TVM);
+		*vcpu_hcr(vcpu) &= ~HCR_TVM;
 
 	trace_kvm_toggle_cache(*vcpu_pc(vcpu), was_enabled, now_enabled);
 }
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 223+ messages in thread

* [PATCH v3 07/41] KVM: arm/arm64: Add kvm_vcpu_load_sysregs and kvm_vcpu_put_sysregs
  2018-01-12 12:07 ` Christoffer Dall
@ 2018-01-12 12:07   ` Christoffer Dall
  -1 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-01-12 12:07 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel; +Cc: Marc Zyngier, Shih-Wei Li, kvm

As we are about to move a bunch of save/restore logic for VHE kernels to
the load and put functions, we need some infrastructure to do this.
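
Roughly, the new hooks sit at the ends of the generic load/put paths
(they are still empty in this patch and get filled in later in the
series):

	kvm_arch_vcpu_load(vcpu, cpu)
		kvm_vgic_load(vcpu);
		kvm_timer_vcpu_load(vcpu);
		kvm_vcpu_load_sysregs(vcpu);	/* new, no-op for now */

	kvm_arch_vcpu_put(vcpu)
		kvm_vcpu_put_sysregs(vcpu);	/* new, no-op for now */
		kvm_timer_vcpu_put(vcpu);
		kvm_vgic_put(vcpu);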

Reviewed-by: Andrew Jones <drjones@redhat.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 arch/arm/include/asm/kvm_host.h   |  3 +++
 arch/arm64/include/asm/kvm_host.h |  3 +++
 arch/arm64/kvm/hyp/sysreg-sr.c    | 30 ++++++++++++++++++++++++++++++
 virt/kvm/arm/arm.c                |  2 ++
 4 files changed, 38 insertions(+)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 7f96b3541939..793b3bc5a56c 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -300,4 +300,7 @@ int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu,
 /* All host FP/SIMD state is restored on guest exit, so nothing to save: */
 static inline void kvm_fpsimd_flush_cpu_state(void) {}
 
+static inline void kvm_vcpu_load_sysregs(struct kvm_vcpu *vcpu) {}
+static inline void kvm_vcpu_put_sysregs(struct kvm_vcpu *vcpu) {}
+
 #endif /* __ARM_KVM_HOST_H__ */
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 59150c75f9a5..0e9e7291a7e6 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -412,4 +412,7 @@ static inline void kvm_fpsimd_flush_cpu_state(void)
 		sve_flush_cpu_state();
 }
 
+void kvm_vcpu_load_sysregs(struct kvm_vcpu *vcpu);
+void kvm_vcpu_put_sysregs(struct kvm_vcpu *vcpu);
+
 #endif /* __ARM64_KVM_HOST_H__ */
diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
index e19d89cabf2a..cbbcd6f410a8 100644
--- a/arch/arm64/kvm/hyp/sysreg-sr.c
+++ b/arch/arm64/kvm/hyp/sysreg-sr.c
@@ -184,6 +184,36 @@ void __hyp_text __sysreg32_restore_state(struct kvm_vcpu *vcpu)
 		write_sysreg(sysreg[DBGVCR32_EL2], dbgvcr32_el2);
 }
 
+/**
+ * kvm_vcpu_load_sysregs - Load guest system registers to the physical CPU
+ *
+ * @vcpu: The VCPU pointer
+ *
+ * Load system registers that do not affect the host's execution, for
+ * example EL1 system registers on a VHE system where the host kernel
+ * runs at EL2.  This function is called from KVM's vcpu_load() function
+ * and loading system register state early avoids having to load them on
+ * every entry to the VM.
+ */
+void kvm_vcpu_load_sysregs(struct kvm_vcpu *vcpu)
+{
+}
+
+/**
+ * kvm_vcpu_put_sysregs - Restore host system registers to the physical CPU
+ *
+ * @vcpu: The VCPU pointer
+ *
+ * Save guest system registers that do not affect the host's execution, for
+ * example EL1 system registers on a VHE system where the host kernel
+ * runs at EL2.  This function is called from KVM's vcpu_put() function
+ * and deferring saving system register state until we're no longer running the
+ * VCPU avoids having to save them on every exit from the VM.
+ */
+void kvm_vcpu_put_sysregs(struct kvm_vcpu *vcpu)
+{
+}
+
 void __hyp_text __kvm_set_tpidr_el2(u64 tpidr_el2)
 {
 	asm("msr tpidr_el2, %0": : "r" (tpidr_el2));
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 46c125f1da56..5b1487bd91e8 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -361,10 +361,12 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	kvm_arm_set_running_vcpu(vcpu);
 	kvm_vgic_load(vcpu);
 	kvm_timer_vcpu_load(vcpu);
+	kvm_vcpu_load_sysregs(vcpu);
 }
 
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 {
+	kvm_vcpu_put_sysregs(vcpu);
 	kvm_timer_vcpu_put(vcpu);
 	kvm_vgic_put(vcpu);
 
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 223+ messages in thread

* [PATCH v3 08/41] KVM: arm/arm64: Introduce vcpu_el1_is_32bit
  2018-01-12 12:07 ` Christoffer Dall
@ 2018-01-12 12:07   ` Christoffer Dall
  -1 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-01-12 12:07 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel; +Cc: Marc Zyngier, Shih-Wei Li, kvm

We have numerous checks around that check whether HCR_EL2 has the RW bit
set to figure out if we're running an AArch64 or AArch32 VM.  In some
cases, directly checking the RW bit (given its unintuitive name) is a
bit confusing, and that's not going to improve as we move logic around
for the following patches that optimize KVM on AArch64 hosts with VHE.

Therefore, introduce a helper, vcpu_el1_is_32bit, and replace existing
direct checks of HCR_EL2.RW with the helper.
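
As a small before/after example, taken from one of the hunks below:

	/* before */
	if (!(vcpu->arch.hcr_el2 & HCR_RW))
		kvm_inject_undef32(vcpu);

	/* after */
	if (vcpu_el1_is_32bit(vcpu))
		kvm_inject_undef32(vcpu);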

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 arch/arm64/include/asm/kvm_emulate.h | 7 ++++++-
 arch/arm64/kvm/hyp/switch.c          | 8 ++------
 arch/arm64/kvm/hyp/sysreg-sr.c       | 5 +++--
 arch/arm64/kvm/inject_fault.c        | 6 +++---
 4 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index b36aaa1fe332..e07bf463ac58 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -45,6 +45,11 @@ void kvm_inject_undef32(struct kvm_vcpu *vcpu);
 void kvm_inject_dabt32(struct kvm_vcpu *vcpu, unsigned long addr);
 void kvm_inject_pabt32(struct kvm_vcpu *vcpu, unsigned long addr);
 
+static inline bool vcpu_el1_is_32bit(struct kvm_vcpu *vcpu)
+{
+	return !(vcpu->arch.hcr_el2 & HCR_RW);
+}
+
 static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
 {
 	vcpu->arch.hcr_el2 = HCR_GUEST_FLAGS;
@@ -58,7 +63,7 @@ static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
 	 * For now this is conditional, since no AArch32 feature regs
 	 * are currently virtualised.
 	 */
-	if (vcpu->arch.hcr_el2 & HCR_RW)
+	if (!vcpu_el1_is_32bit(vcpu))
 		vcpu->arch.hcr_el2 |= HCR_TID3;
 }
 
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 11ec1c6f3b84..12dc647a6e5f 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -70,8 +70,6 @@ static hyp_alternate_select(__activate_traps_arch,
 
 static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
 {
-	u64 val;
-
 	/*
 	 * We are about to set CPTR_EL2.TFP to trap all floating point
 	 * register accesses to EL2, however, the ARM ARM clearly states that
@@ -81,13 +79,11 @@ static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
 	 * If FP/ASIMD is not implemented, FPEXC is UNDEFINED and any access to
 	 * it will cause an exception.
 	 */
-	val = vcpu->arch.hcr_el2;
-
-	if (!(val & HCR_RW) && system_supports_fpsimd()) {
+	if (vcpu_el1_is_32bit(vcpu) && system_supports_fpsimd()) {
 		write_sysreg(1 << 30, fpexc32_el2);
 		isb();
 	}
-	write_sysreg(val, hcr_el2);
+	write_sysreg(vcpu->arch.hcr_el2, hcr_el2);
 
 	/* Trap on AArch32 cp15 c15 accesses (EL1 or EL0) */
 	write_sysreg(1 << 15, hstr_el2);
diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
index cbbcd6f410a8..883a6383cd36 100644
--- a/arch/arm64/kvm/hyp/sysreg-sr.c
+++ b/arch/arm64/kvm/hyp/sysreg-sr.c
@@ -19,6 +19,7 @@
 #include <linux/kvm_host.h>
 
 #include <asm/kvm_asm.h>
+#include <asm/kvm_emulate.h>
 #include <asm/kvm_hyp.h>
 
 /* Yes, this does nothing, on purpose */
@@ -141,7 +142,7 @@ void __hyp_text __sysreg32_save_state(struct kvm_vcpu *vcpu)
 {
 	u64 *spsr, *sysreg;
 
-	if (read_sysreg(hcr_el2) & HCR_RW)
+	if (!vcpu_el1_is_32bit(vcpu))
 		return;
 
 	spsr = vcpu->arch.ctxt.gp_regs.spsr;
@@ -166,7 +167,7 @@ void __hyp_text __sysreg32_restore_state(struct kvm_vcpu *vcpu)
 {
 	u64 *spsr, *sysreg;
 
-	if (read_sysreg(hcr_el2) & HCR_RW)
+	if (!vcpu_el1_is_32bit(vcpu))
 		return;
 
 	spsr = vcpu->arch.ctxt.gp_regs.spsr;
diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
index 2d38ede2eff0..f4d35bb551e4 100644
--- a/arch/arm64/kvm/inject_fault.c
+++ b/arch/arm64/kvm/inject_fault.c
@@ -128,7 +128,7 @@ static void inject_undef64(struct kvm_vcpu *vcpu)
  */
 void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr)
 {
-	if (!(vcpu->arch.hcr_el2 & HCR_RW))
+	if (vcpu_el1_is_32bit(vcpu))
 		kvm_inject_dabt32(vcpu, addr);
 	else
 		inject_abt64(vcpu, false, addr);
@@ -144,7 +144,7 @@ void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr)
  */
 void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr)
 {
-	if (!(vcpu->arch.hcr_el2 & HCR_RW))
+	if (vcpu_el1_is_32bit(vcpu))
 		kvm_inject_pabt32(vcpu, addr);
 	else
 		inject_abt64(vcpu, true, addr);
@@ -158,7 +158,7 @@ void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr)
  */
 void kvm_inject_undefined(struct kvm_vcpu *vcpu)
 {
-	if (!(vcpu->arch.hcr_el2 & HCR_RW))
+	if (vcpu_el1_is_32bit(vcpu))
 		kvm_inject_undef32(vcpu);
 	else
 		inject_undef64(vcpu);
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 223+ messages in thread

* [PATCH v3 09/41] KVM: arm64: Defer restoring host VFP state to vcpu_put
  2018-01-12 12:07 ` Christoffer Dall
@ 2018-01-12 12:07   ` Christoffer Dall
  -1 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-01-12 12:07 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel
  Cc: kvm, Marc Zyngier, Shih-Wei Li, Andrew Jones, Christoffer Dall

Avoid saving the guest VFP registers and restoring the host VFP
registers on every exit from the VM.  Only when we're about to run
userspace or other threads in the kernel do we really have to switch the
state back to the host state.

We still initially configure the VFP registers to trap when entering the
VM, but the difference is that we now leave the guest state in the
hardware registers as long as we're running this VCPU, even if we
occasionally trap to the host, and we only restore the host state when
we return to user space or when scheduling another thread.
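
The rough lifecycle of the guest_vfp_loaded flag introduced below is
(sketch, not literal code):

	vcpu entry:             FP/SIMD traps stay enabled while guest_vfp_loaded == 0
	first guest FP use:     trap to hyp, restore guest FP regs, set guest_vfp_loaded = 1
	subsequent switches:    no FP/SIMD register save/restore at all
	kvm_vcpu_put_sysregs(): save guest FP regs, restore host FP regs, clear the flag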

Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 arch/arm64/include/asm/kvm_host.h |  3 +++
 arch/arm64/kernel/asm-offsets.c   |  1 +
 arch/arm64/kvm/hyp/entry.S        |  3 +++
 arch/arm64/kvm/hyp/switch.c       | 48 ++++++++++++---------------------------
 arch/arm64/kvm/hyp/sysreg-sr.c    | 21 ++++++++++++++---
 5 files changed, 40 insertions(+), 36 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 0e9e7291a7e6..9e23bc968668 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -213,6 +213,9 @@ struct kvm_vcpu_arch {
 	/* Guest debug state */
 	u64 debug_flags;
 
+	/* 1 if the guest VFP state is loaded into the hardware */
+	u8 guest_vfp_loaded;
+
 	/*
 	 * We maintain more than a single set of debug registers to support
 	 * debugging the guest from the host and to maintain separate host and
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index 612021dce84f..99467327c043 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -133,6 +133,7 @@ int main(void)
   DEFINE(CPU_GP_REGS,		offsetof(struct kvm_cpu_context, gp_regs));
   DEFINE(CPU_USER_PT_REGS,	offsetof(struct kvm_regs, regs));
   DEFINE(CPU_FP_REGS,		offsetof(struct kvm_regs, fp_regs));
+  DEFINE(VCPU_GUEST_VFP_LOADED,	offsetof(struct kvm_vcpu, arch.guest_vfp_loaded));
   DEFINE(VCPU_FPEXC32_EL2,	offsetof(struct kvm_vcpu, arch.ctxt.sys_regs[FPEXC32_EL2]));
   DEFINE(VCPU_HOST_CONTEXT,	offsetof(struct kvm_vcpu, arch.host_cpu_context));
   DEFINE(HOST_CONTEXT_VCPU,	offsetof(struct kvm_cpu_context, __hyp_running_vcpu));
diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
index a360ac6e89e9..53652287a236 100644
--- a/arch/arm64/kvm/hyp/entry.S
+++ b/arch/arm64/kvm/hyp/entry.S
@@ -184,6 +184,9 @@ alternative_endif
 	add	x0, x2, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
 	bl	__fpsimd_restore_state
 
+	mov	x0, #1
+	strb	w0, [x3, #VCPU_GUEST_VFP_LOADED]
+
 	// Skip restoring fpexc32 for AArch64 guests
 	mrs	x1, hcr_el2
 	tbnz	x1, #HCR_RW_SHIFT, 1f
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 12dc647a6e5f..29e44a20f5e3 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -24,43 +24,32 @@
 #include <asm/fpsimd.h>
 #include <asm/debug-monitors.h>
 
-static bool __hyp_text __fpsimd_enabled_nvhe(void)
-{
-	return !(read_sysreg(cptr_el2) & CPTR_EL2_TFP);
-}
-
-static bool __hyp_text __fpsimd_enabled_vhe(void)
-{
-	return !!(read_sysreg(cpacr_el1) & CPACR_EL1_FPEN);
-}
-
-static hyp_alternate_select(__fpsimd_is_enabled,
-			    __fpsimd_enabled_nvhe, __fpsimd_enabled_vhe,
-			    ARM64_HAS_VIRT_HOST_EXTN);
-
-bool __hyp_text __fpsimd_enabled(void)
-{
-	return __fpsimd_is_enabled()();
-}
-
-static void __hyp_text __activate_traps_vhe(void)
+static void __hyp_text __activate_traps_vhe(struct kvm_vcpu *vcpu)
 {
 	u64 val;
 
 	val = read_sysreg(cpacr_el1);
 	val |= CPACR_EL1_TTA;
-	val &= ~(CPACR_EL1_FPEN | CPACR_EL1_ZEN);
+	val &= ~CPACR_EL1_ZEN;
+	if (vcpu->arch.guest_vfp_loaded)
+		val |= CPACR_EL1_FPEN;
+	else
+		val &= ~CPACR_EL1_FPEN;
 	write_sysreg(val, cpacr_el1);
 
 	write_sysreg(__kvm_hyp_vector, vbar_el1);
 }
 
-static void __hyp_text __activate_traps_nvhe(void)
+static void __hyp_text __activate_traps_nvhe(struct kvm_vcpu *vcpu)
 {
 	u64 val;
 
 	val = CPTR_EL2_DEFAULT;
-	val |= CPTR_EL2_TTA | CPTR_EL2_TFP | CPTR_EL2_TZ;
+	val |= CPTR_EL2_TTA | CPTR_EL2_TZ;
+	if (vcpu->arch.guest_vfp_loaded)
+		val &= ~CPTR_EL2_TFP;
+	else
+		val |= CPTR_EL2_TFP;
 	write_sysreg(val, cptr_el2);
 }
 
@@ -79,7 +68,8 @@ static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
 	 * If FP/ASIMD is not implemented, FPEXC is UNDEFINED and any access to
 	 * it will cause an exception.
 	 */
-	if (vcpu_el1_is_32bit(vcpu) && system_supports_fpsimd()) {
+	if (vcpu_el1_is_32bit(vcpu) && system_supports_fpsimd() &&
+	    !vcpu->arch.guest_vfp_loaded) {
 		write_sysreg(1 << 30, fpexc32_el2);
 		isb();
 	}
@@ -96,7 +86,7 @@ static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
 	write_sysreg(0, pmselr_el0);
 	write_sysreg(ARMV8_PMU_USERENR_MASK, pmuserenr_el0);
 	write_sysreg(vcpu->arch.mdcr_el2, mdcr_el2);
-	__activate_traps_arch()();
+	__activate_traps_arch()(vcpu);
 }
 
 static void __hyp_text __deactivate_traps_vhe(void)
@@ -284,7 +274,6 @@ int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
 {
 	struct kvm_cpu_context *host_ctxt;
 	struct kvm_cpu_context *guest_ctxt;
-	bool fp_enabled;
 	u64 exit_code;
 
 	vcpu = kern_hyp_va(vcpu);
@@ -376,8 +365,6 @@ int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
 		/* 0 falls through to be handled out of EL2 */
 	}
 
-	fp_enabled = __fpsimd_enabled();
-
 	__sysreg_save_guest_state(guest_ctxt);
 	__sysreg32_save_state(vcpu);
 	__timer_disable_traps(vcpu);
@@ -388,11 +375,6 @@ int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
 
 	__sysreg_restore_host_state(host_ctxt);
 
-	if (fp_enabled) {
-		__fpsimd_save_state(&guest_ctxt->gp_regs.fp_regs);
-		__fpsimd_restore_state(&host_ctxt->gp_regs.fp_regs);
-	}
-
 	__debug_save_state(vcpu, kern_hyp_va(vcpu->arch.debug_ptr), guest_ctxt);
 	/*
 	 * This must come after restoring the host sysregs, since a non-VHE
diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
index 883a6383cd36..848a46eb33bf 100644
--- a/arch/arm64/kvm/hyp/sysreg-sr.c
+++ b/arch/arm64/kvm/hyp/sysreg-sr.c
@@ -138,6 +138,11 @@ void __hyp_text __sysreg_restore_guest_state(struct kvm_cpu_context *ctxt)
 	__sysreg_restore_common_state(ctxt);
 }
 
+static void __hyp_text __fpsimd32_save_state(struct kvm_cpu_context *ctxt)
+{
+	ctxt->sys_regs[FPEXC32_EL2] = read_sysreg(fpexc32_el2);
+}
+
 void __hyp_text __sysreg32_save_state(struct kvm_vcpu *vcpu)
 {
 	u64 *spsr, *sysreg;
@@ -156,9 +161,6 @@ void __hyp_text __sysreg32_save_state(struct kvm_vcpu *vcpu)
 	sysreg[DACR32_EL2] = read_sysreg(dacr32_el2);
 	sysreg[IFSR32_EL2] = read_sysreg(ifsr32_el2);
 
-	if (__fpsimd_enabled())
-		sysreg[FPEXC32_EL2] = read_sysreg(fpexc32_el2);
-
 	if (vcpu->arch.debug_flags & KVM_ARM64_DEBUG_DIRTY)
 		sysreg[DBGVCR32_EL2] = read_sysreg(dbgvcr32_el2);
 }
@@ -213,6 +215,19 @@ void kvm_vcpu_load_sysregs(struct kvm_vcpu *vcpu)
  */
 void kvm_vcpu_put_sysregs(struct kvm_vcpu *vcpu)
 {
+	struct kvm_cpu_context *host_ctxt = vcpu->arch.host_cpu_context;
+	struct kvm_cpu_context *guest_ctxt = &vcpu->arch.ctxt;
+
+	/* Restore host FP/SIMD state */
+	if (vcpu->arch.guest_vfp_loaded) {
+		if (vcpu_el1_is_32bit(vcpu)) {
+			kvm_call_hyp(__fpsimd32_save_state,
+				     kern_hyp_va(guest_ctxt));
+		}
+		__fpsimd_save_state(&guest_ctxt->gp_regs.fp_regs);
+		__fpsimd_restore_state(&host_ctxt->gp_regs.fp_regs);
+		vcpu->arch.guest_vfp_loaded = 0;
+	}
 }
 
 void __hyp_text __kvm_set_tpidr_el2(u64 tpidr_el2)
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 223+ messages in thread

* [PATCH v3 10/41] KVM: arm64: Move debug dirty flag calculation out of world switch
  2018-01-12 12:07 ` Christoffer Dall
@ 2018-01-12 12:07   ` Christoffer Dall
  -1 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-01-12 12:07 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel; +Cc: Marc Zyngier, Shih-Wei Li, kvm

There is no need to figure out inside the world-switch whether we should
save/restore the debug registers or not; we might as well do that in
the higher-level debug setup code, making it easier to optimize down the
line.

Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 arch/arm64/kvm/debug.c        | 5 +++++
 arch/arm64/kvm/hyp/debug-sr.c | 6 ------
 2 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
index fa63b28c65e0..feedb877cff8 100644
--- a/arch/arm64/kvm/debug.c
+++ b/arch/arm64/kvm/debug.c
@@ -193,6 +193,11 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu)
 	if (trap_debug)
 		vcpu->arch.mdcr_el2 |= MDCR_EL2_TDA;
 
+	/* If KDE or MDE are set, perform a full save/restore cycle. */
+	if ((vcpu_sys_reg(vcpu, MDSCR_EL1) & DBG_MDSCR_KDE) ||
+	    (vcpu_sys_reg(vcpu, MDSCR_EL1) & DBG_MDSCR_MDE))
+		vcpu->arch.debug_flags |= KVM_ARM64_DEBUG_DIRTY;
+
 	trace_kvm_arm_set_dreg32("MDCR_EL2", vcpu->arch.mdcr_el2);
 	trace_kvm_arm_set_dreg32("MDSCR_EL1", vcpu_sys_reg(vcpu, MDSCR_EL1));
 }
diff --git a/arch/arm64/kvm/hyp/debug-sr.c b/arch/arm64/kvm/hyp/debug-sr.c
index 321c9c05dd9e..406829b6a43e 100644
--- a/arch/arm64/kvm/hyp/debug-sr.c
+++ b/arch/arm64/kvm/hyp/debug-sr.c
@@ -162,12 +162,6 @@ void __hyp_text __debug_restore_state(struct kvm_vcpu *vcpu,
 
 void __hyp_text __debug_cond_save_host_state(struct kvm_vcpu *vcpu)
 {
-	/* If any of KDE, MDE or KVM_ARM64_DEBUG_DIRTY is set, perform
-	 * a full save/restore cycle. */
-	if ((vcpu->arch.ctxt.sys_regs[MDSCR_EL1] & DBG_MDSCR_KDE) ||
-	    (vcpu->arch.ctxt.sys_regs[MDSCR_EL1] & DBG_MDSCR_MDE))
-		vcpu->arch.debug_flags |= KVM_ARM64_DEBUG_DIRTY;
-
 	__debug_save_state(vcpu, &vcpu->arch.host_debug_state.regs,
 			   kern_hyp_va(vcpu->arch.host_cpu_context));
 	__debug_save_spe()(&vcpu->arch.host_debug_state.pmscr_el1);
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 223+ messages in thread

* [PATCH v3 11/41] KVM: arm64: Slightly improve debug save/restore functions
  2018-01-12 12:07 ` Christoffer Dall
@ 2018-01-12 12:07   ` Christoffer Dall
  -1 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-01-12 12:07 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel; +Cc: Marc Zyngier, Shih-Wei Li, kvm

The debug save/restore functions can be improved by using the has_vhe()
static key instead of the instruction alternative.  The static key
follows the same paradigm we're going to use elsewhere, it makes the
code more readable, and it generates slightly better code (no stack
setup or function calls unless necessary).

We also use a static key on the restore path, because it will be
marginally faster than loading a value from memory.

Finally, we don't have to conditionally clear the debug dirty flag if
it's set, we can just clear it.

Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 arch/arm64/kvm/hyp/debug-sr.c | 26 ++++++++++++--------------
 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/arch/arm64/kvm/hyp/debug-sr.c b/arch/arm64/kvm/hyp/debug-sr.c
index 406829b6a43e..81b8ad44f9e0 100644
--- a/arch/arm64/kvm/hyp/debug-sr.c
+++ b/arch/arm64/kvm/hyp/debug-sr.c
@@ -65,11 +65,6 @@
 	default:	write_debug(ptr[0], reg, 0);			\
 	}
 
-static void __hyp_text __debug_save_spe_vhe(u64 *pmscr_el1)
-{
-	/* The vcpu can run. but it can't hide. */
-}
-
 static void __hyp_text __debug_save_spe_nvhe(u64 *pmscr_el1)
 {
 	u64 reg;
@@ -99,11 +94,7 @@ static void __hyp_text __debug_save_spe_nvhe(u64 *pmscr_el1)
 	dsb(nsh);
 }
 
-static hyp_alternate_select(__debug_save_spe,
-			    __debug_save_spe_nvhe, __debug_save_spe_vhe,
-			    ARM64_HAS_VIRT_HOST_EXTN);
-
-static void __hyp_text __debug_restore_spe(u64 pmscr_el1)
+static void __hyp_text __debug_restore_spe_nvhe(u64 pmscr_el1)
 {
 	if (!pmscr_el1)
 		return;
@@ -164,17 +155,24 @@ void __hyp_text __debug_cond_save_host_state(struct kvm_vcpu *vcpu)
 {
 	__debug_save_state(vcpu, &vcpu->arch.host_debug_state.regs,
 			   kern_hyp_va(vcpu->arch.host_cpu_context));
-	__debug_save_spe()(&vcpu->arch.host_debug_state.pmscr_el1);
+
+	/*
+	 * Non-VHE: Disable and flush SPE data generation
+	 * VHE: The vcpu can run, but it can't hide.
+	 */
+	if (!has_vhe())
+		__debug_save_spe_nvhe(&vcpu->arch.host_debug_state.pmscr_el1);
 }
 
 void __hyp_text __debug_cond_restore_host_state(struct kvm_vcpu *vcpu)
 {
-	__debug_restore_spe(vcpu->arch.host_debug_state.pmscr_el1);
+	if (!has_vhe())
+		__debug_restore_spe_nvhe(vcpu->arch.host_debug_state.pmscr_el1);
+
 	__debug_restore_state(vcpu, &vcpu->arch.host_debug_state.regs,
 			      kern_hyp_va(vcpu->arch.host_cpu_context));
 
-	if (vcpu->arch.debug_flags & KVM_ARM64_DEBUG_DIRTY)
-		vcpu->arch.debug_flags &= ~KVM_ARM64_DEBUG_DIRTY;
+	vcpu->arch.debug_flags &= ~KVM_ARM64_DEBUG_DIRTY;
 }
 
 u32 __hyp_text __kvm_get_mdcr_el2(void)
-- 
2.14.2
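
For a sense of why the static key wins, here is a minimal, hedged sketch
(the helper name is hypothetical and this is not the kernel's actual
macro expansion): the old hyp_alternate_select() pattern returns a
function pointer which is then called indirectly, so both the VHE and
non-VHE bodies must exist as out-of-line functions, whereas a has_vhe()
test compiles to a branch patched at boot and lets the compiler inline
the only remaining (non-VHE) body.

static void save_spe_sketch(struct kvm_vcpu *vcpu)
{
	/*
	 * Before: __debug_save_spe()(&...pmscr_el1) -- an indirect call
	 * through an alternative-selected function pointer.
	 *
	 * After: a static branch, resolved once at boot.
	 */
	if (!has_vhe())
		__debug_save_spe_nvhe(&vcpu->arch.host_debug_state.pmscr_el1);
	/* VHE: nothing to save -- see the comment added in the hunk above. */
}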

^ permalink raw reply related	[flat|nested] 223+ messages in thread

* [PATCH v3 12/41] KVM: arm64: Improve debug register save/restore flow
  2018-01-12 12:07 ` Christoffer Dall
@ 2018-01-12 12:07   ` Christoffer Dall
  -1 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-01-12 12:07 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel
  Cc: kvm, Marc Zyngier, Shih-Wei Li, Andrew Jones, Christoffer Dall

Instead of having multiple calls from the world switch path to the debug
logic, each figuring out if the dirty bit is set and if we should
save/restore the debug registers, let's just provide two hooks to the
debug save/restore functionality, one for switching to the guest
context and one for switching to the host context.  This way we only
have to evaluate the dirty flag once on each path, and we give the
compiler some more room to inline some of this functionality.

Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 arch/arm64/include/asm/kvm_hyp.h | 10 ++-----
 arch/arm64/kvm/hyp/debug-sr.c    | 56 +++++++++++++++++++++++++++-------------
 arch/arm64/kvm/hyp/switch.c      |  6 ++---
 3 files changed, 42 insertions(+), 30 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index 08d3bb66c8b7..a0e5a7038237 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -139,14 +139,8 @@ void __sysreg_restore_guest_state(struct kvm_cpu_context *ctxt);
 void __sysreg32_save_state(struct kvm_vcpu *vcpu);
 void __sysreg32_restore_state(struct kvm_vcpu *vcpu);
 
-void __debug_save_state(struct kvm_vcpu *vcpu,
-			struct kvm_guest_debug_arch *dbg,
-			struct kvm_cpu_context *ctxt);
-void __debug_restore_state(struct kvm_vcpu *vcpu,
-			   struct kvm_guest_debug_arch *dbg,
-			   struct kvm_cpu_context *ctxt);
-void __debug_cond_save_host_state(struct kvm_vcpu *vcpu);
-void __debug_cond_restore_host_state(struct kvm_vcpu *vcpu);
+void __debug_switch_to_guest(struct kvm_vcpu *vcpu);
+void __debug_switch_to_host(struct kvm_vcpu *vcpu);
 
 void __fpsimd_save_state(struct user_fpsimd_state *fp_regs);
 void __fpsimd_restore_state(struct user_fpsimd_state *fp_regs);
diff --git a/arch/arm64/kvm/hyp/debug-sr.c b/arch/arm64/kvm/hyp/debug-sr.c
index 81b8ad44f9e0..ee87115eb12f 100644
--- a/arch/arm64/kvm/hyp/debug-sr.c
+++ b/arch/arm64/kvm/hyp/debug-sr.c
@@ -106,16 +106,13 @@ static void __hyp_text __debug_restore_spe_nvhe(u64 pmscr_el1)
 	write_sysreg_s(pmscr_el1, SYS_PMSCR_EL1);
 }
 
-void __hyp_text __debug_save_state(struct kvm_vcpu *vcpu,
-				   struct kvm_guest_debug_arch *dbg,
-				   struct kvm_cpu_context *ctxt)
+static void __hyp_text __debug_save_state(struct kvm_vcpu *vcpu,
+					  struct kvm_guest_debug_arch *dbg,
+					  struct kvm_cpu_context *ctxt)
 {
 	u64 aa64dfr0;
 	int brps, wrps;
 
-	if (!(vcpu->arch.debug_flags & KVM_ARM64_DEBUG_DIRTY))
-		return;
-
 	aa64dfr0 = read_sysreg(id_aa64dfr0_el1);
 	brps = (aa64dfr0 >> 12) & 0xf;
 	wrps = (aa64dfr0 >> 20) & 0xf;
@@ -128,16 +125,13 @@ void __hyp_text __debug_save_state(struct kvm_vcpu *vcpu,
 	ctxt->sys_regs[MDCCINT_EL1] = read_sysreg(mdccint_el1);
 }
 
-void __hyp_text __debug_restore_state(struct kvm_vcpu *vcpu,
-				      struct kvm_guest_debug_arch *dbg,
-				      struct kvm_cpu_context *ctxt)
+static void __hyp_text __debug_restore_state(struct kvm_vcpu *vcpu,
+					     struct kvm_guest_debug_arch *dbg,
+					     struct kvm_cpu_context *ctxt)
 {
 	u64 aa64dfr0;
 	int brps, wrps;
 
-	if (!(vcpu->arch.debug_flags & KVM_ARM64_DEBUG_DIRTY))
-		return;
-
 	aa64dfr0 = read_sysreg(id_aa64dfr0_el1);
 
 	brps = (aa64dfr0 >> 12) & 0xf;
@@ -151,10 +145,12 @@ void __hyp_text __debug_restore_state(struct kvm_vcpu *vcpu,
 	write_sysreg(ctxt->sys_regs[MDCCINT_EL1], mdccint_el1);
 }
 
-void __hyp_text __debug_cond_save_host_state(struct kvm_vcpu *vcpu)
+void __hyp_text __debug_switch_to_guest(struct kvm_vcpu *vcpu)
 {
-	__debug_save_state(vcpu, &vcpu->arch.host_debug_state.regs,
-			   kern_hyp_va(vcpu->arch.host_cpu_context));
+	struct kvm_cpu_context *host_ctxt;
+	struct kvm_cpu_context *guest_ctxt;
+	struct kvm_guest_debug_arch *host_dbg;
+	struct kvm_guest_debug_arch *guest_dbg;
 
 	/*
 	 * Non-VHE: Disable and flush SPE data generation
@@ -162,15 +158,39 @@ void __hyp_text __debug_cond_save_host_state(struct kvm_vcpu *vcpu)
 	 */
 	if (!has_vhe())
 		__debug_save_spe_nvhe(&vcpu->arch.host_debug_state.pmscr_el1);
+
+	if (!(vcpu->arch.debug_flags & KVM_ARM64_DEBUG_DIRTY))
+		return;
+
+	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
+	guest_ctxt = &vcpu->arch.ctxt;
+	host_dbg = &vcpu->arch.host_debug_state.regs;
+	guest_dbg = kern_hyp_va(vcpu->arch.debug_ptr);
+
+	__debug_save_state(vcpu, host_dbg, host_ctxt);
+	__debug_restore_state(vcpu, guest_dbg, guest_ctxt);
 }
 
-void __hyp_text __debug_cond_restore_host_state(struct kvm_vcpu *vcpu)
+void __hyp_text __debug_switch_to_host(struct kvm_vcpu *vcpu)
 {
+	struct kvm_cpu_context *host_ctxt;
+	struct kvm_cpu_context *guest_ctxt;
+	struct kvm_guest_debug_arch *host_dbg;
+	struct kvm_guest_debug_arch *guest_dbg;
+
 	if (!has_vhe())
 		__debug_restore_spe_nvhe(vcpu->arch.host_debug_state.pmscr_el1);
 
-	__debug_restore_state(vcpu, &vcpu->arch.host_debug_state.regs,
-			      kern_hyp_va(vcpu->arch.host_cpu_context));
+	if (!(vcpu->arch.debug_flags & KVM_ARM64_DEBUG_DIRTY))
+		return;
+
+	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
+	guest_ctxt = &vcpu->arch.ctxt;
+	host_dbg = &vcpu->arch.host_debug_state.regs;
+	guest_dbg = kern_hyp_va(vcpu->arch.debug_ptr);
+
+	__debug_save_state(vcpu, guest_dbg, guest_ctxt);
+	__debug_restore_state(vcpu, host_dbg, host_ctxt);
 
 	vcpu->arch.debug_flags &= ~KVM_ARM64_DEBUG_DIRTY;
 }
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 29e44a20f5e3..63284647ed11 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -283,7 +283,6 @@ int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
 	guest_ctxt = &vcpu->arch.ctxt;
 
 	__sysreg_save_host_state(host_ctxt);
-	__debug_cond_save_host_state(vcpu);
 
 	__activate_traps(vcpu);
 	__activate_vm(vcpu);
@@ -297,7 +296,7 @@ int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
 	 */
 	__sysreg32_restore_state(vcpu);
 	__sysreg_restore_guest_state(guest_ctxt);
-	__debug_restore_state(vcpu, kern_hyp_va(vcpu->arch.debug_ptr), guest_ctxt);
+	__debug_switch_to_guest(vcpu);
 
 	/* Jump in the fire! */
 again:
@@ -375,12 +374,11 @@ int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
 
 	__sysreg_restore_host_state(host_ctxt);
 
-	__debug_save_state(vcpu, kern_hyp_va(vcpu->arch.debug_ptr), guest_ctxt);
 	/*
 	 * This must come after restoring the host sysregs, since a non-VHE
 	 * system may enable SPE here and make use of the TTBRs.
 	 */
-	__debug_cond_restore_host_state(vcpu);
+	__debug_switch_to_host(vcpu);
 
 	return exit_code;
 }
-- 
2.14.2
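
To make the resulting ordering easier to follow, here is an abridged
sketch of the world-switch path after this patch (same variables as in
__kvm_vcpu_run() above; not the complete function):

	__sysreg_restore_guest_state(guest_ctxt);
	__debug_switch_to_guest(vcpu);	/* SPE save (non-VHE) + host->guest debug regs if dirty */

	exit_code = __guest_enter(vcpu, host_ctxt);

	__sysreg_save_guest_state(guest_ctxt);
	__sysreg_restore_host_state(host_ctxt);
	__debug_switch_to_host(vcpu);	/* SPE restore (non-VHE) + guest->host debug regs, clears the dirty flag */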

^ permalink raw reply related	[flat|nested] 223+ messages in thread

* [PATCH v3 13/41] KVM: arm64: Factor out fault info population and gic workarounds
  2018-01-12 12:07 ` Christoffer Dall
@ 2018-01-12 12:07   ` Christoffer Dall
  -1 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-01-12 12:07 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel; +Cc: Marc Zyngier, Shih-Wei Li, kvm

The current world-switch function has functionality to detect a number
of cases where we need to fix up some part of the exit condition and
possibly run the guest again, before having restored the host state.

This includes populating missing fault info, emulating GICv2 CPU
interface accesses when mapped at unaligned addresses, and emulating
the GICv3 CPU interface on systems that need it.

We are about to add an alternative switch function for VHE systems, but
VHE systems still need the same early fixup logic, so factor it out into
a separate function that can be shared by both switch functions.

No functional change.

Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 arch/arm64/kvm/hyp/switch.c | 99 ++++++++++++++++++++++++---------------------
 1 file changed, 54 insertions(+), 45 deletions(-)

diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 63284647ed11..55ca2e3d42eb 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -270,50 +270,24 @@ static bool __hyp_text __skip_instr(struct kvm_vcpu *vcpu)
 	}
 }
 
-int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
+/*
+ * Return true when we were able to fixup the guest exit and should return to
+ * the guest, false when we should restore the host state and return to the
+ * main run loop.
+ */
+static bool __hyp_text fixup_guest_exit(struct kvm_vcpu *vcpu, u64 *exit_code)
 {
-	struct kvm_cpu_context *host_ctxt;
-	struct kvm_cpu_context *guest_ctxt;
-	u64 exit_code;
-
-	vcpu = kern_hyp_va(vcpu);
-
-	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
-	host_ctxt->__hyp_running_vcpu = vcpu;
-	guest_ctxt = &vcpu->arch.ctxt;
-
-	__sysreg_save_host_state(host_ctxt);
-
-	__activate_traps(vcpu);
-	__activate_vm(vcpu);
-
-	__vgic_restore_state(vcpu);
-	__timer_enable_traps(vcpu);
-
-	/*
-	 * We must restore the 32-bit state before the sysregs, thanks
-	 * to erratum #852523 (Cortex-A57) or #853709 (Cortex-A72).
-	 */
-	__sysreg32_restore_state(vcpu);
-	__sysreg_restore_guest_state(guest_ctxt);
-	__debug_switch_to_guest(vcpu);
-
-	/* Jump in the fire! */
-again:
-	exit_code = __guest_enter(vcpu, host_ctxt);
-	/* And we're baaack! */
-
 	/*
 	 * We're using the raw exception code in order to only process
 	 * the trap if no SError is pending. We will come back to the
 	 * same PC once the SError has been injected, and replay the
 	 * trapping instruction.
 	 */
-	if (exit_code == ARM_EXCEPTION_TRAP && !__populate_fault_info(vcpu))
-		goto again;
+	if (*exit_code == ARM_EXCEPTION_TRAP && !__populate_fault_info(vcpu))
+		return true;
 
 	if (static_branch_unlikely(&vgic_v2_cpuif_trap) &&
-	    exit_code == ARM_EXCEPTION_TRAP) {
+	    *exit_code == ARM_EXCEPTION_TRAP) {
 		bool valid;
 
 		valid = kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_DABT_LOW &&
@@ -327,9 +301,9 @@ int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
 
 			if (ret == 1) {
 				if (__skip_instr(vcpu))
-					goto again;
+					return true;
 				else
-					exit_code = ARM_EXCEPTION_TRAP;
+					*exit_code = ARM_EXCEPTION_TRAP;
 			}
 
 			if (ret == -1) {
@@ -341,29 +315,64 @@ int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
 				 */
 				if (!__skip_instr(vcpu))
 					*vcpu_cpsr(vcpu) &= ~DBG_SPSR_SS;
-				exit_code = ARM_EXCEPTION_EL1_SERROR;
+				*exit_code = ARM_EXCEPTION_EL1_SERROR;
 			}
-
-			/* 0 falls through to be handler out of EL2 */
 		}
 	}
 
 	if (static_branch_unlikely(&vgic_v3_cpuif_trap) &&
-	    exit_code == ARM_EXCEPTION_TRAP &&
+	    *exit_code == ARM_EXCEPTION_TRAP &&
 	    (kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_SYS64 ||
 	     kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_CP15_32)) {
 		int ret = __vgic_v3_perform_cpuif_access(vcpu);
 
 		if (ret == 1) {
 			if (__skip_instr(vcpu))
-				goto again;
+				return true;
 			else
-				exit_code = ARM_EXCEPTION_TRAP;
+				*exit_code = ARM_EXCEPTION_TRAP;
 		}
-
-		/* 0 falls through to be handled out of EL2 */
 	}
 
+	/* Return to the host kernel and handle the exit */
+	return false;
+}
+
+int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
+{
+	struct kvm_cpu_context *host_ctxt;
+	struct kvm_cpu_context *guest_ctxt;
+	u64 exit_code;
+
+	vcpu = kern_hyp_va(vcpu);
+
+	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
+	host_ctxt->__hyp_running_vcpu = vcpu;
+	guest_ctxt = &vcpu->arch.ctxt;
+
+	__sysreg_save_host_state(host_ctxt);
+
+	__activate_traps(vcpu);
+	__activate_vm(vcpu);
+
+	__vgic_restore_state(vcpu);
+	__timer_enable_traps(vcpu);
+
+	/*
+	 * We must restore the 32-bit state before the sysregs, thanks
+	 * to erratum #852523 (Cortex-A57) or #853709 (Cortex-A72).
+	 */
+	__sysreg32_restore_state(vcpu);
+	__sysreg_restore_guest_state(guest_ctxt);
+	__debug_switch_to_guest(vcpu);
+
+	do {
+		/* Jump in the fire! */
+		exit_code = __guest_enter(vcpu, host_ctxt);
+
+		/* And we're baaack! */
+	} while (fixup_guest_exit(vcpu, &exit_code));
+
 	__sysreg_save_guest_state(guest_ctxt);
 	__sysreg32_save_state(vcpu);
 	__timer_disable_traps(vcpu);
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 223+ messages in thread

* [PATCH v3 14/41] KVM: arm64: Introduce VHE-specific kvm_vcpu_run
  2018-01-12 12:07 ` Christoffer Dall
@ 2018-01-12 12:07   ` Christoffer Dall
  -1 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-01-12 12:07 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel
  Cc: kvm, Marc Zyngier, Shih-Wei Li, Andrew Jones, Christoffer Dall

So far this is just a copy of the legacy non-VHE switch function, but in
later patches we will start reworking these functions in separate
directions so that each works optimally on VHE and non-VHE systems.

Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 arch/arm/include/asm/kvm_asm.h   |  5 +++-
 arch/arm/kvm/hyp/switch.c        |  2 +-
 arch/arm64/include/asm/kvm_asm.h |  4 ++-
 arch/arm64/kvm/hyp/switch.c      | 58 +++++++++++++++++++++++++++++++++++++++-
 virt/kvm/arm/arm.c               |  5 +++-
 5 files changed, 69 insertions(+), 5 deletions(-)

diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
index 36dd2962a42d..4ac717276543 100644
--- a/arch/arm/include/asm/kvm_asm.h
+++ b/arch/arm/include/asm/kvm_asm.h
@@ -70,7 +70,10 @@ extern void __kvm_tlb_flush_local_vmid(struct kvm_vcpu *vcpu);
 
 extern void __kvm_timer_set_cntvoff(u32 cntvoff_low, u32 cntvoff_high);
 
-extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
+/* no VHE on 32-bit :( */
+static inline int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu) { return 0; }
+
+extern int __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu);
 
 extern void __init_stage2_translation(void);
 
diff --git a/arch/arm/kvm/hyp/switch.c b/arch/arm/kvm/hyp/switch.c
index c3b9799e2e13..7b2bd25e3b10 100644
--- a/arch/arm/kvm/hyp/switch.c
+++ b/arch/arm/kvm/hyp/switch.c
@@ -153,7 +153,7 @@ static bool __hyp_text __populate_fault_info(struct kvm_vcpu *vcpu)
 	return true;
 }
 
-int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
+int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
 {
 	struct kvm_cpu_context *host_ctxt;
 	struct kvm_cpu_context *guest_ctxt;
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 6c7599b5cb40..fb91e728207b 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -58,7 +58,9 @@ extern void __kvm_tlb_flush_local_vmid(struct kvm_vcpu *vcpu);
 
 extern void __kvm_timer_set_cntvoff(u32 cntvoff_low, u32 cntvoff_high);
 
-extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
+extern int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu);
+
+extern int __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu);
 
 extern u64 __vgic_v3_get_ich_vtr_el2(void);
 extern u64 __vgic_v3_read_vmcr(void);
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 55ca2e3d42eb..accfe9a016f9 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -338,7 +338,63 @@ static bool __hyp_text fixup_guest_exit(struct kvm_vcpu *vcpu, u64 *exit_code)
 	return false;
 }
 
-int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
+/* Switch to the guest for VHE systems running in EL2 */
+int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
+{
+	struct kvm_cpu_context *host_ctxt;
+	struct kvm_cpu_context *guest_ctxt;
+	u64 exit_code;
+
+	vcpu = kern_hyp_va(vcpu);
+
+	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
+	host_ctxt->__hyp_running_vcpu = vcpu;
+	guest_ctxt = &vcpu->arch.ctxt;
+
+	__sysreg_save_host_state(host_ctxt);
+
+	__activate_traps(vcpu);
+	__activate_vm(vcpu);
+
+	__vgic_restore_state(vcpu);
+	__timer_enable_traps(vcpu);
+
+	/*
+	 * We must restore the 32-bit state before the sysregs, thanks
+	 * to erratum #852523 (Cortex-A57) or #853709 (Cortex-A72).
+	 */
+	__sysreg32_restore_state(vcpu);
+	__sysreg_restore_guest_state(guest_ctxt);
+	__debug_switch_to_guest(vcpu);
+
+	do {
+		/* Jump in the fire! */
+		exit_code = __guest_enter(vcpu, host_ctxt);
+
+		/* And we're baaack! */
+	} while (fixup_guest_exit(vcpu, &exit_code));
+
+	__sysreg_save_guest_state(guest_ctxt);
+	__sysreg32_save_state(vcpu);
+	__timer_disable_traps(vcpu);
+	__vgic_save_state(vcpu);
+
+	__deactivate_traps(vcpu);
+	__deactivate_vm(vcpu);
+
+	__sysreg_restore_host_state(host_ctxt);
+
+	/*
+	 * This must come after restoring the host sysregs, since a non-VHE
+	 * system may enable SPE here and make use of the TTBRs.
+	 */
+	__debug_switch_to_host(vcpu);
+
+	return exit_code;
+}
+
+/* Switch to the guest for legacy non-VHE systems */
+int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
 {
 	struct kvm_cpu_context *host_ctxt;
 	struct kvm_cpu_context *guest_ctxt;
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 5b1487bd91e8..6bce8f9c55db 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -733,7 +733,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		trace_kvm_entry(*vcpu_pc(vcpu));
 		guest_enter_irqoff();
 
-		ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
+		if (has_vhe())
+			ret = kvm_vcpu_run_vhe(vcpu);
+		else
+			ret = kvm_call_hyp(__kvm_vcpu_run_nvhe, vcpu);
 
 		vcpu->mode = OUTSIDE_GUEST_MODE;
 		vcpu->stat.exits++;
-- 
2.14.2
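
The run-loop dispatch in the virt/kvm/arm/arm.c hunk above is worth
calling out: a VHE host already runs in EL2, so the new function is
reached through an ordinary function call, while non-VHE hosts still
enter the EL2 switch code via the kvm_call_hyp() HVC path.  Restated
with comments (same code as the hunk above):

	if (has_vhe())
		ret = kvm_vcpu_run_vhe(vcpu);		/* direct call, host is already in EL2 */
	else
		ret = kvm_call_hyp(__kvm_vcpu_run_nvhe, vcpu);	/* HVC into the EL2 hyp text */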

^ permalink raw reply related	[flat|nested] 223+ messages in thread

* [PATCH v3 15/41] KVM: arm64: Remove kern_hyp_va() use in VHE switch function
  2018-01-12 12:07 ` Christoffer Dall
@ 2018-01-12 12:07   ` Christoffer Dall
  -1 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-01-12 12:07 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel; +Cc: Marc Zyngier, Shih-Wei Li, kvm

VHE kernels run completely in EL2 and therefore don't have a notion of
kernel and hyp addresses; they are all just kernel addresses.  Therefore
don't call kern_hyp_va() in the VHE switch function.

Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 arch/arm64/kvm/hyp/switch.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index accfe9a016f9..05fba76ec918 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -345,9 +345,7 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
 	struct kvm_cpu_context *guest_ctxt;
 	u64 exit_code;
 
-	vcpu = kern_hyp_va(vcpu);
-
-	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
+	host_ctxt = vcpu->arch.host_cpu_context;
 	host_ctxt->__hyp_running_vcpu = vcpu;
 	guest_ctxt = &vcpu->arch.ctxt;
 
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 223+ messages in thread

* [PATCH v3 16/41] KVM: arm64: Don't deactivate VM on VHE systems
  2018-01-12 12:07 ` Christoffer Dall
@ 2018-01-12 12:07   ` Christoffer Dall
  -1 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-01-12 12:07 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel; +Cc: Marc Zyngier, Shih-Wei Li, kvm

There is no need to reset the VTTBR to zero when exiting the guest on
VHE systems.  VHE systems don't use stage 2 translations for the EL2&0
translation regime used by the host.

Reviewed-by: Andrew Jones <drjones@redhat.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 arch/arm64/kvm/hyp/switch.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 05fba76ec918..9aadef6966bf 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -136,9 +136,8 @@ static void __hyp_text __deactivate_traps(struct kvm_vcpu *vcpu)
 	write_sysreg(0, pmuserenr_el0);
 }
 
-static void __hyp_text __activate_vm(struct kvm_vcpu *vcpu)
+static void __hyp_text __activate_vm(struct kvm *kvm)
 {
-	struct kvm *kvm = kern_hyp_va(vcpu->kvm);
 	write_sysreg(kvm->arch.vttbr, vttbr_el2);
 }
 
@@ -352,7 +351,7 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
 	__sysreg_save_host_state(host_ctxt);
 
 	__activate_traps(vcpu);
-	__activate_vm(vcpu);
+	__activate_vm(vcpu->kvm);
 
 	__vgic_restore_state(vcpu);
 	__timer_enable_traps(vcpu);
@@ -378,7 +377,6 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
 	__vgic_save_state(vcpu);
 
 	__deactivate_traps(vcpu);
-	__deactivate_vm(vcpu);
 
 	__sysreg_restore_host_state(host_ctxt);
 
@@ -407,7 +405,7 @@ int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
 	__sysreg_save_host_state(host_ctxt);
 
 	__activate_traps(vcpu);
-	__activate_vm(vcpu);
+	__activate_vm(kern_hyp_va(vcpu->kvm));
 
 	__vgic_restore_state(vcpu);
 	__timer_enable_traps(vcpu);
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 223+ messages in thread

* [PATCH v3 17/41] KVM: arm64: Remove noop calls to timer save/restore from VHE switch
  2018-01-12 12:07 ` Christoffer Dall
@ 2018-01-12 12:07   ` Christoffer Dall
  -1 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-01-12 12:07 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel; +Cc: Marc Zyngier, Shih-Wei Li, kvm

The VHE switch function calls __timer_enable_traps and
__timer_disable_traps, which don't do anything on VHE systems.
Therefore, simply remove these calls from the VHE switch function and
make the functions non-conditional as they are now only called from the
non-VHE switch path.

Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 arch/arm64/kvm/hyp/switch.c |  2 --
 virt/kvm/arm/hyp/timer-sr.c | 44 ++++++++++++++++++++++----------------------
 2 files changed, 22 insertions(+), 24 deletions(-)

diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 9aadef6966bf..6175fcb33ed2 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -354,7 +354,6 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
 	__activate_vm(vcpu->kvm);
 
 	__vgic_restore_state(vcpu);
-	__timer_enable_traps(vcpu);
 
 	/*
 	 * We must restore the 32-bit state before the sysregs, thanks
@@ -373,7 +372,6 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
 
 	__sysreg_save_guest_state(guest_ctxt);
 	__sysreg32_save_state(vcpu);
-	__timer_disable_traps(vcpu);
 	__vgic_save_state(vcpu);
 
 	__deactivate_traps(vcpu);
diff --git a/virt/kvm/arm/hyp/timer-sr.c b/virt/kvm/arm/hyp/timer-sr.c
index f24404b3c8df..77754a62eb0c 100644
--- a/virt/kvm/arm/hyp/timer-sr.c
+++ b/virt/kvm/arm/hyp/timer-sr.c
@@ -27,34 +27,34 @@ void __hyp_text __kvm_timer_set_cntvoff(u32 cntvoff_low, u32 cntvoff_high)
 	write_sysreg(cntvoff, cntvoff_el2);
 }
 
+/*
+ * Should only be called on non-VHE systems.
+ * VHE systems use EL2 timers and configure EL1 timers in kvm_timer_init_vhe().
+ */
 void __hyp_text __timer_disable_traps(struct kvm_vcpu *vcpu)
 {
-	/*
-	 * We don't need to do this for VHE since the host kernel runs in EL2
-	 * with HCR_EL2.TGE ==1, which makes those bits have no impact.
-	 */
-	if (!has_vhe()) {
-		u64 val;
+	u64 val;
 
-		/* Allow physical timer/counter access for the host */
-		val = read_sysreg(cnthctl_el2);
-		val |= CNTHCTL_EL1PCTEN | CNTHCTL_EL1PCEN;
-		write_sysreg(val, cnthctl_el2);
-	}
+	/* Allow physical timer/counter access for the host */
+	val = read_sysreg(cnthctl_el2);
+	val |= CNTHCTL_EL1PCTEN | CNTHCTL_EL1PCEN;
+	write_sysreg(val, cnthctl_el2);
 }
 
+/*
+ * Should only be called on non-VHE systems.
+ * VHE systems use EL2 timers and configure EL1 timers in kvm_timer_init_vhe().
+ */
 void __hyp_text __timer_enable_traps(struct kvm_vcpu *vcpu)
 {
-	if (!has_vhe()) {
-		u64 val;
+	u64 val;
 
-		/*
-		 * Disallow physical timer access for the guest
-		 * Physical counter access is allowed
-		 */
-		val = read_sysreg(cnthctl_el2);
-		val &= ~CNTHCTL_EL1PCEN;
-		val |= CNTHCTL_EL1PCTEN;
-		write_sysreg(val, cnthctl_el2);
-	}
+	/*
+	 * Disallow physical timer access for the guest
+	 * Physical counter access is allowed
+	 */
+	val = read_sysreg(cnthctl_el2);
+	val &= ~CNTHCTL_EL1PCEN;
+	val |= CNTHCTL_EL1PCTEN;
+	write_sysreg(val, cnthctl_el2);
 }
-- 
2.14.2
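
As a reminder of what these helpers actually toggle, here is a hedged
sketch combining the two functions above into one hypothetical helper
(the function name is invented for illustration; the bit semantics are
those of CNTHCTL_EL2 on non-VHE systems):

static void timer_trap_policy_sketch(bool entering_guest)
{
	u64 val = read_sysreg(cnthctl_el2);

	if (entering_guest) {
		/* Guest: physical counter reads allowed, physical timer trapped. */
		val &= ~CNTHCTL_EL1PCEN;
		val |= CNTHCTL_EL1PCTEN;
	} else {
		/* Host: full physical timer and counter access, no traps. */
		val |= CNTHCTL_EL1PCTEN | CNTHCTL_EL1PCEN;
	}
	write_sysreg(val, cnthctl_el2);
}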

^ permalink raw reply related	[flat|nested] 223+ messages in thread

* [PATCH v3 18/41] KVM: arm64: Move userspace system registers into separate function
  2018-01-12 12:07 ` Christoffer Dall
@ 2018-01-12 12:07   ` Christoffer Dall
  -1 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-01-12 12:07 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel; +Cc: Marc Zyngier, Shih-Wei Li, kvm

There's a semantic difference between the EL1 registers that control
operation of a kernel running in EL1 and EL1 registers that only control
userspace execution in EL0.  Since we can defer saving/restoring the
latter, move them into their own function.

We also take this chance to rename the function saving/restoring the
remaining system registers to make it clear this function deals with
the EL1 system registers.

No functional change.

Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 arch/arm64/kvm/hyp/sysreg-sr.c | 46 +++++++++++++++++++++++++++++++-----------
 1 file changed, 34 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
index 848a46eb33bf..99dd50ce483b 100644
--- a/arch/arm64/kvm/hyp/sysreg-sr.c
+++ b/arch/arm64/kvm/hyp/sysreg-sr.c
@@ -34,18 +34,27 @@ static void __hyp_text __sysreg_do_nothing(struct kvm_cpu_context *ctxt) { }
 
 static void __hyp_text __sysreg_save_common_state(struct kvm_cpu_context *ctxt)
 {
-	ctxt->sys_regs[ACTLR_EL1]	= read_sysreg(actlr_el1);
-	ctxt->sys_regs[TPIDR_EL0]	= read_sysreg(tpidr_el0);
-	ctxt->sys_regs[TPIDRRO_EL0]	= read_sysreg(tpidrro_el0);
 	ctxt->sys_regs[MDSCR_EL1]	= read_sysreg(mdscr_el1);
+
+	/*
+	 * The host arm64 Linux uses sp_el0 to point to 'current' and it must
+	 * therefore be saved/restored on every entry/exit to/from the guest.
+	 */
 	ctxt->gp_regs.regs.sp		= read_sysreg(sp_el0);
 }
 
-static void __hyp_text __sysreg_save_state(struct kvm_cpu_context *ctxt)
+static void __hyp_text __sysreg_save_user_state(struct kvm_cpu_context *ctxt)
+{
+	ctxt->sys_regs[TPIDR_EL0]	= read_sysreg(tpidr_el0);
+	ctxt->sys_regs[TPIDRRO_EL0]	= read_sysreg(tpidrro_el0);
+}
+
+static void __hyp_text __sysreg_save_el1_state(struct kvm_cpu_context *ctxt)
 {
 	ctxt->sys_regs[MPIDR_EL1]	= read_sysreg(vmpidr_el2);
 	ctxt->sys_regs[CSSELR_EL1]	= read_sysreg(csselr_el1);
 	ctxt->sys_regs[SCTLR_EL1]	= read_sysreg_el1(sctlr);
+	ctxt->sys_regs[ACTLR_EL1]	= read_sysreg(actlr_el1);
 	ctxt->sys_regs[CPACR_EL1]	= read_sysreg_el1(cpacr);
 	ctxt->sys_regs[TTBR0_EL1]	= read_sysreg_el1(ttbr0);
 	ctxt->sys_regs[TTBR1_EL1]	= read_sysreg_el1(ttbr1);
@@ -70,35 +79,46 @@ static void __hyp_text __sysreg_save_state(struct kvm_cpu_context *ctxt)
 }
 
 static hyp_alternate_select(__sysreg_call_save_host_state,
-			    __sysreg_save_state, __sysreg_do_nothing,
+			    __sysreg_save_el1_state, __sysreg_do_nothing,
 			    ARM64_HAS_VIRT_HOST_EXTN);
 
 void __hyp_text __sysreg_save_host_state(struct kvm_cpu_context *ctxt)
 {
 	__sysreg_call_save_host_state()(ctxt);
 	__sysreg_save_common_state(ctxt);
+	__sysreg_save_user_state(ctxt);
 }
 
 void __hyp_text __sysreg_save_guest_state(struct kvm_cpu_context *ctxt)
 {
-	__sysreg_save_state(ctxt);
+	__sysreg_save_el1_state(ctxt);
 	__sysreg_save_common_state(ctxt);
+	__sysreg_save_user_state(ctxt);
 }
 
 static void __hyp_text __sysreg_restore_common_state(struct kvm_cpu_context *ctxt)
 {
-	write_sysreg(ctxt->sys_regs[ACTLR_EL1],	  actlr_el1);
-	write_sysreg(ctxt->sys_regs[TPIDR_EL0],	  tpidr_el0);
-	write_sysreg(ctxt->sys_regs[TPIDRRO_EL0], tpidrro_el0);
 	write_sysreg(ctxt->sys_regs[MDSCR_EL1],	  mdscr_el1);
+
+	/*
+	 * The host arm64 Linux uses sp_el0 to point to 'current' and it must
+	 * therefore be saved/restored on every entry/exit to/from the guest.
+	 */
 	write_sysreg(ctxt->gp_regs.regs.sp,	  sp_el0);
 }
 
-static void __hyp_text __sysreg_restore_state(struct kvm_cpu_context *ctxt)
+static void __hyp_text __sysreg_restore_user_state(struct kvm_cpu_context *ctxt)
+{
+	write_sysreg(ctxt->sys_regs[TPIDR_EL0],	  	tpidr_el0);
+	write_sysreg(ctxt->sys_regs[TPIDRRO_EL0], 	tpidrro_el0);
+}
+
+static void __hyp_text __sysreg_restore_el1_state(struct kvm_cpu_context *ctxt)
 {
 	write_sysreg(ctxt->sys_regs[MPIDR_EL1],		vmpidr_el2);
 	write_sysreg(ctxt->sys_regs[CSSELR_EL1],	csselr_el1);
 	write_sysreg_el1(ctxt->sys_regs[SCTLR_EL1],	sctlr);
+	write_sysreg(ctxt->sys_regs[ACTLR_EL1],	  	actlr_el1);
 	write_sysreg_el1(ctxt->sys_regs[CPACR_EL1],	cpacr);
 	write_sysreg_el1(ctxt->sys_regs[TTBR0_EL1],	ttbr0);
 	write_sysreg_el1(ctxt->sys_regs[TTBR1_EL1],	ttbr1);
@@ -123,19 +143,21 @@ static void __hyp_text __sysreg_restore_state(struct kvm_cpu_context *ctxt)
 }
 
 static hyp_alternate_select(__sysreg_call_restore_host_state,
-			    __sysreg_restore_state, __sysreg_do_nothing,
+			    __sysreg_restore_el1_state, __sysreg_do_nothing,
 			    ARM64_HAS_VIRT_HOST_EXTN);
 
 void __hyp_text __sysreg_restore_host_state(struct kvm_cpu_context *ctxt)
 {
 	__sysreg_call_restore_host_state()(ctxt);
 	__sysreg_restore_common_state(ctxt);
+	__sysreg_restore_user_state(ctxt);
 }
 
 void __hyp_text __sysreg_restore_guest_state(struct kvm_cpu_context *ctxt)
 {
-	__sysreg_restore_state(ctxt);
+	__sysreg_restore_el1_state(ctxt);
 	__sysreg_restore_common_state(ctxt);
+	__sysreg_restore_user_state(ctxt);
 }
 
 static void __hyp_text __fpsimd32_save_state(struct kvm_cpu_context *ctxt)
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 223+ messages in thread

* [PATCH v3 19/41] KVM: arm64: Rewrite sysreg alternatives to static keys
  2018-01-12 12:07 ` Christoffer Dall
@ 2018-01-12 12:07   ` Christoffer Dall
  -1 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-01-12 12:07 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel; +Cc: Marc Zyngier, Shih-Wei Li, kvm

As we are about to move calls around in the sysreg save/restore logic,
let's first rewrite the alternative-based function callers as plain
has_vhe() checks, which will make the next patches much easier to read.
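
Condensed from the diff below, the call-site pattern changes roughly
like this (illustrative excerpt only):

  /* Before: an alternative-patched function pointer selection */
  static hyp_alternate_select(__sysreg_call_save_host_state,
                              __sysreg_save_el1_state, __sysreg_do_nothing,
                              ARM64_HAS_VIRT_HOST_EXTN);
  ...
  __sysreg_call_save_host_state()(ctxt);

  /* After: a plain check backed by a static key, patched at boot */
  if (!has_vhe())
          __sysreg_save_el1_state(ctxt);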

Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 arch/arm64/kvm/hyp/sysreg-sr.c | 17 ++++-------------
 1 file changed, 4 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
index 99dd50ce483b..72cdbc1f678b 100644
--- a/arch/arm64/kvm/hyp/sysreg-sr.c
+++ b/arch/arm64/kvm/hyp/sysreg-sr.c
@@ -22,9 +22,6 @@
 #include <asm/kvm_emulate.h>
 #include <asm/kvm_hyp.h>
 
-/* Yes, this does nothing, on purpose */
-static void __hyp_text __sysreg_do_nothing(struct kvm_cpu_context *ctxt) { }
-
 /*
  * Non-VHE: Both host and guest must save everything.
  *
@@ -78,13 +75,10 @@ static void __hyp_text __sysreg_save_el1_state(struct kvm_cpu_context *ctxt)
 	ctxt->gp_regs.regs.pstate	= read_sysreg_el2(spsr);
 }
 
-static hyp_alternate_select(__sysreg_call_save_host_state,
-			    __sysreg_save_el1_state, __sysreg_do_nothing,
-			    ARM64_HAS_VIRT_HOST_EXTN);
-
 void __hyp_text __sysreg_save_host_state(struct kvm_cpu_context *ctxt)
 {
-	__sysreg_call_save_host_state()(ctxt);
+	if (!has_vhe())
+		__sysreg_save_el1_state(ctxt);
 	__sysreg_save_common_state(ctxt);
 	__sysreg_save_user_state(ctxt);
 }
@@ -142,13 +136,10 @@ static void __hyp_text __sysreg_restore_el1_state(struct kvm_cpu_context *ctxt)
 	write_sysreg_el2(ctxt->gp_regs.regs.pstate,	spsr);
 }
 
-static hyp_alternate_select(__sysreg_call_restore_host_state,
-			    __sysreg_restore_el1_state, __sysreg_do_nothing,
-			    ARM64_HAS_VIRT_HOST_EXTN);
-
 void __hyp_text __sysreg_restore_host_state(struct kvm_cpu_context *ctxt)
 {
-	__sysreg_call_restore_host_state()(ctxt);
+	if (!has_vhe())
+		__sysreg_restore_el1_state(ctxt);
 	__sysreg_restore_common_state(ctxt);
 	__sysreg_restore_user_state(ctxt);
 }
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 223+ messages in thread

* [PATCH v3 20/41] KVM: arm64: Introduce separate VHE/non-VHE sysreg save/restore functions
  2018-01-12 12:07 ` Christoffer Dall
@ 2018-01-12 12:07   ` Christoffer Dall
  -1 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-01-12 12:07 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel; +Cc: Marc Zyngier, Shih-Wei Li, kvm

We are about to handle system registers quite differently between VHE
and non-VHE systems.  In preparation for that, we need to split some of
the handling functions between VHE and non-VHE functionality.

For now, we simply copy the non-VHE functions, but we drop the has_vhe()
static key checks inside them now that VHE and non-VHE have separate
functions.

Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 arch/arm64/include/asm/kvm_hyp.h | 12 ++++++++----
 arch/arm64/kvm/hyp/switch.c      | 20 ++++++++++----------
 arch/arm64/kvm/hyp/sysreg-sr.c   | 40 ++++++++++++++++++++++++++++++++--------
 3 files changed, 50 insertions(+), 22 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index a0e5a7038237..998152da9b66 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -132,10 +132,14 @@ int __vgic_v3_perform_cpuif_access(struct kvm_vcpu *vcpu);
 void __timer_enable_traps(struct kvm_vcpu *vcpu);
 void __timer_disable_traps(struct kvm_vcpu *vcpu);
 
-void __sysreg_save_host_state(struct kvm_cpu_context *ctxt);
-void __sysreg_restore_host_state(struct kvm_cpu_context *ctxt);
-void __sysreg_save_guest_state(struct kvm_cpu_context *ctxt);
-void __sysreg_restore_guest_state(struct kvm_cpu_context *ctxt);
+void __sysreg_save_host_state_nvhe(struct kvm_cpu_context *ctxt);
+void __sysreg_restore_host_state_nvhe(struct kvm_cpu_context *ctxt);
+void __sysreg_save_guest_state_nvhe(struct kvm_cpu_context *ctxt);
+void __sysreg_restore_guest_state_nvhe(struct kvm_cpu_context *ctxt);
+void sysreg_save_host_state_vhe(struct kvm_cpu_context *ctxt);
+void sysreg_restore_host_state_vhe(struct kvm_cpu_context *ctxt);
+void sysreg_save_guest_state_vhe(struct kvm_cpu_context *ctxt);
+void sysreg_restore_guest_state_vhe(struct kvm_cpu_context *ctxt);
 void __sysreg32_save_state(struct kvm_vcpu *vcpu);
 void __sysreg32_restore_state(struct kvm_vcpu *vcpu);
 
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 6175fcb33ed2..42e0123ecd69 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -348,7 +348,7 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
 	host_ctxt->__hyp_running_vcpu = vcpu;
 	guest_ctxt = &vcpu->arch.ctxt;
 
-	__sysreg_save_host_state(host_ctxt);
+	sysreg_save_host_state_vhe(host_ctxt);
 
 	__activate_traps(vcpu);
 	__activate_vm(vcpu->kvm);
@@ -360,7 +360,7 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
 	 * to erratum #852523 (Cortex-A57) or #853709 (Cortex-A72).
 	 */
 	__sysreg32_restore_state(vcpu);
-	__sysreg_restore_guest_state(guest_ctxt);
+	sysreg_restore_guest_state_vhe(guest_ctxt);
 	__debug_switch_to_guest(vcpu);
 
 	do {
@@ -370,13 +370,13 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
 		/* And we're baaack! */
 	} while (fixup_guest_exit(vcpu, &exit_code));
 
-	__sysreg_save_guest_state(guest_ctxt);
+	sysreg_save_guest_state_vhe(guest_ctxt);
 	__sysreg32_save_state(vcpu);
 	__vgic_save_state(vcpu);
 
 	__deactivate_traps(vcpu);
 
-	__sysreg_restore_host_state(host_ctxt);
+	sysreg_restore_host_state_vhe(host_ctxt);
 
 	/*
 	 * This must come after restoring the host sysregs, since a non-VHE
@@ -400,7 +400,7 @@ int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
 	host_ctxt->__hyp_running_vcpu = vcpu;
 	guest_ctxt = &vcpu->arch.ctxt;
 
-	__sysreg_save_host_state(host_ctxt);
+	__sysreg_save_host_state_nvhe(host_ctxt);
 
 	__activate_traps(vcpu);
 	__activate_vm(kern_hyp_va(vcpu->kvm));
@@ -413,7 +413,7 @@ int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
 	 * to erratum #852523 (Cortex-A57) or #853709 (Cortex-A72).
 	 */
 	__sysreg32_restore_state(vcpu);
-	__sysreg_restore_guest_state(guest_ctxt);
+	__sysreg_restore_guest_state_nvhe(guest_ctxt);
 	__debug_switch_to_guest(vcpu);
 
 	do {
@@ -423,7 +423,7 @@ int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
 		/* And we're baaack! */
 	} while (fixup_guest_exit(vcpu, &exit_code));
 
-	__sysreg_save_guest_state(guest_ctxt);
+	__sysreg_save_guest_state_nvhe(guest_ctxt);
 	__sysreg32_save_state(vcpu);
 	__timer_disable_traps(vcpu);
 	__vgic_save_state(vcpu);
@@ -431,7 +431,7 @@ int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
 	__deactivate_traps(vcpu);
 	__deactivate_vm(vcpu);
 
-	__sysreg_restore_host_state(host_ctxt);
+	__sysreg_restore_host_state_nvhe(host_ctxt);
 
 	/*
 	 * This must come after restoring the host sysregs, since a non-VHE
@@ -456,7 +456,7 @@ static void __hyp_text __hyp_call_panic_nvhe(u64 spsr, u64 elr, u64 par,
 		__timer_disable_traps(vcpu);
 		__deactivate_traps(vcpu);
 		__deactivate_vm(vcpu);
-		__sysreg_restore_host_state(__host_ctxt);
+		__sysreg_restore_host_state_nvhe(__host_ctxt);
 	}
 
 	/*
@@ -479,7 +479,7 @@ static void __hyp_call_panic_vhe(u64 spsr, u64 elr, u64 par,
 	vcpu = host_ctxt->__hyp_running_vcpu;
 
 	__deactivate_traps(vcpu);
-	__sysreg_restore_host_state(host_ctxt);
+	sysreg_restore_host_state_vhe(host_ctxt);
 
 	panic(__hyp_panic_string,
 	      spsr,  elr,
diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
index 72cdbc1f678b..5cbde1016303 100644
--- a/arch/arm64/kvm/hyp/sysreg-sr.c
+++ b/arch/arm64/kvm/hyp/sysreg-sr.c
@@ -75,15 +75,27 @@ static void __hyp_text __sysreg_save_el1_state(struct kvm_cpu_context *ctxt)
 	ctxt->gp_regs.regs.pstate	= read_sysreg_el2(spsr);
 }
 
-void __hyp_text __sysreg_save_host_state(struct kvm_cpu_context *ctxt)
+void __hyp_text __sysreg_save_host_state_nvhe(struct kvm_cpu_context *ctxt)
+{
+	__sysreg_save_el1_state(ctxt);
+	__sysreg_save_common_state(ctxt);
+	__sysreg_save_user_state(ctxt);
+}
+
+void __hyp_text __sysreg_save_guest_state_nvhe(struct kvm_cpu_context *ctxt)
+{
+	__sysreg_save_el1_state(ctxt);
+	__sysreg_save_common_state(ctxt);
+	__sysreg_save_user_state(ctxt);
+}
+
+void sysreg_save_host_state_vhe(struct kvm_cpu_context *ctxt)
 {
-	if (!has_vhe())
-		__sysreg_save_el1_state(ctxt);
 	__sysreg_save_common_state(ctxt);
 	__sysreg_save_user_state(ctxt);
 }
 
-void __hyp_text __sysreg_save_guest_state(struct kvm_cpu_context *ctxt)
+void sysreg_save_guest_state_vhe(struct kvm_cpu_context *ctxt)
 {
 	__sysreg_save_el1_state(ctxt);
 	__sysreg_save_common_state(ctxt);
@@ -136,15 +148,27 @@ static void __hyp_text __sysreg_restore_el1_state(struct kvm_cpu_context *ctxt)
 	write_sysreg_el2(ctxt->gp_regs.regs.pstate,	spsr);
 }
 
-void __hyp_text __sysreg_restore_host_state(struct kvm_cpu_context *ctxt)
+void __hyp_text __sysreg_restore_host_state_nvhe(struct kvm_cpu_context *ctxt)
+{
+	__sysreg_restore_el1_state(ctxt);
+	__sysreg_restore_common_state(ctxt);
+	__sysreg_restore_user_state(ctxt);
+}
+
+void __hyp_text __sysreg_restore_guest_state_nvhe(struct kvm_cpu_context *ctxt)
+{
+	__sysreg_restore_el1_state(ctxt);
+	__sysreg_restore_common_state(ctxt);
+	__sysreg_restore_user_state(ctxt);
+}
+
+void sysreg_restore_host_state_vhe(struct kvm_cpu_context *ctxt)
 {
-	if (!has_vhe())
-		__sysreg_restore_el1_state(ctxt);
 	__sysreg_restore_common_state(ctxt);
 	__sysreg_restore_user_state(ctxt);
 }
 
-void __hyp_text __sysreg_restore_guest_state(struct kvm_cpu_context *ctxt)
+void sysreg_restore_guest_state_vhe(struct kvm_cpu_context *ctxt)
 {
 	__sysreg_restore_el1_state(ctxt);
 	__sysreg_restore_common_state(ctxt);
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 223+ messages in thread

* [PATCH v3 21/41] KVM: arm/arm64: Remove leftover comment from kvm_vcpu_run_vhe
  2018-01-12 12:07 ` Christoffer Dall
@ 2018-01-12 12:07   ` Christoffer Dall
  -1 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-01-12 12:07 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel; +Cc: Marc Zyngier, Shih-Wei Li, kvm

The comment only applied to SPE on non-VHE systems, so we simply remove
it.

Suggested-by: Andrew Jones <drjones@redhat.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 arch/arm64/kvm/hyp/switch.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 42e0123ecd69..b6edb6aaa298 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -378,10 +378,6 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
 
 	sysreg_restore_host_state_vhe(host_ctxt);
 
-	/*
-	 * This must come after restoring the host sysregs, since a non-VHE
-	 * system may enable SPE here and make use of the TTBRs.
-	 */
 	__debug_switch_to_host(vcpu);
 
 	return exit_code;
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 223+ messages in thread

* [PATCH v3 22/41] KVM: arm64: Unify non-VHE host/guest sysreg save and restore functions
  2018-01-12 12:07 ` Christoffer Dall
@ 2018-01-12 12:07   ` Christoffer Dall
  -1 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-01-12 12:07 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel; +Cc: Marc Zyngier, Shih-Wei Li, kvm

There is no need to have multiple identical functions with different
names for saving host and guest state.  When saving and restoring state
for the host and guest, the state is the same for both contexts, and
that's why we have the kvm_cpu_context structure.  Delete one
version and rename the other to simply save/restore.

Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 arch/arm64/include/asm/kvm_hyp.h |  6 ++----
 arch/arm64/kvm/hyp/switch.c      | 10 +++++-----
 arch/arm64/kvm/hyp/sysreg-sr.c   | 18 ++----------------
 3 files changed, 9 insertions(+), 25 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index 998152da9b66..3f54c55f77a1 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -132,10 +132,8 @@ int __vgic_v3_perform_cpuif_access(struct kvm_vcpu *vcpu);
 void __timer_enable_traps(struct kvm_vcpu *vcpu);
 void __timer_disable_traps(struct kvm_vcpu *vcpu);
 
-void __sysreg_save_host_state_nvhe(struct kvm_cpu_context *ctxt);
-void __sysreg_restore_host_state_nvhe(struct kvm_cpu_context *ctxt);
-void __sysreg_save_guest_state_nvhe(struct kvm_cpu_context *ctxt);
-void __sysreg_restore_guest_state_nvhe(struct kvm_cpu_context *ctxt);
+void __sysreg_save_state_nvhe(struct kvm_cpu_context *ctxt);
+void __sysreg_restore_state_nvhe(struct kvm_cpu_context *ctxt);
 void sysreg_save_host_state_vhe(struct kvm_cpu_context *ctxt);
 void sysreg_restore_host_state_vhe(struct kvm_cpu_context *ctxt);
 void sysreg_save_guest_state_vhe(struct kvm_cpu_context *ctxt);
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index b6edb6aaa298..2e04d404ac82 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -396,7 +396,7 @@ int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
 	host_ctxt->__hyp_running_vcpu = vcpu;
 	guest_ctxt = &vcpu->arch.ctxt;
 
-	__sysreg_save_host_state_nvhe(host_ctxt);
+	__sysreg_save_state_nvhe(host_ctxt);
 
 	__activate_traps(vcpu);
 	__activate_vm(kern_hyp_va(vcpu->kvm));
@@ -409,7 +409,7 @@ int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
 	 * to erratum #852523 (Cortex-A57) or #853709 (Cortex-A72).
 	 */
 	__sysreg32_restore_state(vcpu);
-	__sysreg_restore_guest_state_nvhe(guest_ctxt);
+	__sysreg_restore_state_nvhe(guest_ctxt);
 	__debug_switch_to_guest(vcpu);
 
 	do {
@@ -419,7 +419,7 @@ int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
 		/* And we're baaack! */
 	} while (fixup_guest_exit(vcpu, &exit_code));
 
-	__sysreg_save_guest_state_nvhe(guest_ctxt);
+	__sysreg_save_state_nvhe(guest_ctxt);
 	__sysreg32_save_state(vcpu);
 	__timer_disable_traps(vcpu);
 	__vgic_save_state(vcpu);
@@ -427,7 +427,7 @@ int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
 	__deactivate_traps(vcpu);
 	__deactivate_vm(vcpu);
 
-	__sysreg_restore_host_state_nvhe(host_ctxt);
+	__sysreg_restore_state_nvhe(host_ctxt);
 
 	/*
 	 * This must come after restoring the host sysregs, since a non-VHE
@@ -452,7 +452,7 @@ static void __hyp_text __hyp_call_panic_nvhe(u64 spsr, u64 elr, u64 par,
 		__timer_disable_traps(vcpu);
 		__deactivate_traps(vcpu);
 		__deactivate_vm(vcpu);
-		__sysreg_restore_host_state_nvhe(__host_ctxt);
+		__sysreg_restore_state_nvhe(__host_ctxt);
 	}
 
 	/*
diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
index 5cbde1016303..baa243a010b3 100644
--- a/arch/arm64/kvm/hyp/sysreg-sr.c
+++ b/arch/arm64/kvm/hyp/sysreg-sr.c
@@ -75,14 +75,7 @@ static void __hyp_text __sysreg_save_el1_state(struct kvm_cpu_context *ctxt)
 	ctxt->gp_regs.regs.pstate	= read_sysreg_el2(spsr);
 }
 
-void __hyp_text __sysreg_save_host_state_nvhe(struct kvm_cpu_context *ctxt)
-{
-	__sysreg_save_el1_state(ctxt);
-	__sysreg_save_common_state(ctxt);
-	__sysreg_save_user_state(ctxt);
-}
-
-void __hyp_text __sysreg_save_guest_state_nvhe(struct kvm_cpu_context *ctxt)
+void __hyp_text __sysreg_save_state_nvhe(struct kvm_cpu_context *ctxt)
 {
 	__sysreg_save_el1_state(ctxt);
 	__sysreg_save_common_state(ctxt);
@@ -148,14 +141,7 @@ static void __hyp_text __sysreg_restore_el1_state(struct kvm_cpu_context *ctxt)
 	write_sysreg_el2(ctxt->gp_regs.regs.pstate,	spsr);
 }
 
-void __hyp_text __sysreg_restore_host_state_nvhe(struct kvm_cpu_context *ctxt)
-{
-	__sysreg_restore_el1_state(ctxt);
-	__sysreg_restore_common_state(ctxt);
-	__sysreg_restore_user_state(ctxt);
-}
-
-void __hyp_text __sysreg_restore_guest_state_nvhe(struct kvm_cpu_context *ctxt)
+void __hyp_text __sysreg_restore_state_nvhe(struct kvm_cpu_context *ctxt)
 {
 	__sysreg_restore_el1_state(ctxt);
 	__sysreg_restore_common_state(ctxt);
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 223+ messages in thread

* [PATCH v3 23/41] KVM: arm64: Don't save the host ELR_EL2 and SPSR_EL2 on VHE systems
  2018-01-12 12:07 ` Christoffer Dall
@ 2018-01-12 12:07   ` Christoffer Dall
  -1 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-01-12 12:07 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel
  Cc: kvm, Marc Zyngier, Shih-Wei Li, Andrew Jones, Christoffer Dall

On non-VHE systems we need to save ELR_EL2 and SPSR_EL2 so that we can
return to the host in EL1 in the same state and location where we issued a
hypercall to EL2.  On VHE, however, ELR_EL2 and SPSR_EL2 are not useful,
because we never enter a guest as a result of an exception entry that would
be directly handled by KVM; the kernel entry code already saves
ELR_EL1/SPSR_EL1 on exception entry, which is enough.  Therefore, factor out
these registers into separate save/restore functions, making it easy to
exclude them from the VHE world-switch path later on.
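
As a simplified reminder of why the non-VHE path needs these registers
(illustrative comment only, not part of the patch):

  /*
   * Non-VHE, simplified:
   *
   *   host (EL1):  kvm_call_hyp(...)             -> HVC to EL2
   *   hyp  (EL2):  ELR_EL2/SPSR_EL2 now hold the host's EL1 return
   *                point and PSTATE; save them before running the
   *                guest, restore them before ERET back to the host.
   *
   * On VHE the host already runs in EL2 and the world switch is a
   * normal function call, so there is no host ELR_EL2/SPSR_EL2 state
   * to preserve.
   */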

Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 arch/arm64/kvm/hyp/sysreg-sr.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
index baa243a010b3..1f2d5e9343b0 100644
--- a/arch/arm64/kvm/hyp/sysreg-sr.c
+++ b/arch/arm64/kvm/hyp/sysreg-sr.c
@@ -71,6 +71,10 @@ static void __hyp_text __sysreg_save_el1_state(struct kvm_cpu_context *ctxt)
 	ctxt->gp_regs.sp_el1		= read_sysreg(sp_el1);
 	ctxt->gp_regs.elr_el1		= read_sysreg_el1(elr);
 	ctxt->gp_regs.spsr[KVM_SPSR_EL1]= read_sysreg_el1(spsr);
+}
+
+static void __hyp_text __sysreg_save_el2_return_state(struct kvm_cpu_context *ctxt)
+{
 	ctxt->gp_regs.regs.pc		= read_sysreg_el2(elr);
 	ctxt->gp_regs.regs.pstate	= read_sysreg_el2(spsr);
 }
@@ -80,6 +84,7 @@ void __hyp_text __sysreg_save_state_nvhe(struct kvm_cpu_context *ctxt)
 	__sysreg_save_el1_state(ctxt);
 	__sysreg_save_common_state(ctxt);
 	__sysreg_save_user_state(ctxt);
+	__sysreg_save_el2_return_state(ctxt);
 }
 
 void sysreg_save_host_state_vhe(struct kvm_cpu_context *ctxt)
@@ -93,6 +98,7 @@ void sysreg_save_guest_state_vhe(struct kvm_cpu_context *ctxt)
 	__sysreg_save_el1_state(ctxt);
 	__sysreg_save_common_state(ctxt);
 	__sysreg_save_user_state(ctxt);
+	__sysreg_save_el2_return_state(ctxt);
 }
 
 static void __hyp_text __sysreg_restore_common_state(struct kvm_cpu_context *ctxt)
@@ -137,6 +143,11 @@ static void __hyp_text __sysreg_restore_el1_state(struct kvm_cpu_context *ctxt)
 	write_sysreg(ctxt->gp_regs.sp_el1,		sp_el1);
 	write_sysreg_el1(ctxt->gp_regs.elr_el1,		elr);
 	write_sysreg_el1(ctxt->gp_regs.spsr[KVM_SPSR_EL1],spsr);
+}
+
+static void __hyp_text
+__sysreg_restore_el2_return_state(struct kvm_cpu_context *ctxt)
+{
 	write_sysreg_el2(ctxt->gp_regs.regs.pc,		elr);
 	write_sysreg_el2(ctxt->gp_regs.regs.pstate,	spsr);
 }
@@ -146,6 +157,7 @@ void __hyp_text __sysreg_restore_state_nvhe(struct kvm_cpu_context *ctxt)
 	__sysreg_restore_el1_state(ctxt);
 	__sysreg_restore_common_state(ctxt);
 	__sysreg_restore_user_state(ctxt);
+	__sysreg_restore_el2_return_state(ctxt);
 }
 
 void sysreg_restore_host_state_vhe(struct kvm_cpu_context *ctxt)
@@ -159,6 +171,7 @@ void sysreg_restore_guest_state_vhe(struct kvm_cpu_context *ctxt)
 	__sysreg_restore_el1_state(ctxt);
 	__sysreg_restore_common_state(ctxt);
 	__sysreg_restore_user_state(ctxt);
+	__sysreg_restore_el2_return_state(ctxt);
 }
 
 static void __hyp_text __fpsimd32_save_state(struct kvm_cpu_context *ctxt)
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 223+ messages in thread

* [PATCH v3 24/41] KVM: arm64: Change 32-bit handling of VM system registers
  2018-01-12 12:07 ` Christoffer Dall
@ 2018-01-12 12:07   ` Christoffer Dall
  -1 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-01-12 12:07 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel; +Cc: Marc Zyngier, Shih-Wei Li, kvm

We currently handle 32-bit accesses to trapped VM system registers using
the 32-bit index into the coproc array on the vcpu structure, which is a
union of the coproc array and the sysreg array.

Since all the 32-bit coproc indices are created to correspond to the
architectural mapping between 64-bit system registers and 32-bit
coprocessor registers, and because the AArch64 system registers are
double the size of the AArch32 coprocessor registers, we can always find
the system register entry that we must update by dividing the 32-bit
coproc index by 2.

This is going to make our lives much easier when we have to start
accessing system registers that use deferred save/restore and might
have to be read directly from the physical CPU.
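
As a concrete illustration of the arithmetic (the helper below is
hypothetical; only the index mapping and word selection match the
patch):

  /*
   * A 32-bit coproc index r32 maps to sysreg index r32 / 2; an odd r32
   * selects the upper 32 bits of that 64-bit register.
   */
  static u64 fold_aarch32_write(u64 old64, int r32, u32 regval)
  {
          if (r32 % 2)
                  return ((u64)regval << 32) | lower_32_bits(old64);
          else
                  return ((u64)upper_32_bits(old64) << 32) | regval;
  }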

Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 arch/arm64/include/asm/kvm_host.h |  8 --------
 arch/arm64/kvm/sys_regs.c         | 20 +++++++++++++++-----
 2 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 9e23bc968668..4a1b0bab6baf 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -292,14 +292,6 @@ struct kvm_vcpu_arch {
 #define vcpu_cp14(v,r)		((v)->arch.ctxt.copro[(r)])
 #define vcpu_cp15(v,r)		((v)->arch.ctxt.copro[(r)])
 
-#ifdef CONFIG_CPU_BIG_ENDIAN
-#define vcpu_cp15_64_high(v,r)	vcpu_cp15((v),(r))
-#define vcpu_cp15_64_low(v,r)	vcpu_cp15((v),(r) + 1)
-#else
-#define vcpu_cp15_64_high(v,r)	vcpu_cp15((v),(r) + 1)
-#define vcpu_cp15_64_low(v,r)	vcpu_cp15((v),(r))
-#endif
-
 struct kvm_vm_stat {
 	ulong remote_tlb_flush;
 };
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 1830ebc227d1..b20c26e0d6b9 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -121,16 +121,26 @@ static bool access_vm_reg(struct kvm_vcpu *vcpu,
 			  const struct sys_reg_desc *r)
 {
 	bool was_enabled = vcpu_has_cache_enabled(vcpu);
+	u64 val;
+	int reg = r->reg;
 
 	BUG_ON(!p->is_write);
 
-	if (!p->is_aarch32) {
-		vcpu_sys_reg(vcpu, r->reg) = p->regval;
+	/* See the 32bit mapping in kvm_host.h */
+	if (p->is_aarch32)
+		reg = r->reg / 2;
+
+	if (!p->is_aarch32 || !p->is_32bit) {
+		val = p->regval;
 	} else {
-		if (!p->is_32bit)
-			vcpu_cp15_64_high(vcpu, r->reg) = upper_32_bits(p->regval);
-		vcpu_cp15_64_low(vcpu, r->reg) = lower_32_bits(p->regval);
+		val = vcpu_sys_reg(vcpu, reg);
+		if (r->reg % 2)
+			val = (p->regval << 32) | (u64)lower_32_bits(val);
+		else
+			val = ((u64)upper_32_bits(val) << 32) |
+				lower_32_bits(p->regval);
 	}
+	vcpu_sys_reg(vcpu, reg) = val;
 
 	kvm_toggle_cache(vcpu, was_enabled);
 	return true;
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 223+ messages in thread

* [PATCH v3 25/41] KVM: arm64: Rewrite system register accessors to read/write functions
  2018-01-12 12:07 ` Christoffer Dall
@ 2018-01-12 12:07   ` Christoffer Dall
  -1 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-01-12 12:07 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel; +Cc: Marc Zyngier, Christoffer Dall, Shih-Wei Li, kvm

From: Christoffer Dall <cdall@cs.columbia.edu>

Currently we access the system registers array via the vcpu_sys_reg()
macro.  However, we are about to change the behavior to sometimes
modify the register file directly, so let's change this to two
primitives:

 * Accessor macros vcpu_write_sys_reg() and vcpu_read_sys_reg()
 * Direct array access macro __vcpu_sys_reg()

The first primitive should be used in places where the code needs to
access the currently loaded VCPU's state as observed by the guest.  For
example, when trapping on cache related registers, a write to a system
register should go directly to the VCPU version of the register.

The second primitive can be used in places where the VCPU is known to
never be running (for example userspace access) or for registers which
are never context switched (for example all the PMU system registers).

This rewrites all users of vcpu_sys_reg() to one of the two primitives
above.

No functional change.
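
As a stand-alone illustration of the split (a minimal sketch: the toy_*
names and structure below are invented for this example and are not the
kernel's), the two primitives look like this, with the accessors being
the single place where a later patch can redirect accesses to the
hardware:

#include <stdint.h>
#include <stdio.h>

#define TOY_NR_SYS_REGS 4

struct toy_vcpu {
	uint64_t sys_regs[TOY_NR_SYS_REGS];
};

/* Direct array access: only safe when the vcpu cannot be running, or for
 * registers that are purely emulated and never context switched. */
#define __toy_sys_reg(v, r)	((v)->sys_regs[(r)])

/* Accessor functions: trivial for now, but they give a single point where
 * reads/writes can later be redirected to the physical CPU. */
static uint64_t toy_read_sys_reg(struct toy_vcpu *v, int r)
{
	return __toy_sys_reg(v, r);
}

static void toy_write_sys_reg(struct toy_vcpu *v, int r, uint64_t val)
{
	__toy_sys_reg(v, r) = val;
}

int main(void)
{
	struct toy_vcpu vcpu = { { 0 } };

	toy_write_sys_reg(&vcpu, 1, 0x25);	/* guest-observed state */
	__toy_sys_reg(&vcpu, 2) = 0x4242;	/* memory-backed state */
	printf("reg1=%llx reg2=%llx\n",
	       (unsigned long long)toy_read_sys_reg(&vcpu, 1),
	       (unsigned long long)__toy_sys_reg(&vcpu, 2));
	return 0;
}

Since both paths still hit the array, this patch is a pure rename; the
indirection only starts to matter once the accessors become real
functions in the next patch.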

Signed-off-by: Christoffer Dall <cdall@cs.columbia.edu>
---
 arch/arm64/include/asm/kvm_emulate.h | 13 ++++---
 arch/arm64/include/asm/kvm_host.h    | 13 ++++++-
 arch/arm64/include/asm/kvm_mmu.h     |  2 +-
 arch/arm64/kvm/debug.c               | 27 +++++++++-----
 arch/arm64/kvm/inject_fault.c        |  8 ++--
 arch/arm64/kvm/sys_regs.c            | 71 ++++++++++++++++++------------------
 arch/arm64/kvm/sys_regs.h            |  4 +-
 arch/arm64/kvm/sys_regs_generic_v8.c |  4 +-
 virt/kvm/arm/pmu.c                   | 37 ++++++++++---------
 9 files changed, 102 insertions(+), 77 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index e07bf463ac58..df1cb146750d 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -273,15 +273,18 @@ static inline int kvm_vcpu_sys_get_rt(struct kvm_vcpu *vcpu)
 
 static inline unsigned long kvm_vcpu_get_mpidr_aff(struct kvm_vcpu *vcpu)
 {
-	return vcpu_sys_reg(vcpu, MPIDR_EL1) & MPIDR_HWID_BITMASK;
+	return vcpu_read_sys_reg(vcpu, MPIDR_EL1) & MPIDR_HWID_BITMASK;
 }
 
 static inline void kvm_vcpu_set_be(struct kvm_vcpu *vcpu)
 {
-	if (vcpu_mode_is_32bit(vcpu))
+	if (vcpu_mode_is_32bit(vcpu)) {
 		*vcpu_cpsr(vcpu) |= COMPAT_PSR_E_BIT;
-	else
-		vcpu_sys_reg(vcpu, SCTLR_EL1) |= (1 << 25);
+	} else {
+		u64 sctlr = vcpu_read_sys_reg(vcpu, SCTLR_EL1);
+		sctlr |= (1 << 25);
+		vcpu_write_sys_reg(vcpu, SCTLR_EL1, sctlr);
+	}
 }
 
 static inline bool kvm_vcpu_is_be(struct kvm_vcpu *vcpu)
@@ -289,7 +292,7 @@ static inline bool kvm_vcpu_is_be(struct kvm_vcpu *vcpu)
 	if (vcpu_mode_is_32bit(vcpu))
 		return !!(*vcpu_cpsr(vcpu) & COMPAT_PSR_E_BIT);
 
-	return !!(vcpu_sys_reg(vcpu, SCTLR_EL1) & (1 << 25));
+	return !!(vcpu_read_sys_reg(vcpu, SCTLR_EL1) & (1 << 25));
 }
 
 static inline unsigned long vcpu_data_guest_to_host(struct kvm_vcpu *vcpu,
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 4a1b0bab6baf..91272c35cc36 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -284,7 +284,18 @@ struct kvm_vcpu_arch {
 };
 
 #define vcpu_gp_regs(v)		(&(v)->arch.ctxt.gp_regs)
-#define vcpu_sys_reg(v,r)	((v)->arch.ctxt.sys_regs[(r)])
+
+/*
+ * Only use __vcpu_sys_reg if you know you want the memory backed version of a
+ * register, and not the one most recently accessed by a running VCPU.  For
+ * example, for userspace access or for system registers that are never context
+ * switched, but only emulated.
+ */
+#define __vcpu_sys_reg(v,r)	((v)->arch.ctxt.sys_regs[(r)])
+
+#define vcpu_read_sys_reg(v,r)	__vcpu_sys_reg(v,r)
+#define vcpu_write_sys_reg(v,r,n)	do { __vcpu_sys_reg(v,r) = n; } while (0)
+
 /*
  * CP14 and CP15 live in the same array, as they are backed by the
  * same system registers.
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 672c8684d5c2..c50fcfed0165 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -227,7 +227,7 @@ struct kvm;
 
 static inline bool vcpu_has_cache_enabled(struct kvm_vcpu *vcpu)
 {
-	return (vcpu_sys_reg(vcpu, SCTLR_EL1) & 0b101) == 0b101;
+	return (vcpu_read_sys_reg(vcpu, SCTLR_EL1) & 0b101) == 0b101;
 }
 
 static inline void __coherent_cache_guest_page(struct kvm_vcpu *vcpu,
diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
index feedb877cff8..db32d10a56a1 100644
--- a/arch/arm64/kvm/debug.c
+++ b/arch/arm64/kvm/debug.c
@@ -46,7 +46,8 @@ static DEFINE_PER_CPU(u32, mdcr_el2);
  */
 static void save_guest_debug_regs(struct kvm_vcpu *vcpu)
 {
-	vcpu->arch.guest_debug_preserved.mdscr_el1 = vcpu_sys_reg(vcpu, MDSCR_EL1);
+	vcpu->arch.guest_debug_preserved.mdscr_el1 =
+		vcpu_read_sys_reg(vcpu, MDSCR_EL1);
 
 	trace_kvm_arm_set_dreg32("Saved MDSCR_EL1",
 				vcpu->arch.guest_debug_preserved.mdscr_el1);
@@ -54,10 +55,11 @@ static void save_guest_debug_regs(struct kvm_vcpu *vcpu)
 
 static void restore_guest_debug_regs(struct kvm_vcpu *vcpu)
 {
-	vcpu_sys_reg(vcpu, MDSCR_EL1) = vcpu->arch.guest_debug_preserved.mdscr_el1;
+	vcpu_write_sys_reg(vcpu, MDSCR_EL1,
+			   vcpu->arch.guest_debug_preserved.mdscr_el1);
 
 	trace_kvm_arm_set_dreg32("Restored MDSCR_EL1",
-				vcpu_sys_reg(vcpu, MDSCR_EL1));
+				vcpu_read_sys_reg(vcpu, MDSCR_EL1));
 }
 
 /**
@@ -108,6 +110,7 @@ void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu)
 void kvm_arm_setup_debug(struct kvm_vcpu *vcpu)
 {
 	bool trap_debug = !(vcpu->arch.debug_flags & KVM_ARM64_DEBUG_DIRTY);
+	unsigned long mdscr;
 
 	trace_kvm_arm_setup_debug(vcpu, vcpu->guest_debug);
 
@@ -152,9 +155,13 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu)
 		 */
 		if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP) {
 			*vcpu_cpsr(vcpu) |=  DBG_SPSR_SS;
-			vcpu_sys_reg(vcpu, MDSCR_EL1) |= DBG_MDSCR_SS;
+			mdscr = vcpu_read_sys_reg(vcpu, MDSCR_EL1);
+			mdscr |= DBG_MDSCR_SS;
+			vcpu_write_sys_reg(vcpu, MDSCR_EL1, mdscr);
 		} else {
-			vcpu_sys_reg(vcpu, MDSCR_EL1) &= ~DBG_MDSCR_SS;
+			mdscr = vcpu_read_sys_reg(vcpu, MDSCR_EL1);
+			mdscr &= ~DBG_MDSCR_SS;
+			vcpu_write_sys_reg(vcpu, MDSCR_EL1, mdscr);
 		}
 
 		trace_kvm_arm_set_dreg32("SPSR_EL2", *vcpu_cpsr(vcpu));
@@ -170,7 +177,9 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu)
 		 */
 		if (vcpu->guest_debug & KVM_GUESTDBG_USE_HW) {
 			/* Enable breakpoints/watchpoints */
-			vcpu_sys_reg(vcpu, MDSCR_EL1) |= DBG_MDSCR_MDE;
+			mdscr = vcpu_read_sys_reg(vcpu, MDSCR_EL1);
+			mdscr |= DBG_MDSCR_MDE;
+			vcpu_write_sys_reg(vcpu, MDSCR_EL1, mdscr);
 
 			vcpu->arch.debug_ptr = &vcpu->arch.external_debug_state;
 			vcpu->arch.debug_flags |= KVM_ARM64_DEBUG_DIRTY;
@@ -194,12 +203,12 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu)
 		vcpu->arch.mdcr_el2 |= MDCR_EL2_TDA;
 
 	/* If KDE or MDE are set, perform a full save/restore cycle. */
-	if ((vcpu_sys_reg(vcpu, MDSCR_EL1) & DBG_MDSCR_KDE) ||
-	    (vcpu_sys_reg(vcpu, MDSCR_EL1) & DBG_MDSCR_MDE))
+	if (vcpu_read_sys_reg(vcpu, MDSCR_EL1) & (DBG_MDSCR_KDE | DBG_MDSCR_MDE))
 		vcpu->arch.debug_flags |= KVM_ARM64_DEBUG_DIRTY;
 
 	trace_kvm_arm_set_dreg32("MDCR_EL2", vcpu->arch.mdcr_el2);
-	trace_kvm_arm_set_dreg32("MDSCR_EL1", vcpu_sys_reg(vcpu, MDSCR_EL1));
+	trace_kvm_arm_set_dreg32("MDSCR_EL1",
+				 vcpu_read_sys_reg(vcpu, MDSCR_EL1));
 }
 
 void kvm_arm_clear_debug(struct kvm_vcpu *vcpu)
diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
index f4d35bb551e4..1e070943e7a6 100644
--- a/arch/arm64/kvm/inject_fault.c
+++ b/arch/arm64/kvm/inject_fault.c
@@ -58,7 +58,7 @@ static u64 get_except_vector(struct kvm_vcpu *vcpu, enum exception_type type)
 		exc_offset = LOWER_EL_AArch32_VECTOR;
 	}
 
-	return vcpu_sys_reg(vcpu, VBAR_EL1) + exc_offset + type;
+	return vcpu_read_sys_reg(vcpu, VBAR_EL1) + exc_offset + type;
 }
 
 static void inject_abt64(struct kvm_vcpu *vcpu, bool is_iabt, unsigned long addr)
@@ -73,7 +73,7 @@ static void inject_abt64(struct kvm_vcpu *vcpu, bool is_iabt, unsigned long addr
 	*vcpu_cpsr(vcpu) = PSTATE_FAULT_BITS_64;
 	*vcpu_spsr(vcpu) = cpsr;
 
-	vcpu_sys_reg(vcpu, FAR_EL1) = addr;
+	vcpu_write_sys_reg(vcpu, FAR_EL1, addr);
 
 	/*
 	 * Build an {i,d}abort, depending on the level and the
@@ -94,7 +94,7 @@ static void inject_abt64(struct kvm_vcpu *vcpu, bool is_iabt, unsigned long addr
 	if (!is_iabt)
 		esr |= ESR_ELx_EC_DABT_LOW << ESR_ELx_EC_SHIFT;
 
-	vcpu_sys_reg(vcpu, ESR_EL1) = esr | ESR_ELx_FSC_EXTABT;
+	vcpu_write_sys_reg(vcpu, ESR_EL1, esr | ESR_ELx_FSC_EXTABT);
 }
 
 static void inject_undef64(struct kvm_vcpu *vcpu)
@@ -115,7 +115,7 @@ static void inject_undef64(struct kvm_vcpu *vcpu)
 	if (kvm_vcpu_trap_il_is32bit(vcpu))
 		esr |= ESR_ELx_IL;
 
-	vcpu_sys_reg(vcpu, ESR_EL1) = esr;
+	vcpu_write_sys_reg(vcpu, ESR_EL1, esr);
 }
 
 /**
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index b20c26e0d6b9..96398d53b462 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -133,14 +133,14 @@ static bool access_vm_reg(struct kvm_vcpu *vcpu,
 	if (!p->is_aarch32 || !p->is_32bit) {
 		val = p->regval;
 	} else {
-		val = vcpu_sys_reg(vcpu, reg);
+		val = vcpu_read_sys_reg(vcpu, reg);
 		if (r->reg % 2)
 			val = (p->regval << 32) | (u64)lower_32_bits(val);
 		else
 			val = ((u64)upper_32_bits(val) << 32) |
 				lower_32_bits(p->regval);
 	}
-	vcpu_sys_reg(vcpu, reg) = val;
+	vcpu_write_sys_reg(vcpu, reg, val);
 
 	kvm_toggle_cache(vcpu, was_enabled);
 	return true;
@@ -241,10 +241,10 @@ static bool trap_debug_regs(struct kvm_vcpu *vcpu,
 			    const struct sys_reg_desc *r)
 {
 	if (p->is_write) {
-		vcpu_sys_reg(vcpu, r->reg) = p->regval;
+		vcpu_write_sys_reg(vcpu, r->reg, p->regval);
 		vcpu->arch.debug_flags |= KVM_ARM64_DEBUG_DIRTY;
 	} else {
-		p->regval = vcpu_sys_reg(vcpu, r->reg);
+		p->regval = vcpu_read_sys_reg(vcpu, r->reg);
 	}
 
 	trace_trap_reg(__func__, r->reg, p->is_write, p->regval);
@@ -457,7 +457,8 @@ static void reset_wcr(struct kvm_vcpu *vcpu,
 
 static void reset_amair_el1(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
 {
-	vcpu_sys_reg(vcpu, AMAIR_EL1) = read_sysreg(amair_el1);
+	u64 amair = read_sysreg(amair_el1);
+	vcpu_write_sys_reg(vcpu, AMAIR_EL1, amair);
 }
 
 static void reset_mpidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
@@ -474,7 +475,7 @@ static void reset_mpidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
 	mpidr = (vcpu->vcpu_id & 0x0f) << MPIDR_LEVEL_SHIFT(0);
 	mpidr |= ((vcpu->vcpu_id >> 4) & 0xff) << MPIDR_LEVEL_SHIFT(1);
 	mpidr |= ((vcpu->vcpu_id >> 12) & 0xff) << MPIDR_LEVEL_SHIFT(2);
-	vcpu_sys_reg(vcpu, MPIDR_EL1) = (1ULL << 31) | mpidr;
+	vcpu_write_sys_reg(vcpu, MPIDR_EL1, (1ULL << 31) | mpidr);
 }
 
 static void reset_pmcr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
@@ -488,12 +489,12 @@ static void reset_pmcr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
 	 */
 	val = ((pmcr & ~ARMV8_PMU_PMCR_MASK)
 	       | (ARMV8_PMU_PMCR_MASK & 0xdecafbad)) & (~ARMV8_PMU_PMCR_E);
-	vcpu_sys_reg(vcpu, PMCR_EL0) = val;
+	__vcpu_sys_reg(vcpu, PMCR_EL0) = val;
 }
 
 static bool check_pmu_access_disabled(struct kvm_vcpu *vcpu, u64 flags)
 {
-	u64 reg = vcpu_sys_reg(vcpu, PMUSERENR_EL0);
+	u64 reg = __vcpu_sys_reg(vcpu, PMUSERENR_EL0);
 	bool enabled = (reg & flags) || vcpu_mode_priv(vcpu);
 
 	if (!enabled)
@@ -535,14 +536,14 @@ static bool access_pmcr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
 
 	if (p->is_write) {
 		/* Only update writeable bits of PMCR */
-		val = vcpu_sys_reg(vcpu, PMCR_EL0);
+		val = __vcpu_sys_reg(vcpu, PMCR_EL0);
 		val &= ~ARMV8_PMU_PMCR_MASK;
 		val |= p->regval & ARMV8_PMU_PMCR_MASK;
-		vcpu_sys_reg(vcpu, PMCR_EL0) = val;
+		__vcpu_sys_reg(vcpu, PMCR_EL0) = val;
 		kvm_pmu_handle_pmcr(vcpu, val);
 	} else {
 		/* PMCR.P & PMCR.C are RAZ */
-		val = vcpu_sys_reg(vcpu, PMCR_EL0)
+		val = __vcpu_sys_reg(vcpu, PMCR_EL0)
 		      & ~(ARMV8_PMU_PMCR_P | ARMV8_PMU_PMCR_C);
 		p->regval = val;
 	}
@@ -560,10 +561,10 @@ static bool access_pmselr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
 		return false;
 
 	if (p->is_write)
-		vcpu_sys_reg(vcpu, PMSELR_EL0) = p->regval;
+		__vcpu_sys_reg(vcpu, PMSELR_EL0) = p->regval;
 	else
 		/* return PMSELR.SEL field */
-		p->regval = vcpu_sys_reg(vcpu, PMSELR_EL0)
+		p->regval = __vcpu_sys_reg(vcpu, PMSELR_EL0)
 			    & ARMV8_PMU_COUNTER_MASK;
 
 	return true;
@@ -596,7 +597,7 @@ static bool pmu_counter_idx_valid(struct kvm_vcpu *vcpu, u64 idx)
 {
 	u64 pmcr, val;
 
-	pmcr = vcpu_sys_reg(vcpu, PMCR_EL0);
+	pmcr = __vcpu_sys_reg(vcpu, PMCR_EL0);
 	val = (pmcr >> ARMV8_PMU_PMCR_N_SHIFT) & ARMV8_PMU_PMCR_N_MASK;
 	if (idx >= val && idx != ARMV8_PMU_CYCLE_IDX) {
 		kvm_inject_undefined(vcpu);
@@ -621,7 +622,7 @@ static bool access_pmu_evcntr(struct kvm_vcpu *vcpu,
 			if (pmu_access_event_counter_el0_disabled(vcpu))
 				return false;
 
-			idx = vcpu_sys_reg(vcpu, PMSELR_EL0)
+			idx = __vcpu_sys_reg(vcpu, PMSELR_EL0)
 			      & ARMV8_PMU_COUNTER_MASK;
 		} else if (r->Op2 == 0) {
 			/* PMCCNTR_EL0 */
@@ -676,7 +677,7 @@ static bool access_pmu_evtyper(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
 
 	if (r->CRn == 9 && r->CRm == 13 && r->Op2 == 1) {
 		/* PMXEVTYPER_EL0 */
-		idx = vcpu_sys_reg(vcpu, PMSELR_EL0) & ARMV8_PMU_COUNTER_MASK;
+		idx = __vcpu_sys_reg(vcpu, PMSELR_EL0) & ARMV8_PMU_COUNTER_MASK;
 		reg = PMEVTYPER0_EL0 + idx;
 	} else if (r->CRn == 14 && (r->CRm & 12) == 12) {
 		idx = ((r->CRm & 3) << 3) | (r->Op2 & 7);
@@ -694,9 +695,9 @@ static bool access_pmu_evtyper(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
 
 	if (p->is_write) {
 		kvm_pmu_set_counter_event_type(vcpu, p->regval, idx);
-		vcpu_sys_reg(vcpu, reg) = p->regval & ARMV8_PMU_EVTYPE_MASK;
+		__vcpu_sys_reg(vcpu, reg) = p->regval & ARMV8_PMU_EVTYPE_MASK;
 	} else {
-		p->regval = vcpu_sys_reg(vcpu, reg) & ARMV8_PMU_EVTYPE_MASK;
+		p->regval = __vcpu_sys_reg(vcpu, reg) & ARMV8_PMU_EVTYPE_MASK;
 	}
 
 	return true;
@@ -718,15 +719,15 @@ static bool access_pmcnten(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
 		val = p->regval & mask;
 		if (r->Op2 & 0x1) {
 			/* accessing PMCNTENSET_EL0 */
-			vcpu_sys_reg(vcpu, PMCNTENSET_EL0) |= val;
+			__vcpu_sys_reg(vcpu, PMCNTENSET_EL0) |= val;
 			kvm_pmu_enable_counter(vcpu, val);
 		} else {
 			/* accessing PMCNTENCLR_EL0 */
-			vcpu_sys_reg(vcpu, PMCNTENSET_EL0) &= ~val;
+			__vcpu_sys_reg(vcpu, PMCNTENSET_EL0) &= ~val;
 			kvm_pmu_disable_counter(vcpu, val);
 		}
 	} else {
-		p->regval = vcpu_sys_reg(vcpu, PMCNTENSET_EL0) & mask;
+		p->regval = __vcpu_sys_reg(vcpu, PMCNTENSET_EL0) & mask;
 	}
 
 	return true;
@@ -750,12 +751,12 @@ static bool access_pminten(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
 
 		if (r->Op2 & 0x1)
 			/* accessing PMINTENSET_EL1 */
-			vcpu_sys_reg(vcpu, PMINTENSET_EL1) |= val;
+			__vcpu_sys_reg(vcpu, PMINTENSET_EL1) |= val;
 		else
 			/* accessing PMINTENCLR_EL1 */
-			vcpu_sys_reg(vcpu, PMINTENSET_EL1) &= ~val;
+			__vcpu_sys_reg(vcpu, PMINTENSET_EL1) &= ~val;
 	} else {
-		p->regval = vcpu_sys_reg(vcpu, PMINTENSET_EL1) & mask;
+		p->regval = __vcpu_sys_reg(vcpu, PMINTENSET_EL1) & mask;
 	}
 
 	return true;
@@ -775,12 +776,12 @@ static bool access_pmovs(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
 	if (p->is_write) {
 		if (r->CRm & 0x2)
 			/* accessing PMOVSSET_EL0 */
-			vcpu_sys_reg(vcpu, PMOVSSET_EL0) |= (p->regval & mask);
+			__vcpu_sys_reg(vcpu, PMOVSSET_EL0) |= (p->regval & mask);
 		else
 			/* accessing PMOVSCLR_EL0 */
-			vcpu_sys_reg(vcpu, PMOVSSET_EL0) &= ~(p->regval & mask);
+			__vcpu_sys_reg(vcpu, PMOVSSET_EL0) &= ~(p->regval & mask);
 	} else {
-		p->regval = vcpu_sys_reg(vcpu, PMOVSSET_EL0) & mask;
+		p->regval = __vcpu_sys_reg(vcpu, PMOVSSET_EL0) & mask;
 	}
 
 	return true;
@@ -817,10 +818,10 @@ static bool access_pmuserenr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
 			return false;
 		}
 
-		vcpu_sys_reg(vcpu, PMUSERENR_EL0) = p->regval
-						    & ARMV8_PMU_USERENR_MASK;
-	} else {
-		p->regval = vcpu_sys_reg(vcpu, PMUSERENR_EL0)
+		__vcpu_sys_reg(vcpu, PMUSERENR_EL0) =
+			       p->regval & ARMV8_PMU_USERENR_MASK;
+	} else  {
+		p->regval = __vcpu_sys_reg(vcpu, PMUSERENR_EL0)
 			    & ARMV8_PMU_USERENR_MASK;
 	}
 
@@ -2193,7 +2194,7 @@ int kvm_arm_sys_reg_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg
 	if (r->get_user)
 		return (r->get_user)(vcpu, r, reg, uaddr);
 
-	return reg_to_user(uaddr, &vcpu_sys_reg(vcpu, r->reg), reg->id);
+	return reg_to_user(uaddr, &__vcpu_sys_reg(vcpu, r->reg), reg->id);
 }
 
 int kvm_arm_sys_reg_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
@@ -2214,7 +2215,7 @@ int kvm_arm_sys_reg_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg
 	if (r->set_user)
 		return (r->set_user)(vcpu, r, reg, uaddr);
 
-	return reg_from_user(&vcpu_sys_reg(vcpu, r->reg), uaddr, reg->id);
+	return reg_from_user(&__vcpu_sys_reg(vcpu, r->reg), uaddr, reg->id);
 }
 
 static unsigned int num_demux_regs(void)
@@ -2420,6 +2421,6 @@ void kvm_reset_sys_regs(struct kvm_vcpu *vcpu)
 	reset_sys_reg_descs(vcpu, table, num);
 
 	for (num = 1; num < NR_SYS_REGS; num++)
-		if (vcpu_sys_reg(vcpu, num) == 0x4242424242424242)
-			panic("Didn't reset vcpu_sys_reg(%zi)", num);
+		if (__vcpu_sys_reg(vcpu, num) == 0x4242424242424242)
+			panic("Didn't reset __vcpu_sys_reg(%zi)", num);
 }
diff --git a/arch/arm64/kvm/sys_regs.h b/arch/arm64/kvm/sys_regs.h
index 060f5348ef25..cd710f8b63e0 100644
--- a/arch/arm64/kvm/sys_regs.h
+++ b/arch/arm64/kvm/sys_regs.h
@@ -89,14 +89,14 @@ static inline void reset_unknown(struct kvm_vcpu *vcpu,
 {
 	BUG_ON(!r->reg);
 	BUG_ON(r->reg >= NR_SYS_REGS);
-	vcpu_sys_reg(vcpu, r->reg) = 0x1de7ec7edbadc0deULL;
+	__vcpu_sys_reg(vcpu, r->reg) = 0x1de7ec7edbadc0deULL;
 }
 
 static inline void reset_val(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
 {
 	BUG_ON(!r->reg);
 	BUG_ON(r->reg >= NR_SYS_REGS);
-	vcpu_sys_reg(vcpu, r->reg) = r->val;
+	__vcpu_sys_reg(vcpu, r->reg) = r->val;
 }
 
 static inline int cmp_sys_reg(const struct sys_reg_desc *i1,
diff --git a/arch/arm64/kvm/sys_regs_generic_v8.c b/arch/arm64/kvm/sys_regs_generic_v8.c
index 969ade1d333d..ddb8497d18d6 100644
--- a/arch/arm64/kvm/sys_regs_generic_v8.c
+++ b/arch/arm64/kvm/sys_regs_generic_v8.c
@@ -38,13 +38,13 @@ static bool access_actlr(struct kvm_vcpu *vcpu,
 	if (p->is_write)
 		return ignore_write(vcpu, p);
 
-	p->regval = vcpu_sys_reg(vcpu, ACTLR_EL1);
+	p->regval = vcpu_read_sys_reg(vcpu, ACTLR_EL1);
 	return true;
 }
 
 static void reset_actlr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
 {
-	vcpu_sys_reg(vcpu, ACTLR_EL1) = read_sysreg(actlr_el1);
+	__vcpu_sys_reg(vcpu, ACTLR_EL1) = read_sysreg(actlr_el1);
 }
 
 /*
diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
index 8a9c42366db7..29cb4a1ff26b 100644
--- a/virt/kvm/arm/pmu.c
+++ b/virt/kvm/arm/pmu.c
@@ -37,7 +37,7 @@ u64 kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, u64 select_idx)
 
 	reg = (select_idx == ARMV8_PMU_CYCLE_IDX)
 	      ? PMCCNTR_EL0 : PMEVCNTR0_EL0 + select_idx;
-	counter = vcpu_sys_reg(vcpu, reg);
+	counter = __vcpu_sys_reg(vcpu, reg);
 
 	/* The real counter value is equal to the value of counter register plus
 	 * the value perf event counts.
@@ -61,7 +61,8 @@ void kvm_pmu_set_counter_value(struct kvm_vcpu *vcpu, u64 select_idx, u64 val)
 
 	reg = (select_idx == ARMV8_PMU_CYCLE_IDX)
 	      ? PMCCNTR_EL0 : PMEVCNTR0_EL0 + select_idx;
-	vcpu_sys_reg(vcpu, reg) += (s64)val - kvm_pmu_get_counter_value(vcpu, select_idx);
+	__vcpu_sys_reg(vcpu, reg) +=
+		(s64)val - kvm_pmu_get_counter_value(vcpu, select_idx);
 }
 
 /**
@@ -78,7 +79,7 @@ static void kvm_pmu_stop_counter(struct kvm_vcpu *vcpu, struct kvm_pmc *pmc)
 		counter = kvm_pmu_get_counter_value(vcpu, pmc->idx);
 		reg = (pmc->idx == ARMV8_PMU_CYCLE_IDX)
 		       ? PMCCNTR_EL0 : PMEVCNTR0_EL0 + pmc->idx;
-		vcpu_sys_reg(vcpu, reg) = counter;
+		__vcpu_sys_reg(vcpu, reg) = counter;
 		perf_event_disable(pmc->perf_event);
 		perf_event_release_kernel(pmc->perf_event);
 		pmc->perf_event = NULL;
@@ -125,7 +126,7 @@ void kvm_pmu_vcpu_destroy(struct kvm_vcpu *vcpu)
 
 u64 kvm_pmu_valid_counter_mask(struct kvm_vcpu *vcpu)
 {
-	u64 val = vcpu_sys_reg(vcpu, PMCR_EL0) >> ARMV8_PMU_PMCR_N_SHIFT;
+	u64 val = __vcpu_sys_reg(vcpu, PMCR_EL0) >> ARMV8_PMU_PMCR_N_SHIFT;
 
 	val &= ARMV8_PMU_PMCR_N_MASK;
 	if (val == 0)
@@ -147,7 +148,7 @@ void kvm_pmu_enable_counter(struct kvm_vcpu *vcpu, u64 val)
 	struct kvm_pmu *pmu = &vcpu->arch.pmu;
 	struct kvm_pmc *pmc;
 
-	if (!(vcpu_sys_reg(vcpu, PMCR_EL0) & ARMV8_PMU_PMCR_E) || !val)
+	if (!(__vcpu_sys_reg(vcpu, PMCR_EL0) & ARMV8_PMU_PMCR_E) || !val)
 		return;
 
 	for (i = 0; i < ARMV8_PMU_MAX_COUNTERS; i++) {
@@ -193,10 +194,10 @@ static u64 kvm_pmu_overflow_status(struct kvm_vcpu *vcpu)
 {
 	u64 reg = 0;
 
-	if ((vcpu_sys_reg(vcpu, PMCR_EL0) & ARMV8_PMU_PMCR_E)) {
-		reg = vcpu_sys_reg(vcpu, PMOVSSET_EL0);
-		reg &= vcpu_sys_reg(vcpu, PMCNTENSET_EL0);
-		reg &= vcpu_sys_reg(vcpu, PMINTENSET_EL1);
+	if ((__vcpu_sys_reg(vcpu, PMCR_EL0) & ARMV8_PMU_PMCR_E)) {
+		reg = __vcpu_sys_reg(vcpu, PMOVSSET_EL0);
+		reg &= __vcpu_sys_reg(vcpu, PMCNTENSET_EL0);
+		reg &= __vcpu_sys_reg(vcpu, PMINTENSET_EL1);
 		reg &= kvm_pmu_valid_counter_mask(vcpu);
 	}
 
@@ -295,7 +296,7 @@ static void kvm_pmu_perf_overflow(struct perf_event *perf_event,
 	struct kvm_vcpu *vcpu = kvm_pmc_to_vcpu(pmc);
 	int idx = pmc->idx;
 
-	vcpu_sys_reg(vcpu, PMOVSSET_EL0) |= BIT(idx);
+	__vcpu_sys_reg(vcpu, PMOVSSET_EL0) |= BIT(idx);
 
 	if (kvm_pmu_overflow_status(vcpu)) {
 		kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
@@ -316,19 +317,19 @@ void kvm_pmu_software_increment(struct kvm_vcpu *vcpu, u64 val)
 	if (val == 0)
 		return;
 
-	enable = vcpu_sys_reg(vcpu, PMCNTENSET_EL0);
+	enable = __vcpu_sys_reg(vcpu, PMCNTENSET_EL0);
 	for (i = 0; i < ARMV8_PMU_CYCLE_IDX; i++) {
 		if (!(val & BIT(i)))
 			continue;
-		type = vcpu_sys_reg(vcpu, PMEVTYPER0_EL0 + i)
+		type = __vcpu_sys_reg(vcpu, PMEVTYPER0_EL0 + i)
 		       & ARMV8_PMU_EVTYPE_EVENT;
 		if ((type == ARMV8_PMUV3_PERFCTR_SW_INCR)
 		    && (enable & BIT(i))) {
-			reg = vcpu_sys_reg(vcpu, PMEVCNTR0_EL0 + i) + 1;
+			reg = __vcpu_sys_reg(vcpu, PMEVCNTR0_EL0 + i) + 1;
 			reg = lower_32_bits(reg);
-			vcpu_sys_reg(vcpu, PMEVCNTR0_EL0 + i) = reg;
+			__vcpu_sys_reg(vcpu, PMEVCNTR0_EL0 + i) = reg;
 			if (!reg)
-				vcpu_sys_reg(vcpu, PMOVSSET_EL0) |= BIT(i);
+				__vcpu_sys_reg(vcpu, PMOVSSET_EL0) |= BIT(i);
 		}
 	}
 }
@@ -348,7 +349,7 @@ void kvm_pmu_handle_pmcr(struct kvm_vcpu *vcpu, u64 val)
 	mask = kvm_pmu_valid_counter_mask(vcpu);
 	if (val & ARMV8_PMU_PMCR_E) {
 		kvm_pmu_enable_counter(vcpu,
-				vcpu_sys_reg(vcpu, PMCNTENSET_EL0) & mask);
+		       __vcpu_sys_reg(vcpu, PMCNTENSET_EL0) & mask);
 	} else {
 		kvm_pmu_disable_counter(vcpu, mask);
 	}
@@ -369,8 +370,8 @@ void kvm_pmu_handle_pmcr(struct kvm_vcpu *vcpu, u64 val)
 
 static bool kvm_pmu_counter_is_enabled(struct kvm_vcpu *vcpu, u64 select_idx)
 {
-	return (vcpu_sys_reg(vcpu, PMCR_EL0) & ARMV8_PMU_PMCR_E) &&
-	       (vcpu_sys_reg(vcpu, PMCNTENSET_EL0) & BIT(select_idx));
+	return (__vcpu_sys_reg(vcpu, PMCR_EL0) & ARMV8_PMU_PMCR_E) &&
+	       (__vcpu_sys_reg(vcpu, PMCNTENSET_EL0) & BIT(select_idx));
 }
 
 /**
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 223+ messages in thread

* [PATCH v3 26/41] KVM: arm64: Introduce framework for accessing deferred sysregs
  2018-01-12 12:07 ` Christoffer Dall
@ 2018-01-12 12:07   ` Christoffer Dall
  -1 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-01-12 12:07 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel; +Cc: Marc Zyngier, Shih-Wei Li, kvm

We are about to defer saving and restoring some groups of system
registers to vcpu_put and vcpu_load on supported systems.  This means
that we need some infrastructure to access system registers which
supports either accessing the memory backing of the register or directly
accessing the system registers, depending on the state of the system
when we access the register.

We do this by defining a set of read/write accessors for each system
register, and letting each system register be defined as "immediate" or
"deferrable".  Immediate registers are always saved/restored in the
world-switch path, but deferrable registers are only saved/restored in
vcpu_put/vcpu_load when supported, in which case sysregs_loaded_on_cpu
will be set.
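
A minimal stand-alone model of that distinction (illustrative only; the
toy names, the loaded flag, and the fake hardware register below are
assumptions for the example, not the kernel interfaces) could look like:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

static uint64_t fake_hw_reg;		/* stands in for the physical sysreg */

struct toy_vcpu {
	uint64_t sys_regs[2];
	bool sysregs_loaded_on_cpu;	/* set between vcpu_load and vcpu_put */
};

/* Immediate register: always accessed through its memory backing. */
static uint64_t read_immediate(struct toy_vcpu *v, int r)
{
	return v->sys_regs[r];
}

/* Deferrable register: read the hardware copy while the vcpu's sysregs are
 * loaded on the physical CPU, and the memory copy otherwise. */
static uint64_t read_deferrable(struct toy_vcpu *v, int r)
{
	if (v->sysregs_loaded_on_cpu)
		return fake_hw_reg;
	return v->sys_regs[r];
}

int main(void)
{
	struct toy_vcpu vcpu = { { 0x10, 0x20 }, false };

	fake_hw_reg = 0xff;
	printf("not loaded: imm=%llx def=%llx\n",
	       (unsigned long long)read_immediate(&vcpu, 0),
	       (unsigned long long)read_deferrable(&vcpu, 1));

	vcpu.sysregs_loaded_on_cpu = true;	/* as after vcpu_load */
	printf("loaded:     imm=%llx def=%llx\n",
	       (unsigned long long)read_immediate(&vcpu, 0),
	       (unsigned long long)read_deferrable(&vcpu, 1));
	return 0;
}

The DECLARE_IMMEDIATE_SR/DECLARE_DEFERRABLE_SR macros in the patch
generate a read/write pair of this shape for each register.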

Note that we don't use the deferred mechanism yet in this patch, but only
introduce the infrastructure.  This makes the subsequent patches, where
it becomes clear which registers are deferred, easier to review.

 [ Most of this logic was contributed by Marc Zyngier ]

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 arch/arm64/include/asm/kvm_host.h |   8 +-
 arch/arm64/kvm/sys_regs.c         | 160 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 166 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 91272c35cc36..4b5ef82f6bdb 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -281,6 +281,10 @@ struct kvm_vcpu_arch {
 
 	/* Detect first run of a vcpu */
 	bool has_run_once;
+
+	/* True when deferrable sysregs are loaded on the physical CPU,
+	 * see kvm_vcpu_load_sysregs and kvm_vcpu_put_sysregs. */
+	bool sysregs_loaded_on_cpu;
 };
 
 #define vcpu_gp_regs(v)		(&(v)->arch.ctxt.gp_regs)
@@ -293,8 +297,8 @@ struct kvm_vcpu_arch {
  */
 #define __vcpu_sys_reg(v,r)	((v)->arch.ctxt.sys_regs[(r)])
 
-#define vcpu_read_sys_reg(v,r)	__vcpu_sys_reg(v,r)
-#define vcpu_write_sys_reg(v,r,n)	do { __vcpu_sys_reg(v,r) = n; } while (0)
+u64 vcpu_read_sys_reg(struct kvm_vcpu *vcpu, int reg);
+void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, int reg, u64 val);
 
 /*
  * CP14 and CP15 live in the same array, as they are backed by the
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 96398d53b462..9d353a6a55c9 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -35,6 +35,7 @@
 #include <asm/kvm_coproc.h>
 #include <asm/kvm_emulate.h>
 #include <asm/kvm_host.h>
+#include <asm/kvm_hyp.h>
 #include <asm/kvm_mmu.h>
 #include <asm/perf_event.h>
 #include <asm/sysreg.h>
@@ -76,6 +77,165 @@ static bool write_to_read_only(struct kvm_vcpu *vcpu,
 	return false;
 }
 
+struct sys_reg_accessor {
+	u64	(*rdsr)(struct kvm_vcpu *, int);
+	void	(*wrsr)(struct kvm_vcpu *, int, u64);
+};
+
+#define DECLARE_IMMEDIATE_SR(i)						\
+	static u64 __##i##_read(struct kvm_vcpu *vcpu, int r)		\
+	{								\
+		return __vcpu_sys_reg(vcpu, r);				\
+	}								\
+									\
+	static void __##i##_write(struct kvm_vcpu *vcpu, int r, u64 v)	\
+	{								\
+		__vcpu_sys_reg(vcpu, r) = v;				\
+	}								\
+
+#define DECLARE_DEFERRABLE_SR(i, s)					\
+	static u64 __##i##_read(struct kvm_vcpu *vcpu, int r)		\
+	{								\
+		if (vcpu->arch.sysregs_loaded_on_cpu) {			\
+			WARN_ON(kvm_arm_get_running_vcpu() != vcpu);	\
+			return read_sysreg_s((s));			\
+		}							\
+		return __vcpu_sys_reg(vcpu, r);				\
+	}								\
+									\
+	static void __##i##_write(struct kvm_vcpu *vcpu, int r, u64 v)	\
+	{								\
+		if (vcpu->arch.sysregs_loaded_on_cpu) {			\
+			WARN_ON(kvm_arm_get_running_vcpu() != vcpu);	\
+			write_sysreg_s(v, (s));				\
+		} else {						\
+			__vcpu_sys_reg(vcpu, r) = v;			\
+		}							\
+	}								\
+
+
+#define SR_HANDLER_RANGE(i,e)						\
+	[i ... e] =  (struct sys_reg_accessor) {			\
+		.rdsr = __##i##_read,					\
+		.wrsr = __##i##_write,					\
+	}
+
+#define SR_HANDLER(i)	SR_HANDLER_RANGE(i, i)
+
+static void bad_sys_reg(int reg)
+{
+	WARN_ONCE(1, "Bad system register access %d\n", reg);
+}
+
+static u64 __default_read_sys_reg(struct kvm_vcpu *vcpu, int reg)
+{
+	bad_sys_reg(reg);
+	return 0;
+}
+
+static void __default_write_sys_reg(struct kvm_vcpu *vcpu, int reg, u64 val)
+{
+	bad_sys_reg(reg);
+}
+
+/* Ordered as in enum vcpu_sysreg */
+DECLARE_IMMEDIATE_SR(MPIDR_EL1);
+DECLARE_IMMEDIATE_SR(CSSELR_EL1);
+DECLARE_IMMEDIATE_SR(SCTLR_EL1);
+DECLARE_IMMEDIATE_SR(ACTLR_EL1);
+DECLARE_IMMEDIATE_SR(CPACR_EL1);
+DECLARE_IMMEDIATE_SR(TTBR0_EL1);
+DECLARE_IMMEDIATE_SR(TTBR1_EL1);
+DECLARE_IMMEDIATE_SR(TCR_EL1);
+DECLARE_IMMEDIATE_SR(ESR_EL1);
+DECLARE_IMMEDIATE_SR(AFSR0_EL1);
+DECLARE_IMMEDIATE_SR(AFSR1_EL1);
+DECLARE_IMMEDIATE_SR(FAR_EL1);
+DECLARE_IMMEDIATE_SR(MAIR_EL1);
+DECLARE_IMMEDIATE_SR(VBAR_EL1);
+DECLARE_IMMEDIATE_SR(CONTEXTIDR_EL1);
+DECLARE_IMMEDIATE_SR(TPIDR_EL0);
+DECLARE_IMMEDIATE_SR(TPIDRRO_EL0);
+DECLARE_IMMEDIATE_SR(TPIDR_EL1);
+DECLARE_IMMEDIATE_SR(AMAIR_EL1);
+DECLARE_IMMEDIATE_SR(CNTKCTL_EL1);
+DECLARE_IMMEDIATE_SR(PAR_EL1);
+DECLARE_IMMEDIATE_SR(MDSCR_EL1);
+DECLARE_IMMEDIATE_SR(MDCCINT_EL1);
+DECLARE_IMMEDIATE_SR(PMCR_EL0);
+DECLARE_IMMEDIATE_SR(PMSELR_EL0);
+DECLARE_IMMEDIATE_SR(PMEVCNTR0_EL0);
+/* PMEVCNTR30_EL0 */
+DECLARE_IMMEDIATE_SR(PMCCNTR_EL0);
+DECLARE_IMMEDIATE_SR(PMEVTYPER0_EL0);
+/* PMEVTYPER30_EL0 */
+DECLARE_IMMEDIATE_SR(PMCCFILTR_EL0);
+DECLARE_IMMEDIATE_SR(PMCNTENSET_EL0);
+DECLARE_IMMEDIATE_SR(PMINTENSET_EL1);
+DECLARE_IMMEDIATE_SR(PMOVSSET_EL0);
+DECLARE_IMMEDIATE_SR(PMSWINC_EL0);
+DECLARE_IMMEDIATE_SR(PMUSERENR_EL0);
+DECLARE_IMMEDIATE_SR(DACR32_EL2);
+DECLARE_IMMEDIATE_SR(IFSR32_EL2);
+DECLARE_IMMEDIATE_SR(FPEXC32_EL2);
+DECLARE_IMMEDIATE_SR(DBGVCR32_EL2);
+
+static const struct sys_reg_accessor sys_reg_accessors[NR_SYS_REGS] = {
+	[0 ... NR_SYS_REGS - 1] = {
+		.rdsr = __default_read_sys_reg,
+		.wrsr = __default_write_sys_reg,
+	},
+
+	SR_HANDLER(MPIDR_EL1),
+	SR_HANDLER(CSSELR_EL1),
+	SR_HANDLER(SCTLR_EL1),
+	SR_HANDLER(ACTLR_EL1),
+	SR_HANDLER(CPACR_EL1),
+	SR_HANDLER(TTBR0_EL1),
+	SR_HANDLER(TTBR1_EL1),
+	SR_HANDLER(TCR_EL1),
+	SR_HANDLER(ESR_EL1),
+	SR_HANDLER(AFSR0_EL1),
+	SR_HANDLER(AFSR1_EL1),
+	SR_HANDLER(FAR_EL1),
+	SR_HANDLER(MAIR_EL1),
+	SR_HANDLER(VBAR_EL1),
+	SR_HANDLER(CONTEXTIDR_EL1),
+	SR_HANDLER(TPIDR_EL0),
+	SR_HANDLER(TPIDRRO_EL0),
+	SR_HANDLER(TPIDR_EL1),
+	SR_HANDLER(AMAIR_EL1),
+	SR_HANDLER(CNTKCTL_EL1),
+	SR_HANDLER(PAR_EL1),
+	SR_HANDLER(MDSCR_EL1),
+	SR_HANDLER(MDCCINT_EL1),
+	SR_HANDLER(PMCR_EL0),
+	SR_HANDLER(PMSELR_EL0),
+	SR_HANDLER_RANGE(PMEVCNTR0_EL0, PMEVCNTR30_EL0),
+	SR_HANDLER(PMCCNTR_EL0),
+	SR_HANDLER_RANGE(PMEVTYPER0_EL0, PMEVTYPER30_EL0),
+	SR_HANDLER(PMCCFILTR_EL0),
+	SR_HANDLER(PMCNTENSET_EL0),
+	SR_HANDLER(PMINTENSET_EL1),
+	SR_HANDLER(PMOVSSET_EL0),
+	SR_HANDLER(PMSWINC_EL0),
+	SR_HANDLER(PMUSERENR_EL0),
+	SR_HANDLER(DACR32_EL2),
+	SR_HANDLER(IFSR32_EL2),
+	SR_HANDLER(FPEXC32_EL2),
+	SR_HANDLER(DBGVCR32_EL2),
+};
+
+u64 vcpu_read_sys_reg(struct kvm_vcpu *vcpu, int reg)
+{
+	return sys_reg_accessors[reg].rdsr(vcpu, reg);
+}
+
+void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, int reg, u64 val)
+{
+	sys_reg_accessors[reg].wrsr(vcpu, reg, val);
+}
+
 /* 3 bits per cache level, as per CLIDR, but non-existent caches always 0 */
 static u32 cache_levels;
 
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 223+ messages in thread

* [PATCH v3 26/41] KVM: arm64: Introduce framework for accessing deferred sysregs
@ 2018-01-12 12:07   ` Christoffer Dall
  0 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-01-12 12:07 UTC (permalink / raw)
  To: linux-arm-kernel

We are about to defer saving and restoring some groups of system
registers to vcpu_put and vcpu_load on supported systems.  This means
that we need some infrastructure to access system registes which
supports either accessing the memory backing of the register or directly
accessing the system registers, depending on the state of the system
when we access the register.

We do this by defining a set of read/write accessors for each system
register, and letting each system register be defined as "immediate" or
"deferrable".  Immediate registers are always saved/restored in the
world-switch path, but deferrable registers are only saved/restored in
vcpu_put/vcpu_load when supported, in which case sysregs_loaded_on_cpu
will be set.

Note that we don't use the deferred mechanism yet in this patch, but only
introduce the infrastructure.  This makes the subsequent patches, where
it becomes clear which registers are deferred, easier to review.

 [ Most of this logic was contributed by Marc Zyngier ]

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 arch/arm64/include/asm/kvm_host.h |   8 +-
 arch/arm64/kvm/sys_regs.c         | 160 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 166 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 91272c35cc36..4b5ef82f6bdb 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -281,6 +281,10 @@ struct kvm_vcpu_arch {
 
 	/* Detect first run of a vcpu */
 	bool has_run_once;
+
+	/* True when deferrable sysregs are loaded on the physical CPU,
+	 * see kvm_vcpu_load_sysregs and kvm_vcpu_put_sysregs. */
+	bool sysregs_loaded_on_cpu;
 };
 
 #define vcpu_gp_regs(v)		(&(v)->arch.ctxt.gp_regs)
@@ -293,8 +297,8 @@ struct kvm_vcpu_arch {
  */
 #define __vcpu_sys_reg(v,r)	((v)->arch.ctxt.sys_regs[(r)])
 
-#define vcpu_read_sys_reg(v,r)	__vcpu_sys_reg(v,r)
-#define vcpu_write_sys_reg(v,r,n)	do { __vcpu_sys_reg(v,r) = n; } while (0)
+u64 vcpu_read_sys_reg(struct kvm_vcpu *vcpu, int reg);
+void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, int reg, u64 val);
 
 /*
  * CP14 and CP15 live in the same array, as they are backed by the
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 96398d53b462..9d353a6a55c9 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -35,6 +35,7 @@
 #include <asm/kvm_coproc.h>
 #include <asm/kvm_emulate.h>
 #include <asm/kvm_host.h>
+#include <asm/kvm_hyp.h>
 #include <asm/kvm_mmu.h>
 #include <asm/perf_event.h>
 #include <asm/sysreg.h>
@@ -76,6 +77,165 @@ static bool write_to_read_only(struct kvm_vcpu *vcpu,
 	return false;
 }
 
+struct sys_reg_accessor {
+	u64	(*rdsr)(struct kvm_vcpu *, int);
+	void	(*wrsr)(struct kvm_vcpu *, int, u64);
+};
+
+#define DECLARE_IMMEDIATE_SR(i)						\
+	static u64 __##i##_read(struct kvm_vcpu *vcpu, int r)		\
+	{								\
+		return __vcpu_sys_reg(vcpu, r);				\
+	}								\
+									\
+	static void __##i##_write(struct kvm_vcpu *vcpu, int r, u64 v)	\
+	{								\
+		__vcpu_sys_reg(vcpu, r) = v;				\
+	}								\
+
+#define DECLARE_DEFERRABLE_SR(i, s)					\
+	static u64 __##i##_read(struct kvm_vcpu *vcpu, int r)		\
+	{								\
+		if (vcpu->arch.sysregs_loaded_on_cpu) {			\
+			WARN_ON(kvm_arm_get_running_vcpu() != vcpu);	\
+			return read_sysreg_s((s));			\
+		}							\
+		return __vcpu_sys_reg(vcpu, r);				\
+	}								\
+									\
+	static void __##i##_write(struct kvm_vcpu *vcpu, int r, u64 v)	\
+	{								\
+		if (vcpu->arch.sysregs_loaded_on_cpu) {			\
+			WARN_ON(kvm_arm_get_running_vcpu() != vcpu);	\
+			write_sysreg_s(v, (s));				\
+		} else {						\
+			__vcpu_sys_reg(vcpu, r) = v;			\
+		}							\
+	}								\
+
+
+#define SR_HANDLER_RANGE(i,e)						\
+	[i ... e] =  (struct sys_reg_accessor) {			\
+		.rdsr = __##i##_read,					\
+		.wrsr = __##i##_write,					\
+	}
+
+#define SR_HANDLER(i)	SR_HANDLER_RANGE(i, i)
+
+static void bad_sys_reg(int reg)
+{
+	WARN_ONCE(1, "Bad system register access %d\n", reg);
+}
+
+static u64 __default_read_sys_reg(struct kvm_vcpu *vcpu, int reg)
+{
+	bad_sys_reg(reg);
+	return 0;
+}
+
+static void __default_write_sys_reg(struct kvm_vcpu *vcpu, int reg, u64 val)
+{
+	bad_sys_reg(reg);
+}
+
+/* Ordered as in enum vcpu_sysreg */
+DECLARE_IMMEDIATE_SR(MPIDR_EL1);
+DECLARE_IMMEDIATE_SR(CSSELR_EL1);
+DECLARE_IMMEDIATE_SR(SCTLR_EL1);
+DECLARE_IMMEDIATE_SR(ACTLR_EL1);
+DECLARE_IMMEDIATE_SR(CPACR_EL1);
+DECLARE_IMMEDIATE_SR(TTBR0_EL1);
+DECLARE_IMMEDIATE_SR(TTBR1_EL1);
+DECLARE_IMMEDIATE_SR(TCR_EL1);
+DECLARE_IMMEDIATE_SR(ESR_EL1);
+DECLARE_IMMEDIATE_SR(AFSR0_EL1);
+DECLARE_IMMEDIATE_SR(AFSR1_EL1);
+DECLARE_IMMEDIATE_SR(FAR_EL1);
+DECLARE_IMMEDIATE_SR(MAIR_EL1);
+DECLARE_IMMEDIATE_SR(VBAR_EL1);
+DECLARE_IMMEDIATE_SR(CONTEXTIDR_EL1);
+DECLARE_IMMEDIATE_SR(TPIDR_EL0);
+DECLARE_IMMEDIATE_SR(TPIDRRO_EL0);
+DECLARE_IMMEDIATE_SR(TPIDR_EL1);
+DECLARE_IMMEDIATE_SR(AMAIR_EL1);
+DECLARE_IMMEDIATE_SR(CNTKCTL_EL1);
+DECLARE_IMMEDIATE_SR(PAR_EL1);
+DECLARE_IMMEDIATE_SR(MDSCR_EL1);
+DECLARE_IMMEDIATE_SR(MDCCINT_EL1);
+DECLARE_IMMEDIATE_SR(PMCR_EL0);
+DECLARE_IMMEDIATE_SR(PMSELR_EL0);
+DECLARE_IMMEDIATE_SR(PMEVCNTR0_EL0);
+/* PMEVCNTR30_EL0 */
+DECLARE_IMMEDIATE_SR(PMCCNTR_EL0);
+DECLARE_IMMEDIATE_SR(PMEVTYPER0_EL0);
+/* PMEVTYPER30_EL0 */
+DECLARE_IMMEDIATE_SR(PMCCFILTR_EL0);
+DECLARE_IMMEDIATE_SR(PMCNTENSET_EL0);
+DECLARE_IMMEDIATE_SR(PMINTENSET_EL1);
+DECLARE_IMMEDIATE_SR(PMOVSSET_EL0);
+DECLARE_IMMEDIATE_SR(PMSWINC_EL0);
+DECLARE_IMMEDIATE_SR(PMUSERENR_EL0);
+DECLARE_IMMEDIATE_SR(DACR32_EL2);
+DECLARE_IMMEDIATE_SR(IFSR32_EL2);
+DECLARE_IMMEDIATE_SR(FPEXC32_EL2);
+DECLARE_IMMEDIATE_SR(DBGVCR32_EL2);
+
+static const struct sys_reg_accessor sys_reg_accessors[NR_SYS_REGS] = {
+	[0 ... NR_SYS_REGS - 1] = {
+		.rdsr = __default_read_sys_reg,
+		.wrsr = __default_write_sys_reg,
+	},
+
+	SR_HANDLER(MPIDR_EL1),
+	SR_HANDLER(CSSELR_EL1),
+	SR_HANDLER(SCTLR_EL1),
+	SR_HANDLER(ACTLR_EL1),
+	SR_HANDLER(CPACR_EL1),
+	SR_HANDLER(TTBR0_EL1),
+	SR_HANDLER(TTBR1_EL1),
+	SR_HANDLER(TCR_EL1),
+	SR_HANDLER(ESR_EL1),
+	SR_HANDLER(AFSR0_EL1),
+	SR_HANDLER(AFSR1_EL1),
+	SR_HANDLER(FAR_EL1),
+	SR_HANDLER(MAIR_EL1),
+	SR_HANDLER(VBAR_EL1),
+	SR_HANDLER(CONTEXTIDR_EL1),
+	SR_HANDLER(TPIDR_EL0),
+	SR_HANDLER(TPIDRRO_EL0),
+	SR_HANDLER(TPIDR_EL1),
+	SR_HANDLER(AMAIR_EL1),
+	SR_HANDLER(CNTKCTL_EL1),
+	SR_HANDLER(PAR_EL1),
+	SR_HANDLER(MDSCR_EL1),
+	SR_HANDLER(MDCCINT_EL1),
+	SR_HANDLER(PMCR_EL0),
+	SR_HANDLER(PMSELR_EL0),
+	SR_HANDLER_RANGE(PMEVCNTR0_EL0, PMEVCNTR30_EL0),
+	SR_HANDLER(PMCCNTR_EL0),
+	SR_HANDLER_RANGE(PMEVTYPER0_EL0, PMEVTYPER30_EL0),
+	SR_HANDLER(PMCCFILTR_EL0),
+	SR_HANDLER(PMCNTENSET_EL0),
+	SR_HANDLER(PMINTENSET_EL1),
+	SR_HANDLER(PMOVSSET_EL0),
+	SR_HANDLER(PMSWINC_EL0),
+	SR_HANDLER(PMUSERENR_EL0),
+	SR_HANDLER(DACR32_EL2),
+	SR_HANDLER(IFSR32_EL2),
+	SR_HANDLER(FPEXC32_EL2),
+	SR_HANDLER(DBGVCR32_EL2),
+};
+
+u64 vcpu_read_sys_reg(struct kvm_vcpu *vcpu, int reg)
+{
+	return sys_reg_accessors[reg].rdsr(vcpu, reg);
+}
+
+void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, int reg, u64 val)
+{
+	sys_reg_accessors[reg].wrsr(vcpu, reg, val);
+}
+
 /* 3 bits per cache level, as per CLIDR, but non-existent caches always 0 */
 static u32 cache_levels;
 
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 223+ messages in thread

* [PATCH v3 27/41] KVM: arm/arm64: Prepare to handle deferred save/restore of SPSR_EL1
  2018-01-12 12:07 ` Christoffer Dall
@ 2018-01-12 12:07   ` Christoffer Dall
  -1 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-01-12 12:07 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel; +Cc: Marc Zyngier, Shih-Wei Li, kvm

SPSR_EL1 is not used by a VHE host kernel and can be deferred, but we
need to rework the accesses to this register so that we always get the
latest value, depending on whether the guest system registers are loaded
on the CPU or only reside in memory.

The handling of accessing the various banked SPSRs for 32-bit VMs is a
bit clunky, but this will be improved in following patches which will
first prepare and subsequently implement deferred save/restore of the
32-bit registers, including the 32-bit SPSRs.
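
As a minimal sketch (hypothetical caller, for illustration only), code that
previously dereferenced the pointer returned by vcpu_spsr() now goes
through the accessors, which resolve to either the in-memory copy or the
live register:

	/*
	 * Hypothetical example: emulating a return-from-exception, where
	 * the guest's SPSR becomes the new CPSR.
	 */
	static void example_exception_return(struct kvm_vcpu *vcpu)
	{
		*vcpu_cpsr(vcpu) = vcpu_read_spsr(vcpu);
	}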

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 arch/arm/include/asm/kvm_emulate.h   | 12 ++++++++++-
 arch/arm/kvm/emulate.c               |  2 +-
 arch/arm64/include/asm/kvm_emulate.h | 41 +++++++++++++++++++++++++++++++-----
 arch/arm64/kvm/inject_fault.c        |  4 ++--
 virt/kvm/arm/aarch32.c               |  2 +-
 5 files changed, 51 insertions(+), 10 deletions(-)

diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
index d5e1b8bf6422..db8a09e9a16f 100644
--- a/arch/arm/include/asm/kvm_emulate.h
+++ b/arch/arm/include/asm/kvm_emulate.h
@@ -41,7 +41,17 @@ static inline unsigned long *vcpu_reg32(struct kvm_vcpu *vcpu, u8 reg_num)
 	return vcpu_reg(vcpu, reg_num);
 }
 
-unsigned long *vcpu_spsr(struct kvm_vcpu *vcpu);
+unsigned long *__vcpu_spsr(struct kvm_vcpu *vcpu);
+
+static inline unsigned long vcpu_read_spsr(struct kvm_vcpu *vcpu)
+{
+	return *__vcpu_spsr(vcpu);
+}
+
+static inline void vcpu_write_spsr(struct kvm_vcpu *vcpu, unsigned long v)
+{
+	*__vcpu_spsr(vcpu) = v;
+}
 
 static inline unsigned long vcpu_get_reg(struct kvm_vcpu *vcpu,
 					 u8 reg_num)
diff --git a/arch/arm/kvm/emulate.c b/arch/arm/kvm/emulate.c
index fa501bf437f3..9046b53d87c1 100644
--- a/arch/arm/kvm/emulate.c
+++ b/arch/arm/kvm/emulate.c
@@ -142,7 +142,7 @@ unsigned long *vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num)
 /*
  * Return the SPSR for the current mode of the virtual CPU.
  */
-unsigned long *vcpu_spsr(struct kvm_vcpu *vcpu)
+unsigned long *__vcpu_spsr(struct kvm_vcpu *vcpu)
 {
 	unsigned long mode = *vcpu_cpsr(vcpu) & MODE_MASK;
 	switch (mode) {
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index df1cb146750d..f0bc7c096fdc 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -26,6 +26,7 @@
 
 #include <asm/esr.h>
 #include <asm/kvm_arm.h>
+#include <asm/kvm_hyp.h>
 #include <asm/kvm_mmio.h>
 #include <asm/ptrace.h>
 #include <asm/cputype.h>
@@ -131,13 +132,43 @@ static inline void vcpu_set_reg(struct kvm_vcpu *vcpu, u8 reg_num,
 		vcpu_gp_regs(vcpu)->regs.regs[reg_num] = val;
 }
 
-/* Get vcpu SPSR for current mode */
-static inline unsigned long *vcpu_spsr(const struct kvm_vcpu *vcpu)
+static inline unsigned long vcpu_read_spsr(const struct kvm_vcpu *vcpu)
 {
-	if (vcpu_mode_is_32bit(vcpu))
-		return vcpu_spsr32(vcpu);
+	unsigned long *p = (unsigned long *)&vcpu_gp_regs(vcpu)->spsr[KVM_SPSR_EL1];
+
+	if (vcpu_mode_is_32bit(vcpu)) {
+		unsigned long *p_32bit = vcpu_spsr32(vcpu);
+
+		/* KVM_SPSR_SVC aliases KVM_SPSR_EL1 */
+		if (p_32bit != (unsigned long *)p)
+			return *p_32bit;
+	}
+
+	if (vcpu->arch.sysregs_loaded_on_cpu)
+		return read_sysreg_el1(spsr);
+	else
+		return *p;
+}
 
-	return (unsigned long *)&vcpu_gp_regs(vcpu)->spsr[KVM_SPSR_EL1];
+static inline void vcpu_write_spsr(const struct kvm_vcpu *vcpu, unsigned long v)
+{
+	unsigned long *p = (unsigned long *)&vcpu_gp_regs(vcpu)->spsr[KVM_SPSR_EL1];
+
+	/* KVM_SPSR_SVC aliases KVM_SPSR_EL1 */
+	if (vcpu_mode_is_32bit(vcpu)) {
+		unsigned long *p_32bit = vcpu_spsr32(vcpu);
+
+		/* KVM_SPSR_SVC aliases KVM_SPSR_EL1 */
+		if (p_32bit != (unsigned long *)p) {
+			*p_32bit = v;
+			return;
+		}
+	}
+
+	if (vcpu->arch.sysregs_loaded_on_cpu)
+		write_sysreg_el1(v, spsr);
+	else
+		*p = v;
 }
 
 static inline bool vcpu_mode_priv(const struct kvm_vcpu *vcpu)
diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
index 1e070943e7a6..c638593d305d 100644
--- a/arch/arm64/kvm/inject_fault.c
+++ b/arch/arm64/kvm/inject_fault.c
@@ -71,7 +71,7 @@ static void inject_abt64(struct kvm_vcpu *vcpu, bool is_iabt, unsigned long addr
 	*vcpu_pc(vcpu) = get_except_vector(vcpu, except_type_sync);
 
 	*vcpu_cpsr(vcpu) = PSTATE_FAULT_BITS_64;
-	*vcpu_spsr(vcpu) = cpsr;
+	vcpu_write_spsr(vcpu, cpsr);
 
 	vcpu_write_sys_reg(vcpu, FAR_EL1, addr);
 
@@ -106,7 +106,7 @@ static void inject_undef64(struct kvm_vcpu *vcpu)
 	*vcpu_pc(vcpu) = get_except_vector(vcpu, except_type_sync);
 
 	*vcpu_cpsr(vcpu) = PSTATE_FAULT_BITS_64;
-	*vcpu_spsr(vcpu) = cpsr;
+	vcpu_write_spsr(vcpu, cpsr);
 
 	/*
 	 * Build an unknown exception, depending on the instruction
diff --git a/virt/kvm/arm/aarch32.c b/virt/kvm/arm/aarch32.c
index 8bc479fa37e6..efc84cbe8277 100644
--- a/virt/kvm/arm/aarch32.c
+++ b/virt/kvm/arm/aarch32.c
@@ -178,7 +178,7 @@ static void prepare_fault32(struct kvm_vcpu *vcpu, u32 mode, u32 vect_offset)
 	*vcpu_cpsr(vcpu) = cpsr;
 
 	/* Note: These now point to the banked copies */
-	*vcpu_spsr(vcpu) = new_spsr_value;
+	vcpu_write_spsr(vcpu, new_spsr_value);
 	*vcpu_reg32(vcpu, 14) = *vcpu_pc(vcpu) + return_offset;
 
 	/* Branch to exception vector */
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 223+ messages in thread

* [PATCH v3 27/41] KVM: arm/arm64: Prepare to handle deferred save/restore of SPSR_EL1
@ 2018-01-12 12:07   ` Christoffer Dall
  0 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-01-12 12:07 UTC (permalink / raw)
  To: linux-arm-kernel

SPSR_EL1 is not used by a VHE host kernel and can be deferred, but we
need to rework the accesses to this register so that we always get the
latest value, depending on whether the guest system registers are loaded
on the CPU or only reside in memory.

The handling of accessing the various banked SPSRs for 32-bit VMs is a
bit clunky, but this will be improved in following patches which will
first prepare and subsequently implement deferred save/restore of the
32-bit registers, including the 32-bit SPSRs.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 arch/arm/include/asm/kvm_emulate.h   | 12 ++++++++++-
 arch/arm/kvm/emulate.c               |  2 +-
 arch/arm64/include/asm/kvm_emulate.h | 41 +++++++++++++++++++++++++++++++-----
 arch/arm64/kvm/inject_fault.c        |  4 ++--
 virt/kvm/arm/aarch32.c               |  2 +-
 5 files changed, 51 insertions(+), 10 deletions(-)

diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
index d5e1b8bf6422..db8a09e9a16f 100644
--- a/arch/arm/include/asm/kvm_emulate.h
+++ b/arch/arm/include/asm/kvm_emulate.h
@@ -41,7 +41,17 @@ static inline unsigned long *vcpu_reg32(struct kvm_vcpu *vcpu, u8 reg_num)
 	return vcpu_reg(vcpu, reg_num);
 }
 
-unsigned long *vcpu_spsr(struct kvm_vcpu *vcpu);
+unsigned long *__vcpu_spsr(struct kvm_vcpu *vcpu);
+
+static inline unsigned long vcpu_read_spsr(struct kvm_vcpu *vcpu)
+{
+	return *__vcpu_spsr(vcpu);
+}
+
+static inline void vcpu_write_spsr(struct kvm_vcpu *vcpu, unsigned long v)
+{
+	*__vcpu_spsr(vcpu) = v;
+}
 
 static inline unsigned long vcpu_get_reg(struct kvm_vcpu *vcpu,
 					 u8 reg_num)
diff --git a/arch/arm/kvm/emulate.c b/arch/arm/kvm/emulate.c
index fa501bf437f3..9046b53d87c1 100644
--- a/arch/arm/kvm/emulate.c
+++ b/arch/arm/kvm/emulate.c
@@ -142,7 +142,7 @@ unsigned long *vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num)
 /*
  * Return the SPSR for the current mode of the virtual CPU.
  */
-unsigned long *vcpu_spsr(struct kvm_vcpu *vcpu)
+unsigned long *__vcpu_spsr(struct kvm_vcpu *vcpu)
 {
 	unsigned long mode = *vcpu_cpsr(vcpu) & MODE_MASK;
 	switch (mode) {
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index df1cb146750d..f0bc7c096fdc 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -26,6 +26,7 @@
 
 #include <asm/esr.h>
 #include <asm/kvm_arm.h>
+#include <asm/kvm_hyp.h>
 #include <asm/kvm_mmio.h>
 #include <asm/ptrace.h>
 #include <asm/cputype.h>
@@ -131,13 +132,43 @@ static inline void vcpu_set_reg(struct kvm_vcpu *vcpu, u8 reg_num,
 		vcpu_gp_regs(vcpu)->regs.regs[reg_num] = val;
 }
 
-/* Get vcpu SPSR for current mode */
-static inline unsigned long *vcpu_spsr(const struct kvm_vcpu *vcpu)
+static inline unsigned long vcpu_read_spsr(const struct kvm_vcpu *vcpu)
 {
-	if (vcpu_mode_is_32bit(vcpu))
-		return vcpu_spsr32(vcpu);
+	unsigned long *p = (unsigned long *)&vcpu_gp_regs(vcpu)->spsr[KVM_SPSR_EL1];
+
+	if (vcpu_mode_is_32bit(vcpu)) {
+		unsigned long *p_32bit = vcpu_spsr32(vcpu);
+
+		/* KVM_SPSR_SVC aliases KVM_SPSR_EL1 */
+		if (p_32bit != (unsigned long *)p)
+			return *p_32bit;
+	}
+
+	if (vcpu->arch.sysregs_loaded_on_cpu)
+		return read_sysreg_el1(spsr);
+	else
+		return *p;
+}
 
-	return (unsigned long *)&vcpu_gp_regs(vcpu)->spsr[KVM_SPSR_EL1];
+static inline void vcpu_write_spsr(const struct kvm_vcpu *vcpu, unsigned long v)
+{
+	unsigned long *p = (unsigned long *)&vcpu_gp_regs(vcpu)->spsr[KVM_SPSR_EL1];
+
+	/* KVM_SPSR_SVC aliases KVM_SPSR_EL1 */
+	if (vcpu_mode_is_32bit(vcpu)) {
+		unsigned long *p_32bit = vcpu_spsr32(vcpu);
+
+		/* KVM_SPSR_SVC aliases KVM_SPSR_EL1 */
+		if (p_32bit != (unsigned long *)p) {
+			*p_32bit = v;
+			return;
+		}
+	}
+
+	if (vcpu->arch.sysregs_loaded_on_cpu)
+		write_sysreg_el1(v, spsr);
+	else
+		*p = v;
 }
 
 static inline bool vcpu_mode_priv(const struct kvm_vcpu *vcpu)
diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
index 1e070943e7a6..c638593d305d 100644
--- a/arch/arm64/kvm/inject_fault.c
+++ b/arch/arm64/kvm/inject_fault.c
@@ -71,7 +71,7 @@ static void inject_abt64(struct kvm_vcpu *vcpu, bool is_iabt, unsigned long addr
 	*vcpu_pc(vcpu) = get_except_vector(vcpu, except_type_sync);
 
 	*vcpu_cpsr(vcpu) = PSTATE_FAULT_BITS_64;
-	*vcpu_spsr(vcpu) = cpsr;
+	vcpu_write_spsr(vcpu, cpsr);
 
 	vcpu_write_sys_reg(vcpu, FAR_EL1, addr);
 
@@ -106,7 +106,7 @@ static void inject_undef64(struct kvm_vcpu *vcpu)
 	*vcpu_pc(vcpu) = get_except_vector(vcpu, except_type_sync);
 
 	*vcpu_cpsr(vcpu) = PSTATE_FAULT_BITS_64;
-	*vcpu_spsr(vcpu) = cpsr;
+	vcpu_write_spsr(vcpu, cpsr);
 
 	/*
 	 * Build an unknown exception, depending on the instruction
diff --git a/virt/kvm/arm/aarch32.c b/virt/kvm/arm/aarch32.c
index 8bc479fa37e6..efc84cbe8277 100644
--- a/virt/kvm/arm/aarch32.c
+++ b/virt/kvm/arm/aarch32.c
@@ -178,7 +178,7 @@ static void prepare_fault32(struct kvm_vcpu *vcpu, u32 mode, u32 vect_offset)
 	*vcpu_cpsr(vcpu) = cpsr;
 
 	/* Note: These now point to the banked copies */
-	*vcpu_spsr(vcpu) = new_spsr_value;
+	vcpu_write_spsr(vcpu, new_spsr_value);
 	*vcpu_reg32(vcpu, 14) = *vcpu_pc(vcpu) + return_offset;
 
 	/* Branch to exception vector */
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 223+ messages in thread

* [PATCH v3 28/41] KVM: arm64: Prepare to handle deferred save/restore of ELR_EL1
  2018-01-12 12:07 ` Christoffer Dall
@ 2018-01-12 12:07   ` Christoffer Dall
  -1 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-01-12 12:07 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel; +Cc: Marc Zyngier, Shih-Wei Li, kvm

ELR_EL1 is not used by a VHE host kernel and can be deferred, but we
need to rework the accesses to this register so that we always get the
latest value, depending on whether the guest system registers are loaded
on the CPU or only reside in memory.
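
A minimal sketch (condensed from the inject_fault.c change below) of the
new calling convention when injecting an exception into the guest:

	/* Record the return address, then vector to the guest's handler;
	 * the helper writes either the in-memory copy or the live register. */
	vcpu_write_elr_el1(vcpu, *vcpu_pc(vcpu));
	*vcpu_pc(vcpu) = get_except_vector(vcpu, except_type_sync);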

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 arch/arm64/include/asm/kvm_emulate.h | 18 +++++++++++++++++-
 arch/arm64/kvm/inject_fault.c        |  4 ++--
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index f0bc7c096fdc..c9ca2dc579c7 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -78,11 +78,27 @@ static inline unsigned long *vcpu_pc(const struct kvm_vcpu *vcpu)
 	return (unsigned long *)&vcpu_gp_regs(vcpu)->regs.pc;
 }
 
-static inline unsigned long *vcpu_elr_el1(const struct kvm_vcpu *vcpu)
+static inline unsigned long *__vcpu_elr_el1(const struct kvm_vcpu *vcpu)
 {
 	return (unsigned long *)&vcpu_gp_regs(vcpu)->elr_el1;
 }
 
+static inline unsigned long vcpu_read_elr_el1(const struct kvm_vcpu *vcpu)
+{
+	if (vcpu->arch.sysregs_loaded_on_cpu)
+		return read_sysreg_el1(elr);
+	else
+		return *__vcpu_elr_el1(vcpu);
+}
+
+static inline void vcpu_write_elr_el1(const struct kvm_vcpu *vcpu, unsigned long v)
+{
+	if (vcpu->arch.sysregs_loaded_on_cpu)
+		write_sysreg_el1(v, elr);
+	else
+		*__vcpu_elr_el1(vcpu) = v;
+}
+
 static inline unsigned long *vcpu_cpsr(const struct kvm_vcpu *vcpu)
 {
 	return (unsigned long *)&vcpu_gp_regs(vcpu)->regs.pstate;
diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
index c638593d305d..8425e20f1cc9 100644
--- a/arch/arm64/kvm/inject_fault.c
+++ b/arch/arm64/kvm/inject_fault.c
@@ -67,7 +67,7 @@ static void inject_abt64(struct kvm_vcpu *vcpu, bool is_iabt, unsigned long addr
 	bool is_aarch32 = vcpu_mode_is_32bit(vcpu);
 	u32 esr = 0;
 
-	*vcpu_elr_el1(vcpu) = *vcpu_pc(vcpu);
+	vcpu_write_elr_el1(vcpu, *vcpu_pc(vcpu));
 	*vcpu_pc(vcpu) = get_except_vector(vcpu, except_type_sync);
 
 	*vcpu_cpsr(vcpu) = PSTATE_FAULT_BITS_64;
@@ -102,7 +102,7 @@ static void inject_undef64(struct kvm_vcpu *vcpu)
 	unsigned long cpsr = *vcpu_cpsr(vcpu);
 	u32 esr = (ESR_ELx_EC_UNKNOWN << ESR_ELx_EC_SHIFT);
 
-	*vcpu_elr_el1(vcpu) = *vcpu_pc(vcpu);
+	vcpu_write_elr_el1(vcpu, *vcpu_pc(vcpu));
 	*vcpu_pc(vcpu) = get_except_vector(vcpu, except_type_sync);
 
 	*vcpu_cpsr(vcpu) = PSTATE_FAULT_BITS_64;
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 223+ messages in thread

* [PATCH v3 28/41] KVM: arm64: Prepare to handle deferred save/restore of ELR_EL1
@ 2018-01-12 12:07   ` Christoffer Dall
  0 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-01-12 12:07 UTC (permalink / raw)
  To: linux-arm-kernel

ELR_EL1 is not used by a VHE host kernel and can be deferred, but we
need to rework the accesses to this register so that we always get the
latest value, depending on whether the guest system registers are loaded
on the CPU or only reside in memory.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 arch/arm64/include/asm/kvm_emulate.h | 18 +++++++++++++++++-
 arch/arm64/kvm/inject_fault.c        |  4 ++--
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index f0bc7c096fdc..c9ca2dc579c7 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -78,11 +78,27 @@ static inline unsigned long *vcpu_pc(const struct kvm_vcpu *vcpu)
 	return (unsigned long *)&vcpu_gp_regs(vcpu)->regs.pc;
 }
 
-static inline unsigned long *vcpu_elr_el1(const struct kvm_vcpu *vcpu)
+static inline unsigned long *__vcpu_elr_el1(const struct kvm_vcpu *vcpu)
 {
 	return (unsigned long *)&vcpu_gp_regs(vcpu)->elr_el1;
 }
 
+static inline unsigned long vcpu_read_elr_el1(const struct kvm_vcpu *vcpu)
+{
+	if (vcpu->arch.sysregs_loaded_on_cpu)
+		return read_sysreg_el1(elr);
+	else
+		return *__vcpu_elr_el1(vcpu);
+}
+
+static inline void vcpu_write_elr_el1(const struct kvm_vcpu *vcpu, unsigned long v)
+{
+	if (vcpu->arch.sysregs_loaded_on_cpu)
+		write_sysreg_el1(v, elr);
+	else
+		*__vcpu_elr_el1(vcpu) = v;
+}
+
 static inline unsigned long *vcpu_cpsr(const struct kvm_vcpu *vcpu)
 {
 	return (unsigned long *)&vcpu_gp_regs(vcpu)->regs.pstate;
diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
index c638593d305d..8425e20f1cc9 100644
--- a/arch/arm64/kvm/inject_fault.c
+++ b/arch/arm64/kvm/inject_fault.c
@@ -67,7 +67,7 @@ static void inject_abt64(struct kvm_vcpu *vcpu, bool is_iabt, unsigned long addr
 	bool is_aarch32 = vcpu_mode_is_32bit(vcpu);
 	u32 esr = 0;
 
-	*vcpu_elr_el1(vcpu) = *vcpu_pc(vcpu);
+	vcpu_write_elr_el1(vcpu, *vcpu_pc(vcpu));
 	*vcpu_pc(vcpu) = get_except_vector(vcpu, except_type_sync);
 
 	*vcpu_cpsr(vcpu) = PSTATE_FAULT_BITS_64;
@@ -102,7 +102,7 @@ static void inject_undef64(struct kvm_vcpu *vcpu)
 	unsigned long cpsr = *vcpu_cpsr(vcpu);
 	u32 esr = (ESR_ELx_EC_UNKNOWN << ESR_ELx_EC_SHIFT);
 
-	*vcpu_elr_el1(vcpu) = *vcpu_pc(vcpu);
+	vcpu_write_elr_el1(vcpu, *vcpu_pc(vcpu));
 	*vcpu_pc(vcpu) = get_except_vector(vcpu, except_type_sync);
 
 	*vcpu_cpsr(vcpu) = PSTATE_FAULT_BITS_64;
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 223+ messages in thread

* [PATCH v3 29/41] KVM: arm64: Defer saving/restoring 64-bit sysregs to vcpu load/put on VHE
  2018-01-12 12:07 ` Christoffer Dall
@ 2018-01-12 12:07   ` Christoffer Dall
  -1 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-01-12 12:07 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel
  Cc: kvm, Marc Zyngier, Shih-Wei Li, Andrew Jones, Christoffer Dall

Some system registers do not affect the host kernel's execution and can
therefore be loaded when we are about to run a VCPU, and we do not have
to restore the host state to the hardware until we are actually about to
return to userspace or schedule out the VCPU thread.

The EL1 system registers and the userspace state registers only
affecting EL0 execution do not need to be saved and restored on every
switch between the VM and the host, because they don't affect the host
kernel's execution.

We mark all registers which are now deferred as such in the
declarations in sys_regs.c to ensure the most up-to-date copy is always
accessed.

Note MPIDR_EL1 (controlled via VMPIDR_EL2) is accessed from other vcpu
threads, for example via the GIC emulation, and therefore must be
declared as immediate, which is fine as the guest cannot modify this
value.

The 32-bit sysregs can also be deferred but we do this in a separate
patch as it requires a bit more infrastructure.
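
As a rough illustration (expansion written out by hand, assuming the
DECLARE_DEFERRABLE_SR() macro introduced earlier in this series), marking
SCTLR_EL1 as deferrable generates accessors along these lines, so
emulation code always observes the latest value even while the guest's
sysregs are loaded on a VHE host:

	static u64 __SCTLR_EL1_read(struct kvm_vcpu *vcpu, int r)
	{
		if (vcpu->arch.sysregs_loaded_on_cpu) {
			WARN_ON(kvm_arm_get_running_vcpu() != vcpu);
			/* the guest's SCTLR_EL1, via the VHE _EL12 alias */
			return read_sysreg_s((sctlr_EL12));
		}
		return __vcpu_sys_reg(vcpu, r);
	}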

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 arch/arm64/kvm/hyp/sysreg-sr.c | 37 +++++++++++++++++++++++++++++--------
 arch/arm64/kvm/sys_regs.c      | 40 ++++++++++++++++++++--------------------
 2 files changed, 49 insertions(+), 28 deletions(-)

diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
index 1f2d5e9343b0..eabd35154232 100644
--- a/arch/arm64/kvm/hyp/sysreg-sr.c
+++ b/arch/arm64/kvm/hyp/sysreg-sr.c
@@ -25,8 +25,12 @@
 /*
  * Non-VHE: Both host and guest must save everything.
  *
- * VHE: Host must save tpidr*_el0, actlr_el1, mdscr_el1, sp_el0,
- * and guest must save everything.
+ * VHE: Host and guest must save mdscr_el1 and sp_el0 (and the PC and pstate,
+ * which are handled as part of the el2 return state) on every switch.
+ * tpidr_el0, tpidrro_el0, and actlr_el1 only need to be switched when going
+ * to host userspace or a different VCPU.  EL1 registers only need to be
+ * switched when potentially going to run a different VCPU.  The latter two
+ * classes are handled as part of kvm_arch_vcpu_load and kvm_arch_vcpu_put.
  */
 
 static void __hyp_text __sysreg_save_common_state(struct kvm_cpu_context *ctxt)
@@ -90,14 +94,11 @@ void __hyp_text __sysreg_save_state_nvhe(struct kvm_cpu_context *ctxt)
 void sysreg_save_host_state_vhe(struct kvm_cpu_context *ctxt)
 {
 	__sysreg_save_common_state(ctxt);
-	__sysreg_save_user_state(ctxt);
 }
 
 void sysreg_save_guest_state_vhe(struct kvm_cpu_context *ctxt)
 {
-	__sysreg_save_el1_state(ctxt);
 	__sysreg_save_common_state(ctxt);
-	__sysreg_save_user_state(ctxt);
 	__sysreg_save_el2_return_state(ctxt);
 }
 
@@ -163,14 +164,11 @@ void __hyp_text __sysreg_restore_state_nvhe(struct kvm_cpu_context *ctxt)
 void sysreg_restore_host_state_vhe(struct kvm_cpu_context *ctxt)
 {
 	__sysreg_restore_common_state(ctxt);
-	__sysreg_restore_user_state(ctxt);
 }
 
 void sysreg_restore_guest_state_vhe(struct kvm_cpu_context *ctxt)
 {
-	__sysreg_restore_el1_state(ctxt);
 	__sysreg_restore_common_state(ctxt);
-	__sysreg_restore_user_state(ctxt);
 	__sysreg_restore_el2_return_state(ctxt);
 }
 
@@ -236,6 +234,18 @@ void __hyp_text __sysreg32_restore_state(struct kvm_vcpu *vcpu)
  */
 void kvm_vcpu_load_sysregs(struct kvm_vcpu *vcpu)
 {
+	struct kvm_cpu_context *host_ctxt = vcpu->arch.host_cpu_context;
+	struct kvm_cpu_context *guest_ctxt = &vcpu->arch.ctxt;
+
+	if (!has_vhe())
+		return;
+
+	__sysreg_save_user_state(host_ctxt);
+
+	__sysreg_restore_user_state(guest_ctxt);
+	__sysreg_restore_el1_state(guest_ctxt);
+
+	vcpu->arch.sysregs_loaded_on_cpu = true;
 }
 
 /**
@@ -264,6 +274,17 @@ void kvm_vcpu_put_sysregs(struct kvm_vcpu *vcpu)
 		__fpsimd_restore_state(&host_ctxt->gp_regs.fp_regs);
 		vcpu->arch.guest_vfp_loaded = 0;
 	}
+
+	if (!has_vhe())
+		return;
+
+	__sysreg_save_el1_state(guest_ctxt);
+	__sysreg_save_user_state(guest_ctxt);
+
+	/* Restore host user state */
+	__sysreg_restore_user_state(host_ctxt);
+
+	vcpu->arch.sysregs_loaded_on_cpu = false;
 }
 
 void __hyp_text __kvm_set_tpidr_el2(u64 tpidr_el2)
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 9d353a6a55c9..8df651a8a36c 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -140,26 +140,26 @@ static void __default_write_sys_reg(struct kvm_vcpu *vcpu, int reg, u64 val)
 
 /* Ordered as in enum vcpu_sysreg */
 DECLARE_IMMEDIATE_SR(MPIDR_EL1);
-DECLARE_IMMEDIATE_SR(CSSELR_EL1);
-DECLARE_IMMEDIATE_SR(SCTLR_EL1);
-DECLARE_IMMEDIATE_SR(ACTLR_EL1);
-DECLARE_IMMEDIATE_SR(CPACR_EL1);
-DECLARE_IMMEDIATE_SR(TTBR0_EL1);
-DECLARE_IMMEDIATE_SR(TTBR1_EL1);
-DECLARE_IMMEDIATE_SR(TCR_EL1);
-DECLARE_IMMEDIATE_SR(ESR_EL1);
-DECLARE_IMMEDIATE_SR(AFSR0_EL1);
-DECLARE_IMMEDIATE_SR(AFSR1_EL1);
-DECLARE_IMMEDIATE_SR(FAR_EL1);
-DECLARE_IMMEDIATE_SR(MAIR_EL1);
-DECLARE_IMMEDIATE_SR(VBAR_EL1);
-DECLARE_IMMEDIATE_SR(CONTEXTIDR_EL1);
-DECLARE_IMMEDIATE_SR(TPIDR_EL0);
-DECLARE_IMMEDIATE_SR(TPIDRRO_EL0);
-DECLARE_IMMEDIATE_SR(TPIDR_EL1);
-DECLARE_IMMEDIATE_SR(AMAIR_EL1);
-DECLARE_IMMEDIATE_SR(CNTKCTL_EL1);
-DECLARE_IMMEDIATE_SR(PAR_EL1);
+DECLARE_DEFERRABLE_SR(CSSELR_EL1,	SYS_CSSELR_EL1);
+DECLARE_DEFERRABLE_SR(SCTLR_EL1,	sctlr_EL12);
+DECLARE_DEFERRABLE_SR(ACTLR_EL1,	SYS_ACTLR_EL1);
+DECLARE_DEFERRABLE_SR(CPACR_EL1,	cpacr_EL12);
+DECLARE_DEFERRABLE_SR(TTBR0_EL1,	ttbr0_EL12);
+DECLARE_DEFERRABLE_SR(TTBR1_EL1,	ttbr1_EL12);
+DECLARE_DEFERRABLE_SR(TCR_EL1,		tcr_EL12);
+DECLARE_DEFERRABLE_SR(ESR_EL1,		esr_EL12);
+DECLARE_DEFERRABLE_SR(AFSR0_EL1,	afsr0_EL12);
+DECLARE_DEFERRABLE_SR(AFSR1_EL1,	afsr1_EL12);
+DECLARE_DEFERRABLE_SR(FAR_EL1,		far_EL12);
+DECLARE_DEFERRABLE_SR(MAIR_EL1,		mair_EL12);
+DECLARE_DEFERRABLE_SR(VBAR_EL1,		vbar_EL12);
+DECLARE_DEFERRABLE_SR(CONTEXTIDR_EL1,	contextidr_EL12);
+DECLARE_DEFERRABLE_SR(TPIDR_EL0,	SYS_TPIDR_EL0);
+DECLARE_DEFERRABLE_SR(TPIDRRO_EL0,	SYS_TPIDRRO_EL0);
+DECLARE_DEFERRABLE_SR(TPIDR_EL1,	SYS_TPIDR_EL1);
+DECLARE_DEFERRABLE_SR(AMAIR_EL1,	amair_EL12);
+DECLARE_DEFERRABLE_SR(CNTKCTL_EL1,	cntkctl_EL12);
+DECLARE_DEFERRABLE_SR(PAR_EL1,		SYS_PAR_EL1);
 DECLARE_IMMEDIATE_SR(MDSCR_EL1);
 DECLARE_IMMEDIATE_SR(MDCCINT_EL1);
 DECLARE_IMMEDIATE_SR(PMCR_EL0);
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 223+ messages in thread

* [PATCH v3 29/41] KVM: arm64: Defer saving/restoring 64-bit sysregs to vcpu load/put on VHE
@ 2018-01-12 12:07   ` Christoffer Dall
  0 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-01-12 12:07 UTC (permalink / raw)
  To: linux-arm-kernel

Some system registers do not affect the host kernel's execution and can
therefore be loaded when we are about to run a VCPU, and we do not have
to restore the host state to the hardware until we are actually about to
return to userspace or schedule out the VCPU thread.

The EL1 system registers and the userspace state registers only
affecting EL0 execution do not need to be saved and restored on every
switch between the VM and the host, because they don't affect the host
kernel's execution.

We mark all registers which are now deferred as such in the
declarations in sys_regs.c to ensure the most up-to-date copy is always
accessed.

Note MPIDR_EL1 (controlled via VMPIDR_EL2) is accessed from other vcpu
threads, for example via the GIC emulation, and therefore must be
declared as immediate, which is fine as the guest cannot modify this
value.

The 32-bit sysregs can also be deferred but we do this in a separate
patch as it requires a bit more infrastructure.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 arch/arm64/kvm/hyp/sysreg-sr.c | 37 +++++++++++++++++++++++++++++--------
 arch/arm64/kvm/sys_regs.c      | 40 ++++++++++++++++++++--------------------
 2 files changed, 49 insertions(+), 28 deletions(-)

diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
index 1f2d5e9343b0..eabd35154232 100644
--- a/arch/arm64/kvm/hyp/sysreg-sr.c
+++ b/arch/arm64/kvm/hyp/sysreg-sr.c
@@ -25,8 +25,12 @@
 /*
  * Non-VHE: Both host and guest must save everything.
  *
- * VHE: Host must save tpidr*_el0, actlr_el1, mdscr_el1, sp_el0,
- * and guest must save everything.
+ * VHE: Host and guest must save mdscr_el1 and sp_el0 (and the PC and pstate,
+ * which are handled as part of the el2 return state) on every switch.
+ * tpidr_el0, tpidrro_el0, and actlr_el1 only need to be switched when going
+ * to host userspace or a different VCPU.  EL1 registers only need to be
+ * switched when potentially going to run a different VCPU.  The latter two
+ * classes are handled as part of kvm_arch_vcpu_load and kvm_arch_vcpu_put.
  */
 
 static void __hyp_text __sysreg_save_common_state(struct kvm_cpu_context *ctxt)
@@ -90,14 +94,11 @@ void __hyp_text __sysreg_save_state_nvhe(struct kvm_cpu_context *ctxt)
 void sysreg_save_host_state_vhe(struct kvm_cpu_context *ctxt)
 {
 	__sysreg_save_common_state(ctxt);
-	__sysreg_save_user_state(ctxt);
 }
 
 void sysreg_save_guest_state_vhe(struct kvm_cpu_context *ctxt)
 {
-	__sysreg_save_el1_state(ctxt);
 	__sysreg_save_common_state(ctxt);
-	__sysreg_save_user_state(ctxt);
 	__sysreg_save_el2_return_state(ctxt);
 }
 
@@ -163,14 +164,11 @@ void __hyp_text __sysreg_restore_state_nvhe(struct kvm_cpu_context *ctxt)
 void sysreg_restore_host_state_vhe(struct kvm_cpu_context *ctxt)
 {
 	__sysreg_restore_common_state(ctxt);
-	__sysreg_restore_user_state(ctxt);
 }
 
 void sysreg_restore_guest_state_vhe(struct kvm_cpu_context *ctxt)
 {
-	__sysreg_restore_el1_state(ctxt);
 	__sysreg_restore_common_state(ctxt);
-	__sysreg_restore_user_state(ctxt);
 	__sysreg_restore_el2_return_state(ctxt);
 }
 
@@ -236,6 +234,18 @@ void __hyp_text __sysreg32_restore_state(struct kvm_vcpu *vcpu)
  */
 void kvm_vcpu_load_sysregs(struct kvm_vcpu *vcpu)
 {
+	struct kvm_cpu_context *host_ctxt = vcpu->arch.host_cpu_context;
+	struct kvm_cpu_context *guest_ctxt = &vcpu->arch.ctxt;
+
+	if (!has_vhe())
+		return;
+
+	__sysreg_save_user_state(host_ctxt);
+
+	__sysreg_restore_user_state(guest_ctxt);
+	__sysreg_restore_el1_state(guest_ctxt);
+
+	vcpu->arch.sysregs_loaded_on_cpu = true;
 }
 
 /**
@@ -264,6 +274,17 @@ void kvm_vcpu_put_sysregs(struct kvm_vcpu *vcpu)
 		__fpsimd_restore_state(&host_ctxt->gp_regs.fp_regs);
 		vcpu->arch.guest_vfp_loaded = 0;
 	}
+
+	if (!has_vhe())
+		return;
+
+	__sysreg_save_el1_state(guest_ctxt);
+	__sysreg_save_user_state(guest_ctxt);
+
+	/* Restore host user state */
+	__sysreg_restore_user_state(host_ctxt);
+
+	vcpu->arch.sysregs_loaded_on_cpu = false;
 }
 
 void __hyp_text __kvm_set_tpidr_el2(u64 tpidr_el2)
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 9d353a6a55c9..8df651a8a36c 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -140,26 +140,26 @@ static void __default_write_sys_reg(struct kvm_vcpu *vcpu, int reg, u64 val)
 
 /* Ordered as in enum vcpu_sysreg */
 DECLARE_IMMEDIATE_SR(MPIDR_EL1);
-DECLARE_IMMEDIATE_SR(CSSELR_EL1);
-DECLARE_IMMEDIATE_SR(SCTLR_EL1);
-DECLARE_IMMEDIATE_SR(ACTLR_EL1);
-DECLARE_IMMEDIATE_SR(CPACR_EL1);
-DECLARE_IMMEDIATE_SR(TTBR0_EL1);
-DECLARE_IMMEDIATE_SR(TTBR1_EL1);
-DECLARE_IMMEDIATE_SR(TCR_EL1);
-DECLARE_IMMEDIATE_SR(ESR_EL1);
-DECLARE_IMMEDIATE_SR(AFSR0_EL1);
-DECLARE_IMMEDIATE_SR(AFSR1_EL1);
-DECLARE_IMMEDIATE_SR(FAR_EL1);
-DECLARE_IMMEDIATE_SR(MAIR_EL1);
-DECLARE_IMMEDIATE_SR(VBAR_EL1);
-DECLARE_IMMEDIATE_SR(CONTEXTIDR_EL1);
-DECLARE_IMMEDIATE_SR(TPIDR_EL0);
-DECLARE_IMMEDIATE_SR(TPIDRRO_EL0);
-DECLARE_IMMEDIATE_SR(TPIDR_EL1);
-DECLARE_IMMEDIATE_SR(AMAIR_EL1);
-DECLARE_IMMEDIATE_SR(CNTKCTL_EL1);
-DECLARE_IMMEDIATE_SR(PAR_EL1);
+DECLARE_DEFERRABLE_SR(CSSELR_EL1,	SYS_CSSELR_EL1);
+DECLARE_DEFERRABLE_SR(SCTLR_EL1,	sctlr_EL12);
+DECLARE_DEFERRABLE_SR(ACTLR_EL1,	SYS_ACTLR_EL1);
+DECLARE_DEFERRABLE_SR(CPACR_EL1,	cpacr_EL12);
+DECLARE_DEFERRABLE_SR(TTBR0_EL1,	ttbr0_EL12);
+DECLARE_DEFERRABLE_SR(TTBR1_EL1,	ttbr1_EL12);
+DECLARE_DEFERRABLE_SR(TCR_EL1,		tcr_EL12);
+DECLARE_DEFERRABLE_SR(ESR_EL1,		esr_EL12);
+DECLARE_DEFERRABLE_SR(AFSR0_EL1,	afsr0_EL12);
+DECLARE_DEFERRABLE_SR(AFSR1_EL1,	afsr1_EL12);
+DECLARE_DEFERRABLE_SR(FAR_EL1,		far_EL12);
+DECLARE_DEFERRABLE_SR(MAIR_EL1,		mair_EL12);
+DECLARE_DEFERRABLE_SR(VBAR_EL1,		vbar_EL12);
+DECLARE_DEFERRABLE_SR(CONTEXTIDR_EL1,	contextidr_EL12);
+DECLARE_DEFERRABLE_SR(TPIDR_EL0,	SYS_TPIDR_EL0);
+DECLARE_DEFERRABLE_SR(TPIDRRO_EL0,	SYS_TPIDRRO_EL0);
+DECLARE_DEFERRABLE_SR(TPIDR_EL1,	SYS_TPIDR_EL1);
+DECLARE_DEFERRABLE_SR(AMAIR_EL1,	amair_EL12);
+DECLARE_DEFERRABLE_SR(CNTKCTL_EL1,	cntkctl_EL12);
+DECLARE_DEFERRABLE_SR(PAR_EL1,		SYS_PAR_EL1);
 DECLARE_IMMEDIATE_SR(MDSCR_EL1);
 DECLARE_IMMEDIATE_SR(MDCCINT_EL1);
 DECLARE_IMMEDIATE_SR(PMCR_EL0);
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 223+ messages in thread

* [PATCH v3 30/41] KVM: arm64: Prepare to handle deferred save/restore of 32-bit registers
  2018-01-12 12:07 ` Christoffer Dall
@ 2018-01-12 12:07   ` Christoffer Dall
  -1 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-01-12 12:07 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel; +Cc: Marc Zyngier, Shih-Wei Li, kvm

32-bit registers are not used by a 64-bit host kernel and can be
deferred, but we need to rework the accesses to these registers so that
we always get the latest value, depending on whether the guest system
registers are loaded on the CPU or only reside in memory.
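
A condensed sketch of the resulting behaviour (hypothetical situation, for
illustration): for a 32-bit guest currently in Abort mode whose sysregs
are loaded on a VHE host, reading the SPSR boils down to roughly:

	int idx = vcpu_spsr32_mode(vcpu);	/* maps COMPAT_PSR_MODE_ABT to KVM_SPSR_ABT */
	unsigned long v;

	if (vcpu->arch.sysregs_loaded_on_cpu)
		v = read_sysreg(spsr_abt);		/* live banked register */
	else
		v = vcpu_gp_regs(vcpu)->spsr[idx];	/* in-memory copy */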

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 arch/arm64/include/asm/kvm_emulate.h | 32 +++++-------------
 arch/arm64/kvm/regmap.c              | 65 ++++++++++++++++++++++++++----------
 arch/arm64/kvm/sys_regs.c            |  6 ++--
 3 files changed, 60 insertions(+), 43 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index c9ca2dc579c7..a27610185906 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -33,7 +33,8 @@
 #include <asm/virt.h>
 
 unsigned long *vcpu_reg32(const struct kvm_vcpu *vcpu, u8 reg_num);
-unsigned long *vcpu_spsr32(const struct kvm_vcpu *vcpu);
+unsigned long vcpu_read_spsr32(const struct kvm_vcpu *vcpu);
+void vcpu_write_spsr32(struct kvm_vcpu *vcpu, unsigned long v);
 
 bool kvm_condition_valid32(const struct kvm_vcpu *vcpu);
 void kvm_skip_instr32(struct kvm_vcpu *vcpu, bool is_wide_instr);
@@ -150,41 +151,26 @@ static inline void vcpu_set_reg(struct kvm_vcpu *vcpu, u8 reg_num,
 
 static inline unsigned long vcpu_read_spsr(const struct kvm_vcpu *vcpu)
 {
-	unsigned long *p = (unsigned long *)&vcpu_gp_regs(vcpu)->spsr[KVM_SPSR_EL1];
-
-	if (vcpu_mode_is_32bit(vcpu)) {
-		unsigned long *p_32bit = vcpu_spsr32(vcpu);
-
-		/* KVM_SPSR_SVC aliases KVM_SPSR_EL1 */
-		if (p_32bit != (unsigned long *)p)
-			return *p_32bit;
-	}
+	if (vcpu_mode_is_32bit(vcpu))
+		return vcpu_read_spsr32(vcpu);
 
 	if (vcpu->arch.sysregs_loaded_on_cpu)
 		return read_sysreg_el1(spsr);
 	else
-		return *p;
+		return vcpu_gp_regs(vcpu)->spsr[KVM_SPSR_EL1];
 }
 
-static inline void vcpu_write_spsr(const struct kvm_vcpu *vcpu, unsigned long v)
+static inline void vcpu_write_spsr(struct kvm_vcpu *vcpu, unsigned long v)
 {
-	unsigned long *p = (unsigned long *)&vcpu_gp_regs(vcpu)->spsr[KVM_SPSR_EL1];
-
-	/* KVM_SPSR_SVC aliases KVM_SPSR_EL1 */
 	if (vcpu_mode_is_32bit(vcpu)) {
-		unsigned long *p_32bit = vcpu_spsr32(vcpu);
-
-		/* KVM_SPSR_SVC aliases KVM_SPSR_EL1 */
-		if (p_32bit != (unsigned long *)p) {
-			*p_32bit = v;
-			return;
-		}
+		vcpu_write_spsr32(vcpu, v);
+		return;
 	}
 
 	if (vcpu->arch.sysregs_loaded_on_cpu)
 		write_sysreg_el1(v, spsr);
 	else
-		*p = v;
+		vcpu_gp_regs(vcpu)->spsr[KVM_SPSR_EL1] = v;
 }
 
 static inline bool vcpu_mode_priv(const struct kvm_vcpu *vcpu)
diff --git a/arch/arm64/kvm/regmap.c b/arch/arm64/kvm/regmap.c
index bbc6ae32e4af..3f65098aff8d 100644
--- a/arch/arm64/kvm/regmap.c
+++ b/arch/arm64/kvm/regmap.c
@@ -141,28 +141,59 @@ unsigned long *vcpu_reg32(const struct kvm_vcpu *vcpu, u8 reg_num)
 /*
  * Return the SPSR for the current mode of the virtual CPU.
  */
-unsigned long *vcpu_spsr32(const struct kvm_vcpu *vcpu)
+static int vcpu_spsr32_mode(const struct kvm_vcpu *vcpu)
 {
 	unsigned long mode = *vcpu_cpsr(vcpu) & COMPAT_PSR_MODE_MASK;
 	switch (mode) {
-	case COMPAT_PSR_MODE_SVC:
-		mode = KVM_SPSR_SVC;
-		break;
-	case COMPAT_PSR_MODE_ABT:
-		mode = KVM_SPSR_ABT;
-		break;
-	case COMPAT_PSR_MODE_UND:
-		mode = KVM_SPSR_UND;
-		break;
-	case COMPAT_PSR_MODE_IRQ:
-		mode = KVM_SPSR_IRQ;
-		break;
-	case COMPAT_PSR_MODE_FIQ:
-		mode = KVM_SPSR_FIQ;
-		break;
+	case COMPAT_PSR_MODE_SVC: return KVM_SPSR_SVC;
+	case COMPAT_PSR_MODE_ABT: return KVM_SPSR_ABT;
+	case COMPAT_PSR_MODE_UND: return KVM_SPSR_UND;
+	case COMPAT_PSR_MODE_IRQ: return KVM_SPSR_IRQ;
+	case COMPAT_PSR_MODE_FIQ: return KVM_SPSR_FIQ;
+	default: BUG();
+	}
+}
+
+unsigned long vcpu_read_spsr32(const struct kvm_vcpu *vcpu)
+{
+	int spsr_idx = vcpu_spsr32_mode(vcpu);
+
+	if (!vcpu->arch.sysregs_loaded_on_cpu)
+		return vcpu_gp_regs(vcpu)->spsr[spsr_idx];
+
+	switch (spsr_idx) {
+	case KVM_SPSR_SVC:
+		return read_sysreg_el1(spsr);
+	case KVM_SPSR_ABT:
+		return read_sysreg(spsr_abt);
+	case KVM_SPSR_UND:
+		return read_sysreg(spsr_und);
+	case KVM_SPSR_IRQ:
+		return read_sysreg(spsr_irq);
+	case KVM_SPSR_FIQ:
+		return read_sysreg(spsr_fiq);
 	default:
 		BUG();
 	}
+}
 
-	return (unsigned long *)&vcpu_gp_regs(vcpu)->spsr[mode];
+void vcpu_write_spsr32(struct kvm_vcpu *vcpu, unsigned long v)
+{
+	int spsr_idx = vcpu_spsr32_mode(vcpu);
+
+	if (!vcpu->arch.sysregs_loaded_on_cpu) {
+		vcpu_gp_regs(vcpu)->spsr[spsr_idx] = v;
+		return;
+	}
+
+	switch (spsr_idx) {
+	case KVM_SPSR_SVC:
+		write_sysreg_el1(v, spsr);
+		break;
+	case KVM_SPSR_ABT:
+		write_sysreg(v, spsr_abt);
+		break;
+	case KVM_SPSR_UND:
+		write_sysreg(v, spsr_und);
+		break;
+	case KVM_SPSR_IRQ:
+		write_sysreg(v, spsr_irq);
+		break;
+	case KVM_SPSR_FIQ:
+		write_sysreg(v, spsr_fiq);
+		break;
+	}
 }
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 8df651a8a36c..096ac84c9bbd 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -175,10 +175,10 @@ DECLARE_IMMEDIATE_SR(PMINTENSET_EL1);
 DECLARE_IMMEDIATE_SR(PMOVSSET_EL0);
 DECLARE_IMMEDIATE_SR(PMSWINC_EL0);
 DECLARE_IMMEDIATE_SR(PMUSERENR_EL0);
-DECLARE_IMMEDIATE_SR(DACR32_EL2);
-DECLARE_IMMEDIATE_SR(IFSR32_EL2);
+DECLARE_DEFERRABLE_SR(DACR32_EL2,	SYS_DACR32_EL2);
+DECLARE_DEFERRABLE_SR(IFSR32_EL2,	SYS_IFSR32_EL2);
 DECLARE_IMMEDIATE_SR(FPEXC32_EL2);
-DECLARE_IMMEDIATE_SR(DBGVCR32_EL2);
+DECLARE_DEFERRABLE_SR(DBGVCR32_EL2,	SYS_DBGVCR32_EL2);
 
 static const struct sys_reg_accessor sys_reg_accessors[NR_SYS_REGS] = {
 	[0 ... NR_SYS_REGS - 1] = {
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 223+ messages in thread

* [PATCH v3 30/41] KVM: arm64: Prepare to handle deferred save/restore of 32-bit registers
@ 2018-01-12 12:07   ` Christoffer Dall
  0 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-01-12 12:07 UTC (permalink / raw)
  To: linux-arm-kernel

32-bit registers are not used by a 64-bit host kernel and can be
deferred, but we need to rework the accesses to these registers so that
we always get the latest value, depending on whether the guest system
registers are loaded on the CPU or only reside in memory.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 arch/arm64/include/asm/kvm_emulate.h | 32 +++++-------------
 arch/arm64/kvm/regmap.c              | 65 ++++++++++++++++++++++++++----------
 arch/arm64/kvm/sys_regs.c            |  6 ++--
 3 files changed, 60 insertions(+), 43 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index c9ca2dc579c7..a27610185906 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -33,7 +33,8 @@
 #include <asm/virt.h>
 
 unsigned long *vcpu_reg32(const struct kvm_vcpu *vcpu, u8 reg_num);
-unsigned long *vcpu_spsr32(const struct kvm_vcpu *vcpu);
+unsigned long vcpu_read_spsr32(const struct kvm_vcpu *vcpu);
+void vcpu_write_spsr32(struct kvm_vcpu *vcpu, unsigned long v);
 
 bool kvm_condition_valid32(const struct kvm_vcpu *vcpu);
 void kvm_skip_instr32(struct kvm_vcpu *vcpu, bool is_wide_instr);
@@ -150,41 +151,26 @@ static inline void vcpu_set_reg(struct kvm_vcpu *vcpu, u8 reg_num,
 
 static inline unsigned long vcpu_read_spsr(const struct kvm_vcpu *vcpu)
 {
-	unsigned long *p = (unsigned long *)&vcpu_gp_regs(vcpu)->spsr[KVM_SPSR_EL1];
-
-	if (vcpu_mode_is_32bit(vcpu)) {
-		unsigned long *p_32bit = vcpu_spsr32(vcpu);
-
-		/* KVM_SPSR_SVC aliases KVM_SPSR_EL1 */
-		if (p_32bit != (unsigned long *)p)
-			return *p_32bit;
-	}
+	if (vcpu_mode_is_32bit(vcpu))
+		return vcpu_read_spsr32(vcpu);
 
 	if (vcpu->arch.sysregs_loaded_on_cpu)
 		return read_sysreg_el1(spsr);
 	else
-		return *p;
+		return vcpu_gp_regs(vcpu)->spsr[KVM_SPSR_EL1];
 }
 
-static inline void vcpu_write_spsr(const struct kvm_vcpu *vcpu, unsigned long v)
+static inline void vcpu_write_spsr(struct kvm_vcpu *vcpu, unsigned long v)
 {
-	unsigned long *p = (unsigned long *)&vcpu_gp_regs(vcpu)->spsr[KVM_SPSR_EL1];
-
-	/* KVM_SPSR_SVC aliases KVM_SPSR_EL1 */
 	if (vcpu_mode_is_32bit(vcpu)) {
-		unsigned long *p_32bit = vcpu_spsr32(vcpu);
-
-		/* KVM_SPSR_SVC aliases KVM_SPSR_EL1 */
-		if (p_32bit != (unsigned long *)p) {
-			*p_32bit = v;
-			return;
-		}
+		vcpu_write_spsr32(vcpu, v);
+		return;
 	}
 
 	if (vcpu->arch.sysregs_loaded_on_cpu)
 		write_sysreg_el1(v, spsr);
 	else
-		*p = v;
+		vcpu_gp_regs(vcpu)->spsr[KVM_SPSR_EL1] = v;
 }
 
 static inline bool vcpu_mode_priv(const struct kvm_vcpu *vcpu)
diff --git a/arch/arm64/kvm/regmap.c b/arch/arm64/kvm/regmap.c
index bbc6ae32e4af..3f65098aff8d 100644
--- a/arch/arm64/kvm/regmap.c
+++ b/arch/arm64/kvm/regmap.c
@@ -141,28 +141,59 @@ unsigned long *vcpu_reg32(const struct kvm_vcpu *vcpu, u8 reg_num)
 /*
  * Return the SPSR for the current mode of the virtual CPU.
  */
-unsigned long *vcpu_spsr32(const struct kvm_vcpu *vcpu)
+static int vcpu_spsr32_mode(const struct kvm_vcpu *vcpu)
 {
 	unsigned long mode = *vcpu_cpsr(vcpu) & COMPAT_PSR_MODE_MASK;
 	switch (mode) {
-	case COMPAT_PSR_MODE_SVC:
-		mode = KVM_SPSR_SVC;
-		break;
-	case COMPAT_PSR_MODE_ABT:
-		mode = KVM_SPSR_ABT;
-		break;
-	case COMPAT_PSR_MODE_UND:
-		mode = KVM_SPSR_UND;
-		break;
-	case COMPAT_PSR_MODE_IRQ:
-		mode = KVM_SPSR_IRQ;
-		break;
-	case COMPAT_PSR_MODE_FIQ:
-		mode = KVM_SPSR_FIQ;
-		break;
+	case COMPAT_PSR_MODE_SVC: return KVM_SPSR_SVC;
+	case COMPAT_PSR_MODE_ABT: return KVM_SPSR_ABT;
+	case COMPAT_PSR_MODE_UND: return KVM_SPSR_UND;
+	case COMPAT_PSR_MODE_IRQ: return KVM_SPSR_IRQ;
+	case COMPAT_PSR_MODE_FIQ: return KVM_SPSR_FIQ;
+	default: BUG();
+	}
+}
+
+unsigned long vcpu_read_spsr32(const struct kvm_vcpu *vcpu)
+{
+	int spsr_idx = vcpu_spsr32_mode(vcpu);
+
+	if (!vcpu->arch.sysregs_loaded_on_cpu)
+		return vcpu_gp_regs(vcpu)->spsr[spsr_idx];
+
+	switch (spsr_idx) {
+	case KVM_SPSR_SVC:
+		return read_sysreg_el1(spsr);
+	case KVM_SPSR_ABT:
+		return read_sysreg(spsr_abt);
+	case KVM_SPSR_UND:
+		return read_sysreg(spsr_und);
+	case KVM_SPSR_IRQ:
+		return read_sysreg(spsr_irq);
+	case KVM_SPSR_FIQ:
+		return read_sysreg(spsr_fiq);
 	default:
 		BUG();
 	}
+}
 
-	return (unsigned long *)&vcpu_gp_regs(vcpu)->spsr[mode];
+void vcpu_write_spsr32(struct kvm_vcpu *vcpu, unsigned long v)
+{
+	int spsr_idx = vcpu_spsr32_mode(vcpu);
+
+	if (!vcpu->arch.sysregs_loaded_on_cpu) {
+		vcpu_gp_regs(vcpu)->spsr[spsr_idx] = v;
+		return;
+	}
+
+	switch (spsr_idx) {
+	case KVM_SPSR_SVC:
+		write_sysreg_el1(v, spsr);
+		break;
+	case KVM_SPSR_ABT:
+		write_sysreg(v, spsr_abt);
+		break;
+	case KVM_SPSR_UND:
+		write_sysreg(v, spsr_und);
+		break;
+	case KVM_SPSR_IRQ:
+		write_sysreg(v, spsr_irq);
+		break;
+	case KVM_SPSR_FIQ:
+		write_sysreg(v, spsr_fiq);
+		break;
+	}
 }
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 8df651a8a36c..096ac84c9bbd 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -175,10 +175,10 @@ DECLARE_IMMEDIATE_SR(PMINTENSET_EL1);
 DECLARE_IMMEDIATE_SR(PMOVSSET_EL0);
 DECLARE_IMMEDIATE_SR(PMSWINC_EL0);
 DECLARE_IMMEDIATE_SR(PMUSERENR_EL0);
-DECLARE_IMMEDIATE_SR(DACR32_EL2);
-DECLARE_IMMEDIATE_SR(IFSR32_EL2);
+DECLARE_DEFERRABLE_SR(DACR32_EL2,	SYS_DACR32_EL2);
+DECLARE_DEFERRABLE_SR(IFSR32_EL2,	SYS_IFSR32_EL2);
 DECLARE_IMMEDIATE_SR(FPEXC32_EL2);
-DECLARE_IMMEDIATE_SR(DBGVCR32_EL2);
+DECLARE_DEFERRABLE_SR(DBGVCR32_EL2,	SYS_DBGVCR32_EL2);
 
 static const struct sys_reg_accessor sys_reg_accessors[NR_SYS_REGS] = {
 	[0 ... NR_SYS_REGS - 1] = {
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 223+ messages in thread

* [PATCH v3 31/41] KVM: arm64: Defer saving/restoring 32-bit sysregs to vcpu load/put
  2018-01-12 12:07 ` Christoffer Dall
@ 2018-01-12 12:07   ` Christoffer Dall
  -1 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-01-12 12:07 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel; +Cc: Marc Zyngier, Shih-Wei Li, kvm

When running a 32-bit VM (EL1 in AArch32), the AArch32 system registers
can be deferred to vcpu load/put on VHE systems because neither
the host kernel nor host userspace uses these registers.

Note that we can no longer save/restore DBGVCR32_EL2 conditionally based
on the state of the debug dirty flag on VHE, but since we do the
load/put pretty rarely, this comes out as a win anyway.
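
For reference, a condensed sketch (assuming the VHE path) of the resulting
vcpu_load ordering, with the 32-bit state restored before the EL1 sysregs
as required by the erratum note that moves here from the world switch:

	__sysreg_save_user_state(host_ctxt);
	__sysreg32_restore_state(vcpu);		/* before EL1 state: errata #852523 / #853709 */
	__sysreg_restore_user_state(guest_ctxt);
	__sysreg_restore_el1_state(guest_ctxt);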

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 arch/arm64/kvm/hyp/switch.c    |  6 ------
 arch/arm64/kvm/hyp/sysreg-sr.c | 12 ++++++++++--
 2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 2e04d404ac82..05f266b505ce 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -355,11 +355,6 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
 
 	__vgic_restore_state(vcpu);
 
-	/*
-	 * We must restore the 32-bit state before the sysregs, thanks
-	 * to erratum #852523 (Cortex-A57) or #853709 (Cortex-A72).
-	 */
-	__sysreg32_restore_state(vcpu);
 	sysreg_restore_guest_state_vhe(guest_ctxt);
 	__debug_switch_to_guest(vcpu);
 
@@ -371,7 +366,6 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
 	} while (fixup_guest_exit(vcpu, &exit_code));
 
 	sysreg_save_guest_state_vhe(guest_ctxt);
-	__sysreg32_save_state(vcpu);
 	__vgic_save_state(vcpu);
 
 	__deactivate_traps(vcpu);
diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
index eabd35154232..d225f5797651 100644
--- a/arch/arm64/kvm/hyp/sysreg-sr.c
+++ b/arch/arm64/kvm/hyp/sysreg-sr.c
@@ -195,7 +195,7 @@ void __hyp_text __sysreg32_save_state(struct kvm_vcpu *vcpu)
 	sysreg[DACR32_EL2] = read_sysreg(dacr32_el2);
 	sysreg[IFSR32_EL2] = read_sysreg(ifsr32_el2);
 
-	if (vcpu->arch.debug_flags & KVM_ARM64_DEBUG_DIRTY)
+	if (has_vhe() || vcpu->arch.debug_flags & KVM_ARM64_DEBUG_DIRTY)
 		sysreg[DBGVCR32_EL2] = read_sysreg(dbgvcr32_el2);
 }
 
@@ -217,7 +217,7 @@ void __hyp_text __sysreg32_restore_state(struct kvm_vcpu *vcpu)
 	write_sysreg(sysreg[DACR32_EL2], dacr32_el2);
 	write_sysreg(sysreg[IFSR32_EL2], ifsr32_el2);
 
-	if (vcpu->arch.debug_flags & KVM_ARM64_DEBUG_DIRTY)
+	if (has_vhe() || vcpu->arch.debug_flags & KVM_ARM64_DEBUG_DIRTY)
 		write_sysreg(sysreg[DBGVCR32_EL2], dbgvcr32_el2);
 }
 
@@ -242,6 +242,13 @@ void kvm_vcpu_load_sysregs(struct kvm_vcpu *vcpu)
 
 	__sysreg_save_user_state(host_ctxt);
 
+	/*
+	 * Load guest EL1 and user state
+	 *
+	 * We must restore the 32-bit state before the sysregs, thanks
+	 * to erratum #852523 (Cortex-A57) or #853709 (Cortex-A72).
+	 */
+	__sysreg32_restore_state(vcpu);
 	__sysreg_restore_user_state(guest_ctxt);
 	__sysreg_restore_el1_state(guest_ctxt);
 
@@ -280,6 +287,7 @@ void kvm_vcpu_put_sysregs(struct kvm_vcpu *vcpu)
 
 	__sysreg_save_el1_state(guest_ctxt);
 	__sysreg_save_user_state(guest_ctxt);
+	__sysreg32_save_state(vcpu);
 
 	/* Restore host user state */
 	__sysreg_restore_user_state(host_ctxt);
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 223+ messages in thread

* [PATCH v3 31/41] KVM: arm64: Defer saving/restoring 32-bit sysregs to vcpu load/put
@ 2018-01-12 12:07   ` Christoffer Dall
  0 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-01-12 12:07 UTC (permalink / raw)
  To: linux-arm-kernel

When running a 32-bit VM (EL1 in AArch32), the AArch32 system registers
can be deferred to vcpu load/put on VHE systems because neither
the host kernel nor host userspace uses these registers.

Note that we can no longer save/restore DBGVCR32_EL2 conditionally based
on the state of the debug dirty flag on VHE, but since we do the
load/put pretty rarely, this comes out as a win anyway.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 arch/arm64/kvm/hyp/switch.c    |  6 ------
 arch/arm64/kvm/hyp/sysreg-sr.c | 12 ++++++++++--
 2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 2e04d404ac82..05f266b505ce 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -355,11 +355,6 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
 
 	__vgic_restore_state(vcpu);
 
-	/*
-	 * We must restore the 32-bit state before the sysregs, thanks
-	 * to erratum #852523 (Cortex-A57) or #853709 (Cortex-A72).
-	 */
-	__sysreg32_restore_state(vcpu);
 	sysreg_restore_guest_state_vhe(guest_ctxt);
 	__debug_switch_to_guest(vcpu);
 
@@ -371,7 +366,6 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
 	} while (fixup_guest_exit(vcpu, &exit_code));
 
 	sysreg_save_guest_state_vhe(guest_ctxt);
-	__sysreg32_save_state(vcpu);
 	__vgic_save_state(vcpu);
 
 	__deactivate_traps(vcpu);
diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
index eabd35154232..d225f5797651 100644
--- a/arch/arm64/kvm/hyp/sysreg-sr.c
+++ b/arch/arm64/kvm/hyp/sysreg-sr.c
@@ -195,7 +195,7 @@ void __hyp_text __sysreg32_save_state(struct kvm_vcpu *vcpu)
 	sysreg[DACR32_EL2] = read_sysreg(dacr32_el2);
 	sysreg[IFSR32_EL2] = read_sysreg(ifsr32_el2);
 
-	if (vcpu->arch.debug_flags & KVM_ARM64_DEBUG_DIRTY)
+	if (has_vhe() || vcpu->arch.debug_flags & KVM_ARM64_DEBUG_DIRTY)
 		sysreg[DBGVCR32_EL2] = read_sysreg(dbgvcr32_el2);
 }
 
@@ -217,7 +217,7 @@ void __hyp_text __sysreg32_restore_state(struct kvm_vcpu *vcpu)
 	write_sysreg(sysreg[DACR32_EL2], dacr32_el2);
 	write_sysreg(sysreg[IFSR32_EL2], ifsr32_el2);
 
-	if (vcpu->arch.debug_flags & KVM_ARM64_DEBUG_DIRTY)
+	if (has_vhe() || vcpu->arch.debug_flags & KVM_ARM64_DEBUG_DIRTY)
 		write_sysreg(sysreg[DBGVCR32_EL2], dbgvcr32_el2);
 }
 
@@ -242,6 +242,13 @@ void kvm_vcpu_load_sysregs(struct kvm_vcpu *vcpu)
 
 	__sysreg_save_user_state(host_ctxt);
 
+	/*
+	 * Load guest EL1 and user state
+	 *
+	 * We must restore the 32-bit state before the sysregs, thanks
+	 * to erratum #852523 (Cortex-A57) or #853709 (Cortex-A72).
+	 */
+	__sysreg32_restore_state(vcpu);
 	__sysreg_restore_user_state(guest_ctxt);
 	__sysreg_restore_el1_state(guest_ctxt);
 
@@ -280,6 +287,7 @@ void kvm_vcpu_put_sysregs(struct kvm_vcpu *vcpu)
 
 	__sysreg_save_el1_state(guest_ctxt);
 	__sysreg_save_user_state(guest_ctxt);
+	__sysreg32_save_state(vcpu);
 
 	/* Restore host user state */
 	__sysreg_restore_user_state(host_ctxt);
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 223+ messages in thread

* [PATCH v3 32/41] KVM: arm64: Move common VHE/non-VHE trap config in separate functions
  2018-01-12 12:07 ` Christoffer Dall
@ 2018-01-12 12:07   ` Christoffer Dall
  -1 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-01-12 12:07 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel
  Cc: kvm, Marc Zyngier, Shih-Wei Li, Andrew Jones, Christoffer Dall

As we are about to be more lazy with some of the trap configuration
register read/writes for VHE systems, move the logic that is currently
shared between VHE and non-VHE into a separate function which can be
called from either the world-switch path or from vcpu_load/vcpu_put.
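
The net effect on the entry path is a simple composition (condensed sketch
of the structure introduced by the diff below):

	static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
	{
		__activate_traps_common(vcpu);	/* FPEXC32_EL2, HCR_EL2, HSTR_EL2, PMU, MDCR_EL2 */
		__activate_traps_arch()(vcpu);	/* VHE vs. non-VHE specific CPTR_EL2/CPACR_EL1 setup */
	}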

Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 arch/arm64/kvm/hyp/switch.c | 68 ++++++++++++++++++++++++++-------------------
 1 file changed, 39 insertions(+), 29 deletions(-)

diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 05f266b505ce..c01bcfc3fb52 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -24,6 +24,43 @@
 #include <asm/fpsimd.h>
 #include <asm/debug-monitors.h>
 
+static void __hyp_text __activate_traps_common(struct kvm_vcpu *vcpu)
+{
+	/*
+	 * We are about to set CPTR_EL2.TFP to trap all floating point
+	 * register accesses to EL2, however, the ARM ARM clearly states that
+	 * traps are only taken to EL2 if the operation would not otherwise
+	 * trap to EL1.  Therefore, always make sure that for 32-bit guests,
+	 * we set FPEXC.EN to prevent traps to EL1, when setting the TFP bit.
+	 * If FP/ASIMD is not implemented, FPEXC is UNDEFINED and any access to
+	 * it will cause an exception.
+	 */
+	if (vcpu_el1_is_32bit(vcpu) && system_supports_fpsimd() &&
+	    !vcpu->arch.guest_vfp_loaded) {
+		write_sysreg(1 << 30, fpexc32_el2);
+		isb();
+	}
+	write_sysreg(vcpu->arch.hcr_el2, hcr_el2);
+
+	/* Trap on AArch32 cp15 c15 (impdef sysregs) accesses (EL1 or EL0) */
+	write_sysreg(1 << 15, hstr_el2);
+	/*
+	 * Make sure we trap PMU access from EL0 to EL2. Also sanitize
+	 * PMSELR_EL0 to make sure it never contains the cycle
+	 * counter, which could make a PMXEVCNTR_EL0 access UNDEF at
+	 * EL1 instead of being trapped to EL2.
+	 */
+	write_sysreg(0, pmselr_el0);
+	write_sysreg(ARMV8_PMU_USERENR_MASK, pmuserenr_el0);
+	write_sysreg(vcpu->arch.mdcr_el2, mdcr_el2);
+}
+
+static void __hyp_text __deactivate_traps_common(void)
+{
+	write_sysreg(0, hstr_el2);
+	write_sysreg(0, pmuserenr_el0);
+}
+
 static void __hyp_text __activate_traps_vhe(struct kvm_vcpu *vcpu)
 {
 	u64 val;
@@ -59,33 +96,7 @@ static hyp_alternate_select(__activate_traps_arch,
 
 static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
 {
-	/*
-	 * We are about to set CPTR_EL2.TFP to trap all floating point
-	 * register accesses to EL2, however, the ARM ARM clearly states that
-	 * traps are only taken to EL2 if the operation would not otherwise
-	 * trap to EL1.  Therefore, always make sure that for 32-bit guests,
-	 * we set FPEXC.EN to prevent traps to EL1, when setting the TFP bit.
-	 * If FP/ASIMD is not implemented, FPEXC is UNDEFINED and any access to
-	 * it will cause an exception.
-	 */
-	if (vcpu_el1_is_32bit(vcpu) && system_supports_fpsimd() &&
-	    !vcpu->arch.guest_vfp_loaded) {
-		write_sysreg(1 << 30, fpexc32_el2);
-		isb();
-	}
-	write_sysreg(vcpu->arch.hcr_el2, hcr_el2);
-
-	/* Trap on AArch32 cp15 c15 accesses (EL1 or EL0) */
-	write_sysreg(1 << 15, hstr_el2);
-	/*
-	 * Make sure we trap PMU access from EL0 to EL2. Also sanitize
-	 * PMSELR_EL0 to make sure it never contains the cycle
-	 * counter, which could make a PMXEVCNTR_EL0 access UNDEF at
-	 * EL1 instead of being trapped to EL2.
-	 */
-	write_sysreg(0, pmselr_el0);
-	write_sysreg(ARMV8_PMU_USERENR_MASK, pmuserenr_el0);
-	write_sysreg(vcpu->arch.mdcr_el2, mdcr_el2);
+	__activate_traps_common(vcpu);
 	__activate_traps_arch()(vcpu);
 }
 
@@ -131,9 +142,8 @@ static void __hyp_text __deactivate_traps(struct kvm_vcpu *vcpu)
 	if (vcpu->arch.hcr_el2 & HCR_VSE)
 		vcpu->arch.hcr_el2 = read_sysreg(hcr_el2);
 
+	__deactivate_traps_common();
 	__deactivate_traps_arch()();
-	write_sysreg(0, hstr_el2);
-	write_sysreg(0, pmuserenr_el0);
 }
 
 static void __hyp_text __activate_vm(struct kvm *kvm)
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 223+ messages in thread

* [PATCH v3 32/41] KVM: arm64: Move common VHE/non-VHE trap config in separate functions
@ 2018-01-12 12:07   ` Christoffer Dall
  0 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-01-12 12:07 UTC (permalink / raw)
  To: linux-arm-kernel

As we are about to be lazier with some of the trap configuration
register read/writes for VHE systems, move the logic that is currently
shared between VHE and non-VHE into separate functions which can be
called from either the world-switch path or from vcpu_load/vcpu_put.

Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 arch/arm64/kvm/hyp/switch.c | 68 ++++++++++++++++++++++++++-------------------
 1 file changed, 39 insertions(+), 29 deletions(-)

diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 05f266b505ce..c01bcfc3fb52 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -24,6 +24,43 @@
 #include <asm/fpsimd.h>
 #include <asm/debug-monitors.h>
 
+static void __hyp_text __activate_traps_common(struct kvm_vcpu *vcpu)
+{
+	/*
+	 * We are about to set CPTR_EL2.TFP to trap all floating point
+	 * register accesses to EL2, however, the ARM ARM clearly states that
+	 * traps are only taken to EL2 if the operation would not otherwise
+	 * trap to EL1.  Therefore, always make sure that for 32-bit guests,
+	 * we set FPEXC.EN to prevent traps to EL1, when setting the TFP bit.
+	 * If FP/ASIMD is not implemented, FPEXC is UNDEFINED and any access to
+	 * it will cause an exception.
+	 */
+	if (vcpu_el1_is_32bit(vcpu) && system_supports_fpsimd() &&
+	    !vcpu->arch.guest_vfp_loaded) {
+		write_sysreg(1 << 30, fpexc32_el2);
+		isb();
+	}
+	write_sysreg(vcpu->arch.hcr_el2, hcr_el2);
+
+	/* Trap on AArch32 cp15 c15 (impdef sysregs) accesses (EL1 or EL0) */
+	write_sysreg(1 << 15, hstr_el2);
+	/*
+	 * Make sure we trap PMU access from EL0 to EL2. Also sanitize
+	 * PMSELR_EL0 to make sure it never contains the cycle
+	 * counter, which could make a PMXEVCNTR_EL0 access UNDEF at
+	 * EL1 instead of being trapped to EL2.
+	 */
+	write_sysreg(0, pmselr_el0);
+	write_sysreg(ARMV8_PMU_USERENR_MASK, pmuserenr_el0);
+	write_sysreg(vcpu->arch.mdcr_el2, mdcr_el2);
+}
+
+static void __hyp_text __deactivate_traps_common(void)
+{
+	write_sysreg(0, hstr_el2);
+	write_sysreg(0, pmuserenr_el0);
+}
+
 static void __hyp_text __activate_traps_vhe(struct kvm_vcpu *vcpu)
 {
 	u64 val;
@@ -59,33 +96,7 @@ static hyp_alternate_select(__activate_traps_arch,
 
 static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
 {
-	/*
-	 * We are about to set CPTR_EL2.TFP to trap all floating point
-	 * register accesses to EL2, however, the ARM ARM clearly states that
-	 * traps are only taken to EL2 if the operation would not otherwise
-	 * trap to EL1.  Therefore, always make sure that for 32-bit guests,
-	 * we set FPEXC.EN to prevent traps to EL1, when setting the TFP bit.
-	 * If FP/ASIMD is not implemented, FPEXC is UNDEFINED and any access to
-	 * it will cause an exception.
-	 */
-	if (vcpu_el1_is_32bit(vcpu) && system_supports_fpsimd() &&
-	    !vcpu->arch.guest_vfp_loaded) {
-		write_sysreg(1 << 30, fpexc32_el2);
-		isb();
-	}
-	write_sysreg(vcpu->arch.hcr_el2, hcr_el2);
-
-	/* Trap on AArch32 cp15 c15 accesses (EL1 or EL0) */
-	write_sysreg(1 << 15, hstr_el2);
-	/*
-	 * Make sure we trap PMU access from EL0 to EL2. Also sanitize
-	 * PMSELR_EL0 to make sure it never contains the cycle
-	 * counter, which could make a PMXEVCNTR_EL0 access UNDEF at
-	 * EL1 instead of being trapped to EL2.
-	 */
-	write_sysreg(0, pmselr_el0);
-	write_sysreg(ARMV8_PMU_USERENR_MASK, pmuserenr_el0);
-	write_sysreg(vcpu->arch.mdcr_el2, mdcr_el2);
+	__activate_traps_common(vcpu);
 	__activate_traps_arch()(vcpu);
 }
 
@@ -131,9 +142,8 @@ static void __hyp_text __deactivate_traps(struct kvm_vcpu *vcpu)
 	if (vcpu->arch.hcr_el2 & HCR_VSE)
 		vcpu->arch.hcr_el2 = read_sysreg(hcr_el2);
 
+	__deactivate_traps_common();
 	__deactivate_traps_arch()();
-	write_sysreg(0, hstr_el2);
-	write_sysreg(0, pmuserenr_el0);
 }
 
 static void __hyp_text __activate_vm(struct kvm *kvm)
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 223+ messages in thread

* [PATCH v3 33/41] KVM: arm64: Configure FPSIMD traps on vcpu load/put
  2018-01-12 12:07 ` Christoffer Dall
@ 2018-01-12 12:07   ` Christoffer Dall
  -1 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-01-12 12:07 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel; +Cc: Marc Zyngier, Shih-Wei Li, kvm

There is no need to enable/disable traps to the FP registers on every
switch to/from the VM, because the host kernel does not use this
resource without first calling vcpu_put.  We can therefore move things
around enough that we still always write FPEXC32_EL2 before programming
CPTR_EL2, but only program these registers during vcpu load/put.
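
The vcpu_load hook then takes one of two shapes (a condensed sketch of
the sysreg-sr.c change below; the VHE sysreg save/restore that sits
between these calls is omitted here):

void kvm_vcpu_load_sysregs(struct kvm_vcpu *vcpu)
{
        if (!has_vhe()) {
                /* non-VHE: one trip to EL2 at load time to program the traps */
                kvm_call_hyp(__activate_traps_nvhe_load, vcpu);
                return;
        }

        /* VHE: the host already runs at EL2, so a plain call is enough */
        activate_traps_vhe_load(vcpu);
}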

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 arch/arm64/include/asm/kvm_hyp.h |  6 +++++
 arch/arm64/kvm/hyp/switch.c      | 51 +++++++++++++++++++++++++++++-----------
 arch/arm64/kvm/hyp/sysreg-sr.c   | 12 ++++++++--
 3 files changed, 53 insertions(+), 16 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index 3f54c55f77a1..ffd62e31f134 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -148,6 +148,12 @@ void __fpsimd_save_state(struct user_fpsimd_state *fp_regs);
 void __fpsimd_restore_state(struct user_fpsimd_state *fp_regs);
 bool __fpsimd_enabled(void);
 
+void __activate_traps_nvhe_load(struct kvm_vcpu *vcpu);
+void __deactivate_traps_nvhe_put(void);
+
+void activate_traps_vhe_load(struct kvm_vcpu *vcpu);
+void deactivate_traps_vhe_put(void);
+
 u64 __guest_enter(struct kvm_vcpu *vcpu, struct kvm_cpu_context *host_ctxt);
 void __noreturn __hyp_do_panic(unsigned long, ...);
 
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index c01bcfc3fb52..d14ab9650f81 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -24,22 +24,25 @@
 #include <asm/fpsimd.h>
 #include <asm/debug-monitors.h>
 
-static void __hyp_text __activate_traps_common(struct kvm_vcpu *vcpu)
+static void __hyp_text __activate_traps_fpsimd32(struct kvm_vcpu *vcpu)
 {
 	/*
-	 * We are about to set CPTR_EL2.TFP to trap all floating point
-	 * register accesses to EL2, however, the ARM ARM clearly states that
-	 * traps are only taken to EL2 if the operation would not otherwise
-	 * trap to EL1.  Therefore, always make sure that for 32-bit guests,
-	 * we set FPEXC.EN to prevent traps to EL1, when setting the TFP bit.
-	 * If FP/ASIMD is not implemented, FPEXC is UNDEFINED and any access to
-	 * it will cause an exception.
+	 * We are about to trap all floating point register accesses to EL2,
+	 * however, traps are only taken to EL2 if the operation would not
+	 * otherwise trap to EL1.  Therefore, always make sure that for 32-bit
+	 * guests, we set FPEXC.EN to prevent traps to EL1, when setting the
+	 * TFP bit.  If FP/ASIMD is not implemented, FPEXC is UNDEFINED and
+	 * any access to it will cause an exception.
 	 */
 	if (vcpu_el1_is_32bit(vcpu) && system_supports_fpsimd() &&
 	    !vcpu->arch.guest_vfp_loaded) {
 		write_sysreg(1 << 30, fpexc32_el2);
 		isb();
 	}
+}
+
+static void __hyp_text __activate_traps_common(struct kvm_vcpu *vcpu)
+{
 	write_sysreg(vcpu->arch.hcr_el2, hcr_el2);
 
 	/* Trap on AArch32 cp15 c15 (impdef sysregs) accesses (EL1 or EL0) */
@@ -61,10 +64,12 @@ static void __hyp_text __deactivate_traps_common(void)
 	write_sysreg(0, pmuserenr_el0);
 }
 
-static void __hyp_text __activate_traps_vhe(struct kvm_vcpu *vcpu)
+void activate_traps_vhe_load(struct kvm_vcpu *vcpu)
 {
 	u64 val;
 
+	__activate_traps_fpsimd32(vcpu);
+
 	val = read_sysreg(cpacr_el1);
 	val |= CPACR_EL1_TTA;
 	val &= ~CPACR_EL1_ZEN;
@@ -73,14 +78,26 @@ static void __hyp_text __activate_traps_vhe(struct kvm_vcpu *vcpu)
 	else
 		val &= ~CPACR_EL1_FPEN;
 	write_sysreg(val, cpacr_el1);
+}
 
+void deactivate_traps_vhe_put(void)
+{
+	write_sysreg(CPACR_EL1_DEFAULT, cpacr_el1);
+}
+
+static void __hyp_text __activate_traps_vhe(struct kvm_vcpu *vcpu)
+{
 	write_sysreg(__kvm_hyp_vector, vbar_el1);
 }
 
-static void __hyp_text __activate_traps_nvhe(struct kvm_vcpu *vcpu)
+void __hyp_text __activate_traps_nvhe_load(struct kvm_vcpu *vcpu)
 {
 	u64 val;
 
+	vcpu = kern_hyp_va(vcpu);
+
+	__activate_traps_fpsimd32(vcpu);
+
 	val = CPTR_EL2_DEFAULT;
 	val |= CPTR_EL2_TTA | CPTR_EL2_TZ;
 	if (vcpu->arch.guest_vfp_loaded)
@@ -90,6 +107,15 @@ static void __hyp_text __activate_traps_nvhe(struct kvm_vcpu *vcpu)
 	write_sysreg(val, cptr_el2);
 }
 
+void __hyp_text __deactivate_traps_nvhe_put(void)
+{
+	write_sysreg(CPTR_EL2_DEFAULT, cptr_el2);
+}
+
+static void __hyp_text __activate_traps_nvhe(struct kvm_vcpu *vcpu)
+{
+}
+
 static hyp_alternate_select(__activate_traps_arch,
 			    __activate_traps_nvhe, __activate_traps_vhe,
 			    ARM64_HAS_VIRT_HOST_EXTN);
@@ -111,12 +137,10 @@ static void __hyp_text __deactivate_traps_vhe(void)
 
 	write_sysreg(mdcr_el2, mdcr_el2);
 	write_sysreg(HCR_HOST_VHE_FLAGS, hcr_el2);
-	write_sysreg(CPACR_EL1_DEFAULT, cpacr_el1);
 	write_sysreg(vectors, vbar_el1);
 }
 
-static void __hyp_text __deactivate_traps_nvhe(void)
-{
+static void __hyp_text __deactivate_traps_nvhe(void) {
 	u64 mdcr_el2 = read_sysreg(mdcr_el2);
 
 	mdcr_el2 &= MDCR_EL2_HPMN_MASK;
@@ -124,7 +148,6 @@ static void __hyp_text __deactivate_traps_nvhe(void)
 
 	write_sysreg(mdcr_el2, mdcr_el2);
 	write_sysreg(HCR_RW, hcr_el2);
-	write_sysreg(CPTR_EL2_DEFAULT, cptr_el2);
 }
 
 static hyp_alternate_select(__deactivate_traps_arch,
diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
index d225f5797651..7943d5b4dbcb 100644
--- a/arch/arm64/kvm/hyp/sysreg-sr.c
+++ b/arch/arm64/kvm/hyp/sysreg-sr.c
@@ -237,8 +237,10 @@ void kvm_vcpu_load_sysregs(struct kvm_vcpu *vcpu)
 	struct kvm_cpu_context *host_ctxt = vcpu->arch.host_cpu_context;
 	struct kvm_cpu_context *guest_ctxt = &vcpu->arch.ctxt;
 
-	if (!has_vhe())
+	if (!has_vhe()) {
+		kvm_call_hyp(__activate_traps_nvhe_load, vcpu);
 		return;
+	}
 
 	__sysreg_save_user_state(host_ctxt);
 
@@ -253,6 +255,8 @@ void kvm_vcpu_load_sysregs(struct kvm_vcpu *vcpu)
 	__sysreg_restore_el1_state(guest_ctxt);
 
 	vcpu->arch.sysregs_loaded_on_cpu = true;
+
+	activate_traps_vhe_load(vcpu);
 }
 
 /**
@@ -282,8 +286,12 @@ void kvm_vcpu_put_sysregs(struct kvm_vcpu *vcpu)
 		vcpu->arch.guest_vfp_loaded = 0;
 	}
 
-	if (!has_vhe())
+	if (!has_vhe()) {
+		kvm_call_hyp(__deactivate_traps_nvhe_put);
 		return;
+	}
+
+	deactivate_traps_vhe_put();
 
 	__sysreg_save_el1_state(guest_ctxt);
 	__sysreg_save_user_state(guest_ctxt);
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 223+ messages in thread

* [PATCH v3 33/41] KVM: arm64: Configure FPSIMD traps on vcpu load/put
@ 2018-01-12 12:07   ` Christoffer Dall
  0 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-01-12 12:07 UTC (permalink / raw)
  To: linux-arm-kernel

There is no need to enable/disable traps to the FP registers on every
switch to/from the VM, because the host kernel does not use this
resource without first calling vcpu_put.  We can therefore move things
around enough that we still always write FPEXC32_EL2 before programming
CPTR_EL2, but only program these registers during vcpu load/put.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 arch/arm64/include/asm/kvm_hyp.h |  6 +++++
 arch/arm64/kvm/hyp/switch.c      | 51 +++++++++++++++++++++++++++++-----------
 arch/arm64/kvm/hyp/sysreg-sr.c   | 12 ++++++++--
 3 files changed, 53 insertions(+), 16 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index 3f54c55f77a1..ffd62e31f134 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -148,6 +148,12 @@ void __fpsimd_save_state(struct user_fpsimd_state *fp_regs);
 void __fpsimd_restore_state(struct user_fpsimd_state *fp_regs);
 bool __fpsimd_enabled(void);
 
+void __activate_traps_nvhe_load(struct kvm_vcpu *vcpu);
+void __deactivate_traps_nvhe_put(void);
+
+void activate_traps_vhe_load(struct kvm_vcpu *vcpu);
+void deactivate_traps_vhe_put(void);
+
 u64 __guest_enter(struct kvm_vcpu *vcpu, struct kvm_cpu_context *host_ctxt);
 void __noreturn __hyp_do_panic(unsigned long, ...);
 
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index c01bcfc3fb52..d14ab9650f81 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -24,22 +24,25 @@
 #include <asm/fpsimd.h>
 #include <asm/debug-monitors.h>
 
-static void __hyp_text __activate_traps_common(struct kvm_vcpu *vcpu)
+static void __hyp_text __activate_traps_fpsimd32(struct kvm_vcpu *vcpu)
 {
 	/*
-	 * We are about to set CPTR_EL2.TFP to trap all floating point
-	 * register accesses to EL2, however, the ARM ARM clearly states that
-	 * traps are only taken to EL2 if the operation would not otherwise
-	 * trap to EL1.  Therefore, always make sure that for 32-bit guests,
-	 * we set FPEXC.EN to prevent traps to EL1, when setting the TFP bit.
-	 * If FP/ASIMD is not implemented, FPEXC is UNDEFINED and any access to
-	 * it will cause an exception.
+	 * We are about to trap all floating point register accesses to EL2,
+	 * however, traps are only taken to EL2 if the operation would not
+	 * otherwise trap to EL1.  Therefore, always make sure that for 32-bit
+	 * guests, we set FPEXC.EN to prevent traps to EL1, when setting the
+	 * TFP bit.  If FP/ASIMD is not implemented, FPEXC is UNDEFINED and
+	 * any access to it will cause an exception.
 	 */
 	if (vcpu_el1_is_32bit(vcpu) && system_supports_fpsimd() &&
 	    !vcpu->arch.guest_vfp_loaded) {
 		write_sysreg(1 << 30, fpexc32_el2);
 		isb();
 	}
+}
+
+static void __hyp_text __activate_traps_common(struct kvm_vcpu *vcpu)
+{
 	write_sysreg(vcpu->arch.hcr_el2, hcr_el2);
 
 	/* Trap on AArch32 cp15 c15 (impdef sysregs) accesses (EL1 or EL0) */
@@ -61,10 +64,12 @@ static void __hyp_text __deactivate_traps_common(void)
 	write_sysreg(0, pmuserenr_el0);
 }
 
-static void __hyp_text __activate_traps_vhe(struct kvm_vcpu *vcpu)
+void activate_traps_vhe_load(struct kvm_vcpu *vcpu)
 {
 	u64 val;
 
+	__activate_traps_fpsimd32(vcpu);
+
 	val = read_sysreg(cpacr_el1);
 	val |= CPACR_EL1_TTA;
 	val &= ~CPACR_EL1_ZEN;
@@ -73,14 +78,26 @@ static void __hyp_text __activate_traps_vhe(struct kvm_vcpu *vcpu)
 	else
 		val &= ~CPACR_EL1_FPEN;
 	write_sysreg(val, cpacr_el1);
+}
 
+void deactivate_traps_vhe_put(void)
+{
+	write_sysreg(CPACR_EL1_DEFAULT, cpacr_el1);
+}
+
+static void __hyp_text __activate_traps_vhe(struct kvm_vcpu *vcpu)
+{
 	write_sysreg(__kvm_hyp_vector, vbar_el1);
 }
 
-static void __hyp_text __activate_traps_nvhe(struct kvm_vcpu *vcpu)
+void __hyp_text __activate_traps_nvhe_load(struct kvm_vcpu *vcpu)
 {
 	u64 val;
 
+	vcpu = kern_hyp_va(vcpu);
+
+	__activate_traps_fpsimd32(vcpu);
+
 	val = CPTR_EL2_DEFAULT;
 	val |= CPTR_EL2_TTA | CPTR_EL2_TZ;
 	if (vcpu->arch.guest_vfp_loaded)
@@ -90,6 +107,15 @@ static void __hyp_text __activate_traps_nvhe(struct kvm_vcpu *vcpu)
 	write_sysreg(val, cptr_el2);
 }
 
+void __hyp_text __deactivate_traps_nvhe_put(void)
+{
+	write_sysreg(CPTR_EL2_DEFAULT, cptr_el2);
+}
+
+static void __hyp_text __activate_traps_nvhe(struct kvm_vcpu *vcpu)
+{
+}
+
 static hyp_alternate_select(__activate_traps_arch,
 			    __activate_traps_nvhe, __activate_traps_vhe,
 			    ARM64_HAS_VIRT_HOST_EXTN);
@@ -111,12 +137,10 @@ static void __hyp_text __deactivate_traps_vhe(void)
 
 	write_sysreg(mdcr_el2, mdcr_el2);
 	write_sysreg(HCR_HOST_VHE_FLAGS, hcr_el2);
-	write_sysreg(CPACR_EL1_DEFAULT, cpacr_el1);
 	write_sysreg(vectors, vbar_el1);
 }
 
-static void __hyp_text __deactivate_traps_nvhe(void)
-{
+static void __hyp_text __deactivate_traps_nvhe(void) {
 	u64 mdcr_el2 = read_sysreg(mdcr_el2);
 
 	mdcr_el2 &= MDCR_EL2_HPMN_MASK;
@@ -124,7 +148,6 @@ static void __hyp_text __deactivate_traps_nvhe(void)
 
 	write_sysreg(mdcr_el2, mdcr_el2);
 	write_sysreg(HCR_RW, hcr_el2);
-	write_sysreg(CPTR_EL2_DEFAULT, cptr_el2);
 }
 
 static hyp_alternate_select(__deactivate_traps_arch,
diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
index d225f5797651..7943d5b4dbcb 100644
--- a/arch/arm64/kvm/hyp/sysreg-sr.c
+++ b/arch/arm64/kvm/hyp/sysreg-sr.c
@@ -237,8 +237,10 @@ void kvm_vcpu_load_sysregs(struct kvm_vcpu *vcpu)
 	struct kvm_cpu_context *host_ctxt = vcpu->arch.host_cpu_context;
 	struct kvm_cpu_context *guest_ctxt = &vcpu->arch.ctxt;
 
-	if (!has_vhe())
+	if (!has_vhe()) {
+		kvm_call_hyp(__activate_traps_nvhe_load, vcpu);
 		return;
+	}
 
 	__sysreg_save_user_state(host_ctxt);
 
@@ -253,6 +255,8 @@ void kvm_vcpu_load_sysregs(struct kvm_vcpu *vcpu)
 	__sysreg_restore_el1_state(guest_ctxt);
 
 	vcpu->arch.sysregs_loaded_on_cpu = true;
+
+	activate_traps_vhe_load(vcpu);
 }
 
 /**
@@ -282,8 +286,12 @@ void kvm_vcpu_put_sysregs(struct kvm_vcpu *vcpu)
 		vcpu->arch.guest_vfp_loaded = 0;
 	}
 
-	if (!has_vhe())
+	if (!has_vhe()) {
+		kvm_call_hyp(__deactivate_traps_nvhe_put);
 		return;
+	}
+
+	deactivate_traps_vhe_put();
 
 	__sysreg_save_el1_state(guest_ctxt);
 	__sysreg_save_user_state(guest_ctxt);
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 223+ messages in thread

* [PATCH v3 34/41] KVM: arm64: Configure c15, PMU, and debug register traps on cpu load/put for VHE
  2018-01-12 12:07 ` Christoffer Dall
@ 2018-01-12 12:07   ` Christoffer Dall
  -1 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-01-12 12:07 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel
  Cc: kvm, Marc Zyngier, Shih-Wei Li, Andrew Jones, Christoffer Dall

We do not have to change the c15 trap setting on each switch to/from the
guest on VHE systems, because this setting only affects EL1 and EL0 and
therefore cannot disturb the VHE host kernel, which runs at EL2.

The PMU and debug trap configuration can also be done on vcpu load/put
instead, because they do not affect the host kernel's ability to access
the PMU or the debug registers while executing KVM kernel code.
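
In other words, the only per-switch trap work left is programming
HCR_EL2, while vcpu_put restores full host access in one place
(condensed from the diff below):

static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
{
        __activate_traps_arch()(vcpu);
        write_sysreg(vcpu->arch.hcr_el2, hcr_el2);
}

void deactivate_traps_vhe_put(void)
{
        u64 mdcr_el2;

        /* Re-enable host VFP and SVE access */
        write_sysreg(CPACR_EL1_DEFAULT, cpacr_el1);

        /* Re-enable host access to the impdef sysregs and the PMU */
        __deactivate_traps_common();

        /* Re-enable host access to the debug regs */
        mdcr_el2 = read_sysreg(mdcr_el2);
        mdcr_el2 &= MDCR_EL2_HPMN_MASK |
                    MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT |
                    MDCR_EL2_TPMS;
        write_sysreg(mdcr_el2, mdcr_el2);
}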

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 arch/arm64/kvm/hyp/switch.c | 37 +++++++++++++++++++++++++++----------
 1 file changed, 27 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index d14ab9650f81..6ff9fab4233e 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -43,8 +43,6 @@ static void __hyp_text __activate_traps_fpsimd32(struct kvm_vcpu *vcpu)
 
 static void __hyp_text __activate_traps_common(struct kvm_vcpu *vcpu)
 {
-	write_sysreg(vcpu->arch.hcr_el2, hcr_el2);
-
 	/* Trap on AArch32 cp15 c15 (impdef sysregs) accesses (EL1 or EL0) */
 	write_sysreg(1 << 15, hstr_el2);
 	/*
@@ -64,12 +62,15 @@ static void __hyp_text __deactivate_traps_common(void)
 	write_sysreg(0, pmuserenr_el0);
 }
 
+/* Activate the traps we can during vcpu_load with VHE */
 void activate_traps_vhe_load(struct kvm_vcpu *vcpu)
 {
 	u64 val;
 
+	/* Make sure 32-bit guests trap VFP */
 	__activate_traps_fpsimd32(vcpu);
 
+	/* Trap VFP accesses on a VHE system */
 	val = read_sysreg(cpacr_el1);
 	val |= CPACR_EL1_TTA;
 	val &= ~CPACR_EL1_ZEN;
@@ -78,11 +79,28 @@ void activate_traps_vhe_load(struct kvm_vcpu *vcpu)
 	else
 		val &= ~CPACR_EL1_FPEN;
 	write_sysreg(val, cpacr_el1);
+
+	/* Activate traps on impdef sysregs, PMU, and debug */
+	__activate_traps_common(vcpu);
 }
 
+/* Deactivate the traps we can during vcpu_put with VHE */
 void deactivate_traps_vhe_put(void)
 {
+	u64 mdcr_el2;
+
+	/* Re-enable host VFP and SVE access */
 	write_sysreg(CPACR_EL1_DEFAULT, cpacr_el1);
+
+	/* Re-enable host access to impdef sysregs and the PMU */
+	__deactivate_traps_common();
+
+	/* Re-enable host access to the debug regs */
+	mdcr_el2 = read_sysreg(mdcr_el2);
+	mdcr_el2 &= MDCR_EL2_HPMN_MASK |
+		    MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT |
+		    MDCR_EL2_TPMS;
+	write_sysreg(mdcr_el2, mdcr_el2);
 }
 
 static void __hyp_text __activate_traps_vhe(struct kvm_vcpu *vcpu)
@@ -96,8 +114,10 @@ void __hyp_text __activate_traps_nvhe_load(struct kvm_vcpu *vcpu)
 
 	vcpu = kern_hyp_va(vcpu);
 
+	/* Make sure 32-bit guests trap VFP */
 	__activate_traps_fpsimd32(vcpu);
 
+	/* Trap VFP accesses on a non-VHE system */
 	val = CPTR_EL2_DEFAULT;
 	val |= CPTR_EL2_TTA | CPTR_EL2_TZ;
 	if (vcpu->arch.guest_vfp_loaded)
@@ -114,6 +134,8 @@ void __hyp_text __deactivate_traps_nvhe_put(void)
 
 static void __hyp_text __activate_traps_nvhe(struct kvm_vcpu *vcpu)
 {
+	/* Activate traps on impdef sysregs, PMU, and debug */
+	__activate_traps_common(vcpu);
 }
 
 static hyp_alternate_select(__activate_traps_arch,
@@ -122,20 +144,14 @@ static hyp_alternate_select(__activate_traps_arch,
 
 static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
 {
-	__activate_traps_common(vcpu);
 	__activate_traps_arch()(vcpu);
+	write_sysreg(vcpu->arch.hcr_el2, hcr_el2);
 }
 
 static void __hyp_text __deactivate_traps_vhe(void)
 {
 	extern char vectors[];	/* kernel exception vectors */
-	u64 mdcr_el2 = read_sysreg(mdcr_el2);
-
-	mdcr_el2 &= MDCR_EL2_HPMN_MASK |
-		    MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT |
-		    MDCR_EL2_TPMS;
 
-	write_sysreg(mdcr_el2, mdcr_el2);
 	write_sysreg(HCR_HOST_VHE_FLAGS, hcr_el2);
 	write_sysreg(vectors, vbar_el1);
 }
@@ -143,6 +159,8 @@ static void __hyp_text __deactivate_traps_vhe(void)
 static void __hyp_text __deactivate_traps_nvhe(void) {
 	u64 mdcr_el2 = read_sysreg(mdcr_el2);
 
+	__deactivate_traps_common();
+
 	mdcr_el2 &= MDCR_EL2_HPMN_MASK;
 	mdcr_el2 |= MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT;
 
@@ -165,7 +183,6 @@ static void __hyp_text __deactivate_traps(struct kvm_vcpu *vcpu)
 	if (vcpu->arch.hcr_el2 & HCR_VSE)
 		vcpu->arch.hcr_el2 = read_sysreg(hcr_el2);
 
-	__deactivate_traps_common();
 	__deactivate_traps_arch()();
 }
 
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 223+ messages in thread

* [PATCH v3 34/41] KVM: arm64: Configure c15, PMU, and debug register traps on cpu load/put for VHE
@ 2018-01-12 12:07   ` Christoffer Dall
  0 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-01-12 12:07 UTC (permalink / raw)
  To: linux-arm-kernel

We do not have to change the c15 trap setting on each switch to/from the
guest on VHE systems, because this setting only affects EL1 and EL0 and
therefore cannot disturb the VHE host kernel, which runs at EL2.

The PMU and debug trap configuration can also be done on vcpu load/put
instead, because they do not affect the host kernel's ability to access
the PMU or the debug registers while executing KVM kernel code.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 arch/arm64/kvm/hyp/switch.c | 37 +++++++++++++++++++++++++++----------
 1 file changed, 27 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index d14ab9650f81..6ff9fab4233e 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -43,8 +43,6 @@ static void __hyp_text __activate_traps_fpsimd32(struct kvm_vcpu *vcpu)
 
 static void __hyp_text __activate_traps_common(struct kvm_vcpu *vcpu)
 {
-	write_sysreg(vcpu->arch.hcr_el2, hcr_el2);
-
 	/* Trap on AArch32 cp15 c15 (impdef sysregs) accesses (EL1 or EL0) */
 	write_sysreg(1 << 15, hstr_el2);
 	/*
@@ -64,12 +62,15 @@ static void __hyp_text __deactivate_traps_common(void)
 	write_sysreg(0, pmuserenr_el0);
 }
 
+/* Activate the traps we can during vcpu_load with VHE */
 void activate_traps_vhe_load(struct kvm_vcpu *vcpu)
 {
 	u64 val;
 
+	/* Make sure 32-bit guests trap VFP */
 	__activate_traps_fpsimd32(vcpu);
 
+	/* Trap VFP accesses on a VHE system */
 	val = read_sysreg(cpacr_el1);
 	val |= CPACR_EL1_TTA;
 	val &= ~CPACR_EL1_ZEN;
@@ -78,11 +79,28 @@ void activate_traps_vhe_load(struct kvm_vcpu *vcpu)
 	else
 		val &= ~CPACR_EL1_FPEN;
 	write_sysreg(val, cpacr_el1);
+
+	/* Activate traps on impdef sysregs, PMU, and debug */
+	__activate_traps_common(vcpu);
 }
 
+/* Deactivate the traps we can during vcpu_put with VHE */
 void deactivate_traps_vhe_put(void)
 {
+	u64 mdcr_el2;
+
+	/* Re-enable host VFP and SVE access */
 	write_sysreg(CPACR_EL1_DEFAULT, cpacr_el1);
+
+	/* Re-enable host access to impdef sysregs and the PMU */
+	__deactivate_traps_common();
+
+	/* Re-enable host access to the debug regs */
+	mdcr_el2 = read_sysreg(mdcr_el2);
+	mdcr_el2 &= MDCR_EL2_HPMN_MASK |
+		    MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT |
+		    MDCR_EL2_TPMS;
+	write_sysreg(mdcr_el2, mdcr_el2);
 }
 
 static void __hyp_text __activate_traps_vhe(struct kvm_vcpu *vcpu)
@@ -96,8 +114,10 @@ void __hyp_text __activate_traps_nvhe_load(struct kvm_vcpu *vcpu)
 
 	vcpu = kern_hyp_va(vcpu);
 
+	/* Make sure 32-bit guests trap VFP */
 	__activate_traps_fpsimd32(vcpu);
 
+	/* Trap VFP accesses on a non-VHE system */
 	val = CPTR_EL2_DEFAULT;
 	val |= CPTR_EL2_TTA | CPTR_EL2_TZ;
 	if (vcpu->arch.guest_vfp_loaded)
@@ -114,6 +134,8 @@ void __hyp_text __deactivate_traps_nvhe_put(void)
 
 static void __hyp_text __activate_traps_nvhe(struct kvm_vcpu *vcpu)
 {
+	/* Activate traps on impdef sysregs, PMU, and debug */
+	__activate_traps_common(vcpu);
 }
 
 static hyp_alternate_select(__activate_traps_arch,
@@ -122,20 +144,14 @@ static hyp_alternate_select(__activate_traps_arch,
 
 static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
 {
-	__activate_traps_common(vcpu);
 	__activate_traps_arch()(vcpu);
+	write_sysreg(vcpu->arch.hcr_el2, hcr_el2);
 }
 
 static void __hyp_text __deactivate_traps_vhe(void)
 {
 	extern char vectors[];	/* kernel exception vectors */
-	u64 mdcr_el2 = read_sysreg(mdcr_el2);
-
-	mdcr_el2 &= MDCR_EL2_HPMN_MASK |
-		    MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT |
-		    MDCR_EL2_TPMS;
 
-	write_sysreg(mdcr_el2, mdcr_el2);
 	write_sysreg(HCR_HOST_VHE_FLAGS, hcr_el2);
 	write_sysreg(vectors, vbar_el1);
 }
@@ -143,6 +159,8 @@ static void __hyp_text __deactivate_traps_vhe(void)
 static void __hyp_text __deactivate_traps_nvhe(void) {
 	u64 mdcr_el2 = read_sysreg(mdcr_el2);
 
+	__deactivate_traps_common();
+
 	mdcr_el2 &= MDCR_EL2_HPMN_MASK;
 	mdcr_el2 |= MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT;
 
@@ -165,7 +183,6 @@ static void __hyp_text __deactivate_traps(struct kvm_vcpu *vcpu)
 	if (vcpu->arch.hcr_el2 & HCR_VSE)
 		vcpu->arch.hcr_el2 = read_sysreg(hcr_el2);
 
-	__deactivate_traps_common();
 	__deactivate_traps_arch()();
 }
 
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 223+ messages in thread

* [PATCH v3 35/41] KVM: arm64: Separate activate_traps and deactivate_traps for VHE and non-VHE
  2018-01-12 12:07 ` Christoffer Dall
@ 2018-01-12 12:07   ` Christoffer Dall
  -1 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-01-12 12:07 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel; +Cc: Marc Zyngier, Shih-Wei Li, kvm

To make the code more readable and to avoid the overhead of a function
call, let's get rid of the pair of alternative function selectors and
explicitly call the VHE and non-VHE functions instead, letting the
compiler inline the static functions where it can.
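
For reference, this replaces the boot-time patched selector with direct
calls (condensed from the diff below; hyp_alternate_select picks one of
the two functions based on ARM64_HAS_VIRT_HOST_EXTN):

/* Before: an indirect call through a boot-time patched selector */
static hyp_alternate_select(__activate_traps_arch,
                            __activate_traps_nvhe, __activate_traps_vhe,
                            ARM64_HAS_VIRT_HOST_EXTN);

static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
{
        __activate_traps_arch()(vcpu);
}

/* After: each run loop calls its own variant directly */
        activate_traps_vhe(vcpu);       /* from kvm_vcpu_run_vhe() */
        __activate_traps_nvhe(vcpu);    /* from __kvm_vcpu_run_nvhe() */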

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 arch/arm64/kvm/hyp/switch.c | 78 +++++++++++++++++++++------------------------
 1 file changed, 37 insertions(+), 41 deletions(-)

diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 6ff9fab4233e..53a137821ee9 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -103,9 +103,27 @@ void deactivate_traps_vhe_put(void)
 	write_sysreg(mdcr_el2, mdcr_el2);
 }
 
-static void __hyp_text __activate_traps_vhe(struct kvm_vcpu *vcpu)
+static inline void activate_traps_vhe(struct kvm_vcpu *vcpu)
 {
-	write_sysreg(__kvm_hyp_vector, vbar_el1);
+	write_sysreg(vcpu->arch.hcr_el2, hcr_el2);
+	write_sysreg_el2(__kvm_hyp_vector, vbar);
+}
+
+static inline void deactivate_traps_vhe(struct kvm_vcpu *vcpu)
+{
+	extern char vectors[];	/* kernel exception vectors */
+
+	/*
+	 * If we pended a virtual abort, preserve it until it gets
+	 * cleared. See D1.14.3 (Virtual Interrupts) for details, but
+	 * the crucial bit is "On taking a vSError interrupt,
+	 * HCR_EL2.VSE is cleared to 0."
+	 */
+	if (vcpu->arch.hcr_el2 & HCR_VSE)
+		vcpu->arch.hcr_el2 = read_sysreg(hcr_el2);
+
+	write_sysreg(HCR_HOST_VHE_FLAGS, hcr_el2);
+	write_sysreg(vectors, vbar_el1);
 }
 
 void __hyp_text __activate_traps_nvhe_load(struct kvm_vcpu *vcpu)
@@ -136,44 +154,15 @@ static void __hyp_text __activate_traps_nvhe(struct kvm_vcpu *vcpu)
 {
 	/* Activate traps on impdef sysregs, PMU, and debug */
 	__activate_traps_common(vcpu);
-}
 
-static hyp_alternate_select(__activate_traps_arch,
-			    __activate_traps_nvhe, __activate_traps_vhe,
-			    ARM64_HAS_VIRT_HOST_EXTN);
-
-static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
-{
-	__activate_traps_arch()(vcpu);
+	/* Configure all other hypervisor traps and features */
 	write_sysreg(vcpu->arch.hcr_el2, hcr_el2);
 }
 
-static void __hyp_text __deactivate_traps_vhe(void)
+static void __hyp_text __deactivate_traps_nvhe(struct kvm_vcpu *vcpu)
 {
-	extern char vectors[];	/* kernel exception vectors */
-
-	write_sysreg(HCR_HOST_VHE_FLAGS, hcr_el2);
-	write_sysreg(vectors, vbar_el1);
-}
-
-static void __hyp_text __deactivate_traps_nvhe(void) {
-	u64 mdcr_el2 = read_sysreg(mdcr_el2);
-
-	__deactivate_traps_common();
-
-	mdcr_el2 &= MDCR_EL2_HPMN_MASK;
-	mdcr_el2 |= MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT;
-
-	write_sysreg(mdcr_el2, mdcr_el2);
-	write_sysreg(HCR_RW, hcr_el2);
-}
-
-static hyp_alternate_select(__deactivate_traps_arch,
-			    __deactivate_traps_nvhe, __deactivate_traps_vhe,
-			    ARM64_HAS_VIRT_HOST_EXTN);
+	u64 mdcr_el2;
 
-static void __hyp_text __deactivate_traps(struct kvm_vcpu *vcpu)
-{
 	/*
 	 * If we pended a virtual abort, preserve it until it gets
 	 * cleared. See D1.14.3 (Virtual Interrupts) for details, but
@@ -183,7 +172,14 @@ static void __hyp_text __deactivate_traps(struct kvm_vcpu *vcpu)
 	if (vcpu->arch.hcr_el2 & HCR_VSE)
 		vcpu->arch.hcr_el2 = read_sysreg(hcr_el2);
 
-	__deactivate_traps_arch()();
+	__deactivate_traps_common();
+
+	mdcr_el2 = read_sysreg(mdcr_el2);
+	mdcr_el2 &= MDCR_EL2_HPMN_MASK;
+	mdcr_el2 |= MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT;
+
+	write_sysreg(mdcr_el2, mdcr_el2);
+	write_sysreg(HCR_RW, hcr_el2);
 }
 
 static void __hyp_text __activate_vm(struct kvm *kvm)
@@ -400,7 +396,7 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
 
 	sysreg_save_host_state_vhe(host_ctxt);
 
-	__activate_traps(vcpu);
+	activate_traps_vhe(vcpu);
 	__activate_vm(vcpu->kvm);
 
 	__vgic_restore_state(vcpu);
@@ -418,7 +414,7 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
 	sysreg_save_guest_state_vhe(guest_ctxt);
 	__vgic_save_state(vcpu);
 
-	__deactivate_traps(vcpu);
+	deactivate_traps_vhe(vcpu);
 
 	sysreg_restore_host_state_vhe(host_ctxt);
 
@@ -442,7 +438,7 @@ int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
 
 	__sysreg_save_state_nvhe(host_ctxt);
 
-	__activate_traps(vcpu);
+	__activate_traps_nvhe(vcpu);
 	__activate_vm(kern_hyp_va(vcpu->kvm));
 
 	__vgic_restore_state(vcpu);
@@ -468,7 +464,7 @@ int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
 	__timer_disable_traps(vcpu);
 	__vgic_save_state(vcpu);
 
-	__deactivate_traps(vcpu);
+	__deactivate_traps_nvhe(vcpu);
 	__deactivate_vm(vcpu);
 
 	__sysreg_restore_state_nvhe(host_ctxt);
@@ -494,7 +490,7 @@ static void __hyp_text __hyp_call_panic_nvhe(u64 spsr, u64 elr, u64 par,
 
 	if (read_sysreg(vttbr_el2)) {
 		__timer_disable_traps(vcpu);
-		__deactivate_traps(vcpu);
+		__deactivate_traps_nvhe(vcpu);
 		__deactivate_vm(vcpu);
 		__sysreg_restore_state_nvhe(__host_ctxt);
 	}
@@ -518,7 +514,7 @@ static void __hyp_call_panic_vhe(u64 spsr, u64 elr, u64 par,
 	struct kvm_vcpu *vcpu;
 	vcpu = host_ctxt->__hyp_running_vcpu;
 
-	__deactivate_traps(vcpu);
+	deactivate_traps_vhe(vcpu);
 	sysreg_restore_host_state_vhe(host_ctxt);
 
 	panic(__hyp_panic_string,
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 223+ messages in thread

* [PATCH v3 35/41] KVM: arm64: Separate activate_traps and deactivate_traps for VHE and non-VHE
@ 2018-01-12 12:07   ` Christoffer Dall
  0 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-01-12 12:07 UTC (permalink / raw)
  To: linux-arm-kernel

To make the code more readable and to avoid the overhead of a function
call, let's get rid of the pair of alternative function selectors and
explicitly call the VHE and non-VHE functions instead, letting the
compiler inline the static functions where it can.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 arch/arm64/kvm/hyp/switch.c | 78 +++++++++++++++++++++------------------------
 1 file changed, 37 insertions(+), 41 deletions(-)

diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 6ff9fab4233e..53a137821ee9 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -103,9 +103,27 @@ void deactivate_traps_vhe_put(void)
 	write_sysreg(mdcr_el2, mdcr_el2);
 }
 
-static void __hyp_text __activate_traps_vhe(struct kvm_vcpu *vcpu)
+static inline void activate_traps_vhe(struct kvm_vcpu *vcpu)
 {
-	write_sysreg(__kvm_hyp_vector, vbar_el1);
+	write_sysreg(vcpu->arch.hcr_el2, hcr_el2);
+	write_sysreg_el2(__kvm_hyp_vector, vbar);
+}
+
+static inline void deactivate_traps_vhe(struct kvm_vcpu *vcpu)
+{
+	extern char vectors[];	/* kernel exception vectors */
+
+	/*
+	 * If we pended a virtual abort, preserve it until it gets
+	 * cleared. See D1.14.3 (Virtual Interrupts) for details, but
+	 * the crucial bit is "On taking a vSError interrupt,
+	 * HCR_EL2.VSE is cleared to 0."
+	 */
+	if (vcpu->arch.hcr_el2 & HCR_VSE)
+		vcpu->arch.hcr_el2 = read_sysreg(hcr_el2);
+
+	write_sysreg(HCR_HOST_VHE_FLAGS, hcr_el2);
+	write_sysreg(vectors, vbar_el1);
 }
 
 void __hyp_text __activate_traps_nvhe_load(struct kvm_vcpu *vcpu)
@@ -136,44 +154,15 @@ static void __hyp_text __activate_traps_nvhe(struct kvm_vcpu *vcpu)
 {
 	/* Activate traps on impdef sysregs, PMU, and debug */
 	__activate_traps_common(vcpu);
-}
 
-static hyp_alternate_select(__activate_traps_arch,
-			    __activate_traps_nvhe, __activate_traps_vhe,
-			    ARM64_HAS_VIRT_HOST_EXTN);
-
-static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
-{
-	__activate_traps_arch()(vcpu);
+	/* Configure all other hypervisor traps and features */
 	write_sysreg(vcpu->arch.hcr_el2, hcr_el2);
 }
 
-static void __hyp_text __deactivate_traps_vhe(void)
+static void __hyp_text __deactivate_traps_nvhe(struct kvm_vcpu *vcpu)
 {
-	extern char vectors[];	/* kernel exception vectors */
-
-	write_sysreg(HCR_HOST_VHE_FLAGS, hcr_el2);
-	write_sysreg(vectors, vbar_el1);
-}
-
-static void __hyp_text __deactivate_traps_nvhe(void) {
-	u64 mdcr_el2 = read_sysreg(mdcr_el2);
-
-	__deactivate_traps_common();
-
-	mdcr_el2 &= MDCR_EL2_HPMN_MASK;
-	mdcr_el2 |= MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT;
-
-	write_sysreg(mdcr_el2, mdcr_el2);
-	write_sysreg(HCR_RW, hcr_el2);
-}
-
-static hyp_alternate_select(__deactivate_traps_arch,
-			    __deactivate_traps_nvhe, __deactivate_traps_vhe,
-			    ARM64_HAS_VIRT_HOST_EXTN);
+	u64 mdcr_el2;
 
-static void __hyp_text __deactivate_traps(struct kvm_vcpu *vcpu)
-{
 	/*
 	 * If we pended a virtual abort, preserve it until it gets
 	 * cleared. See D1.14.3 (Virtual Interrupts) for details, but
@@ -183,7 +172,14 @@ static void __hyp_text __deactivate_traps(struct kvm_vcpu *vcpu)
 	if (vcpu->arch.hcr_el2 & HCR_VSE)
 		vcpu->arch.hcr_el2 = read_sysreg(hcr_el2);
 
-	__deactivate_traps_arch()();
+	__deactivate_traps_common();
+
+	mdcr_el2 = read_sysreg(mdcr_el2);
+	mdcr_el2 &= MDCR_EL2_HPMN_MASK;
+	mdcr_el2 |= MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT;
+
+	write_sysreg(mdcr_el2, mdcr_el2);
+	write_sysreg(HCR_RW, hcr_el2);
 }
 
 static void __hyp_text __activate_vm(struct kvm *kvm)
@@ -400,7 +396,7 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
 
 	sysreg_save_host_state_vhe(host_ctxt);
 
-	__activate_traps(vcpu);
+	activate_traps_vhe(vcpu);
 	__activate_vm(vcpu->kvm);
 
 	__vgic_restore_state(vcpu);
@@ -418,7 +414,7 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
 	sysreg_save_guest_state_vhe(guest_ctxt);
 	__vgic_save_state(vcpu);
 
-	__deactivate_traps(vcpu);
+	deactivate_traps_vhe(vcpu);
 
 	sysreg_restore_host_state_vhe(host_ctxt);
 
@@ -442,7 +438,7 @@ int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
 
 	__sysreg_save_state_nvhe(host_ctxt);
 
-	__activate_traps(vcpu);
+	__activate_traps_nvhe(vcpu);
 	__activate_vm(kern_hyp_va(vcpu->kvm));
 
 	__vgic_restore_state(vcpu);
@@ -468,7 +464,7 @@ int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
 	__timer_disable_traps(vcpu);
 	__vgic_save_state(vcpu);
 
-	__deactivate_traps(vcpu);
+	__deactivate_traps_nvhe(vcpu);
 	__deactivate_vm(vcpu);
 
 	__sysreg_restore_state_nvhe(host_ctxt);
@@ -494,7 +490,7 @@ static void __hyp_text __hyp_call_panic_nvhe(u64 spsr, u64 elr, u64 par,
 
 	if (read_sysreg(vttbr_el2)) {
 		__timer_disable_traps(vcpu);
-		__deactivate_traps(vcpu);
+		__deactivate_traps_nvhe(vcpu);
 		__deactivate_vm(vcpu);
 		__sysreg_restore_state_nvhe(__host_ctxt);
 	}
@@ -518,7 +514,7 @@ static void __hyp_call_panic_vhe(u64 spsr, u64 elr, u64 par,
 	struct kvm_vcpu *vcpu;
 	vcpu = host_ctxt->__hyp_running_vcpu;
 
-	__deactivate_traps(vcpu);
+	deactivate_traps_vhe(vcpu);
 	sysreg_restore_host_state_vhe(host_ctxt);
 
 	panic(__hyp_panic_string,
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 223+ messages in thread

* [PATCH v3 36/41] KVM: arm/arm64: Get rid of vgic_elrsr
  2018-01-12 12:07 ` Christoffer Dall
@ 2018-01-12 12:07   ` Christoffer Dall
  -1 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-01-12 12:07 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel; +Cc: Marc Zyngier, Shih-Wei Li, kvm

There is really no need to store the vgic_elrsr in the VGIC data
structures, as the only thing we need the elrsr for is to figure out
whether an LR is inactive when we save the VGIC state upon returning
from the guest.  We might as well store this in a temporary local
variable.

This also gets rid of the endianness conversion in the VGIC save
function, which was completely unnecessary and would actually result in
incorrect behavior on big-endian systems, because we only ever operate
on typed values here and never convert pointers and re-read the data as
a different type.
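
The GICv2 save path then reads ELRSR into a local variable and consumes
it on the spot (condensed from the diff below; the GICv3 path does the
same with ICH_ELSR_EL2):

        u64 used_lrs = vcpu->arch.vgic_cpu.used_lrs;
        u64 elrsr;
        int i;

        elrsr = readl_relaxed(base + GICH_ELRSR0);
        if (unlikely(used_lrs > 32))
                elrsr |= ((u64)readl_relaxed(base + GICH_ELRSR1)) << 32;

        for (i = 0; i < used_lrs; i++) {
                if (elrsr & (1UL << i))
                        /* empty LR: just clear the cached state bits */
                        cpu_if->vgic_lr[i] &= ~GICH_LR_STATE;
                else
                        /* LR still holds state: read it back from hardware */
                        cpu_if->vgic_lr[i] = readl_relaxed(base + GICH_LR0 + (i * 4));

                writel_relaxed(0, base + GICH_LR0 + (i * 4));
        }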

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 include/kvm/arm_vgic.h        |  2 --
 virt/kvm/arm/hyp/vgic-v2-sr.c | 28 +++++++---------------------
 virt/kvm/arm/hyp/vgic-v3-sr.c |  6 +++---
 virt/kvm/arm/vgic/vgic-v2.c   |  1 -
 virt/kvm/arm/vgic/vgic-v3.c   |  1 -
 5 files changed, 10 insertions(+), 28 deletions(-)

diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index cdbd142ca7f2..ac98ae46bfb7 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -263,7 +263,6 @@ struct vgic_dist {
 struct vgic_v2_cpu_if {
 	u32		vgic_hcr;
 	u32		vgic_vmcr;
-	u64		vgic_elrsr;	/* Saved only */
 	u32		vgic_apr;
 	u32		vgic_lr[VGIC_V2_MAX_LRS];
 };
@@ -272,7 +271,6 @@ struct vgic_v3_cpu_if {
 	u32		vgic_hcr;
 	u32		vgic_vmcr;
 	u32		vgic_sre;	/* Restored only, change ignored */
-	u32		vgic_elrsr;	/* Saved only */
 	u32		vgic_ap0r[4];
 	u32		vgic_ap1r[4];
 	u64		vgic_lr[VGIC_V3_MAX_LRS];
diff --git a/virt/kvm/arm/hyp/vgic-v2-sr.c b/virt/kvm/arm/hyp/vgic-v2-sr.c
index d7fd46fe9efb..c536e3d87942 100644
--- a/virt/kvm/arm/hyp/vgic-v2-sr.c
+++ b/virt/kvm/arm/hyp/vgic-v2-sr.c
@@ -22,29 +22,19 @@
 #include <asm/kvm_emulate.h>
 #include <asm/kvm_hyp.h>
 
-static void __hyp_text save_elrsr(struct kvm_vcpu *vcpu, void __iomem *base)
-{
-	struct vgic_v2_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v2;
-	int nr_lr = (kern_hyp_va(&kvm_vgic_global_state))->nr_lr;
-	u32 elrsr0, elrsr1;
-
-	elrsr0 = readl_relaxed(base + GICH_ELRSR0);
-	if (unlikely(nr_lr > 32))
-		elrsr1 = readl_relaxed(base + GICH_ELRSR1);
-	else
-		elrsr1 = 0;
-
-	cpu_if->vgic_elrsr = ((u64)elrsr1 << 32) | elrsr0;
-}
-
 static void __hyp_text save_lrs(struct kvm_vcpu *vcpu, void __iomem *base)
 {
 	struct vgic_v2_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v2;
-	int i;
 	u64 used_lrs = vcpu->arch.vgic_cpu.used_lrs;
+	u64 elrsr;
+	int i;
+
+	elrsr = readl_relaxed(base + GICH_ELRSR0);
+	if (unlikely(used_lrs > 32))
+		elrsr |= ((u64)readl_relaxed(base + GICH_ELRSR1)) << 32;
 
 	for (i = 0; i < used_lrs; i++) {
-		if (cpu_if->vgic_elrsr & (1UL << i))
+		if (elrsr & (1UL << i))
 			cpu_if->vgic_lr[i] &= ~GICH_LR_STATE;
 		else
 			cpu_if->vgic_lr[i] = readl_relaxed(base + GICH_LR0 + (i * 4));
@@ -67,13 +57,9 @@ void __hyp_text __vgic_v2_save_state(struct kvm_vcpu *vcpu)
 
 	if (used_lrs) {
 		cpu_if->vgic_apr = readl_relaxed(base + GICH_APR);
-
-		save_elrsr(vcpu, base);
 		save_lrs(vcpu, base);
-
 		writel_relaxed(0, base + GICH_HCR);
 	} else {
-		cpu_if->vgic_elrsr = ~0UL;
 		cpu_if->vgic_apr = 0;
 	}
 }
diff --git a/virt/kvm/arm/hyp/vgic-v3-sr.c b/virt/kvm/arm/hyp/vgic-v3-sr.c
index f5c3d6d7019e..9abf2f3c12b5 100644
--- a/virt/kvm/arm/hyp/vgic-v3-sr.c
+++ b/virt/kvm/arm/hyp/vgic-v3-sr.c
@@ -222,15 +222,16 @@ void __hyp_text __vgic_v3_save_state(struct kvm_vcpu *vcpu)
 	if (used_lrs) {
 		int i;
 		u32 nr_pre_bits;
+		u32 elrsr;
 
-		cpu_if->vgic_elrsr = read_gicreg(ICH_ELSR_EL2);
+		elrsr = read_gicreg(ICH_ELSR_EL2);
 
 		write_gicreg(0, ICH_HCR_EL2);
 		val = read_gicreg(ICH_VTR_EL2);
 		nr_pre_bits = vtr_to_nr_pre_bits(val);
 
 		for (i = 0; i < used_lrs; i++) {
-			if (cpu_if->vgic_elrsr & (1 << i))
+			if (elrsr & (1 << i))
 				cpu_if->vgic_lr[i] &= ~ICH_LR_STATE;
 			else
 				cpu_if->vgic_lr[i] = __gic_v3_get_lr(i);
@@ -262,7 +263,6 @@ void __hyp_text __vgic_v3_save_state(struct kvm_vcpu *vcpu)
 		    cpu_if->its_vpe.its_vm)
 			write_gicreg(0, ICH_HCR_EL2);
 
-		cpu_if->vgic_elrsr = 0xffff;
 		cpu_if->vgic_ap0r[0] = 0;
 		cpu_if->vgic_ap0r[1] = 0;
 		cpu_if->vgic_ap0r[2] = 0;
diff --git a/virt/kvm/arm/vgic/vgic-v2.c b/virt/kvm/arm/vgic/vgic-v2.c
index c32d7b93ffd1..bb305d49cfdd 100644
--- a/virt/kvm/arm/vgic/vgic-v2.c
+++ b/virt/kvm/arm/vgic/vgic-v2.c
@@ -265,7 +265,6 @@ void vgic_v2_enable(struct kvm_vcpu *vcpu)
 	 * anyway.
 	 */
 	vcpu->arch.vgic_cpu.vgic_v2.vgic_vmcr = 0;
-	vcpu->arch.vgic_cpu.vgic_v2.vgic_elrsr = ~0;
 
 	/* Get the show on the road... */
 	vcpu->arch.vgic_cpu.vgic_v2.vgic_hcr = GICH_HCR_EN;
diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c
index 6b329414e57a..b76e21f3e6bd 100644
--- a/virt/kvm/arm/vgic/vgic-v3.c
+++ b/virt/kvm/arm/vgic/vgic-v3.c
@@ -267,7 +267,6 @@ void vgic_v3_enable(struct kvm_vcpu *vcpu)
 	 * anyway.
 	 */
 	vgic_v3->vgic_vmcr = 0;
-	vgic_v3->vgic_elrsr = ~0;
 
 	/*
 	 * If we are emulating a GICv3, we do it in an non-GICv2-compatible
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 223+ messages in thread

* [PATCH v3 36/41] KVM: arm/arm64: Get rid of vgic_elrsr
@ 2018-01-12 12:07   ` Christoffer Dall
  0 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-01-12 12:07 UTC (permalink / raw)
  To: linux-arm-kernel

There is really no need to store the vgic_elrsr in the VGIC data
structures, as the only thing we need the elrsr for is to figure out
whether an LR is inactive when we save the VGIC state upon returning
from the guest.  We might as well store this in a temporary local
variable.

This also gets rid of the endianness conversion in the VGIC save
function, which was completely unnecessary and would actually result in
incorrect behavior on big-endian systems, because we only ever operate
on typed values here and never convert pointers and re-read the data as
a different type.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 include/kvm/arm_vgic.h        |  2 --
 virt/kvm/arm/hyp/vgic-v2-sr.c | 28 +++++++---------------------
 virt/kvm/arm/hyp/vgic-v3-sr.c |  6 +++---
 virt/kvm/arm/vgic/vgic-v2.c   |  1 -
 virt/kvm/arm/vgic/vgic-v3.c   |  1 -
 5 files changed, 10 insertions(+), 28 deletions(-)

diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index cdbd142ca7f2..ac98ae46bfb7 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -263,7 +263,6 @@ struct vgic_dist {
 struct vgic_v2_cpu_if {
 	u32		vgic_hcr;
 	u32		vgic_vmcr;
-	u64		vgic_elrsr;	/* Saved only */
 	u32		vgic_apr;
 	u32		vgic_lr[VGIC_V2_MAX_LRS];
 };
@@ -272,7 +271,6 @@ struct vgic_v3_cpu_if {
 	u32		vgic_hcr;
 	u32		vgic_vmcr;
 	u32		vgic_sre;	/* Restored only, change ignored */
-	u32		vgic_elrsr;	/* Saved only */
 	u32		vgic_ap0r[4];
 	u32		vgic_ap1r[4];
 	u64		vgic_lr[VGIC_V3_MAX_LRS];
diff --git a/virt/kvm/arm/hyp/vgic-v2-sr.c b/virt/kvm/arm/hyp/vgic-v2-sr.c
index d7fd46fe9efb..c536e3d87942 100644
--- a/virt/kvm/arm/hyp/vgic-v2-sr.c
+++ b/virt/kvm/arm/hyp/vgic-v2-sr.c
@@ -22,29 +22,19 @@
 #include <asm/kvm_emulate.h>
 #include <asm/kvm_hyp.h>
 
-static void __hyp_text save_elrsr(struct kvm_vcpu *vcpu, void __iomem *base)
-{
-	struct vgic_v2_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v2;
-	int nr_lr = (kern_hyp_va(&kvm_vgic_global_state))->nr_lr;
-	u32 elrsr0, elrsr1;
-
-	elrsr0 = readl_relaxed(base + GICH_ELRSR0);
-	if (unlikely(nr_lr > 32))
-		elrsr1 = readl_relaxed(base + GICH_ELRSR1);
-	else
-		elrsr1 = 0;
-
-	cpu_if->vgic_elrsr = ((u64)elrsr1 << 32) | elrsr0;
-}
-
 static void __hyp_text save_lrs(struct kvm_vcpu *vcpu, void __iomem *base)
 {
 	struct vgic_v2_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v2;
-	int i;
 	u64 used_lrs = vcpu->arch.vgic_cpu.used_lrs;
+	u64 elrsr;
+	int i;
+
+	elrsr = readl_relaxed(base + GICH_ELRSR0);
+	if (unlikely(used_lrs > 32))
+		elrsr |= ((u64)readl_relaxed(base + GICH_ELRSR1)) << 32;
 
 	for (i = 0; i < used_lrs; i++) {
-		if (cpu_if->vgic_elrsr & (1UL << i))
+		if (elrsr & (1UL << i))
 			cpu_if->vgic_lr[i] &= ~GICH_LR_STATE;
 		else
 			cpu_if->vgic_lr[i] = readl_relaxed(base + GICH_LR0 + (i * 4));
@@ -67,13 +57,9 @@ void __hyp_text __vgic_v2_save_state(struct kvm_vcpu *vcpu)
 
 	if (used_lrs) {
 		cpu_if->vgic_apr = readl_relaxed(base + GICH_APR);
-
-		save_elrsr(vcpu, base);
 		save_lrs(vcpu, base);
-
 		writel_relaxed(0, base + GICH_HCR);
 	} else {
-		cpu_if->vgic_elrsr = ~0UL;
 		cpu_if->vgic_apr = 0;
 	}
 }
diff --git a/virt/kvm/arm/hyp/vgic-v3-sr.c b/virt/kvm/arm/hyp/vgic-v3-sr.c
index f5c3d6d7019e..9abf2f3c12b5 100644
--- a/virt/kvm/arm/hyp/vgic-v3-sr.c
+++ b/virt/kvm/arm/hyp/vgic-v3-sr.c
@@ -222,15 +222,16 @@ void __hyp_text __vgic_v3_save_state(struct kvm_vcpu *vcpu)
 	if (used_lrs) {
 		int i;
 		u32 nr_pre_bits;
+		u32 elrsr;
 
-		cpu_if->vgic_elrsr = read_gicreg(ICH_ELSR_EL2);
+		elrsr = read_gicreg(ICH_ELSR_EL2);
 
 		write_gicreg(0, ICH_HCR_EL2);
 		val = read_gicreg(ICH_VTR_EL2);
 		nr_pre_bits = vtr_to_nr_pre_bits(val);
 
 		for (i = 0; i < used_lrs; i++) {
-			if (cpu_if->vgic_elrsr & (1 << i))
+			if (elrsr & (1 << i))
 				cpu_if->vgic_lr[i] &= ~ICH_LR_STATE;
 			else
 				cpu_if->vgic_lr[i] = __gic_v3_get_lr(i);
@@ -262,7 +263,6 @@ void __hyp_text __vgic_v3_save_state(struct kvm_vcpu *vcpu)
 		    cpu_if->its_vpe.its_vm)
 			write_gicreg(0, ICH_HCR_EL2);
 
-		cpu_if->vgic_elrsr = 0xffff;
 		cpu_if->vgic_ap0r[0] = 0;
 		cpu_if->vgic_ap0r[1] = 0;
 		cpu_if->vgic_ap0r[2] = 0;
diff --git a/virt/kvm/arm/vgic/vgic-v2.c b/virt/kvm/arm/vgic/vgic-v2.c
index c32d7b93ffd1..bb305d49cfdd 100644
--- a/virt/kvm/arm/vgic/vgic-v2.c
+++ b/virt/kvm/arm/vgic/vgic-v2.c
@@ -265,7 +265,6 @@ void vgic_v2_enable(struct kvm_vcpu *vcpu)
 	 * anyway.
 	 */
 	vcpu->arch.vgic_cpu.vgic_v2.vgic_vmcr = 0;
-	vcpu->arch.vgic_cpu.vgic_v2.vgic_elrsr = ~0;
 
 	/* Get the show on the road... */
 	vcpu->arch.vgic_cpu.vgic_v2.vgic_hcr = GICH_HCR_EN;
diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c
index 6b329414e57a..b76e21f3e6bd 100644
--- a/virt/kvm/arm/vgic/vgic-v3.c
+++ b/virt/kvm/arm/vgic/vgic-v3.c
@@ -267,7 +267,6 @@ void vgic_v3_enable(struct kvm_vcpu *vcpu)
 	 * anyway.
 	 */
 	vgic_v3->vgic_vmcr = 0;
-	vgic_v3->vgic_elrsr = ~0;
 
 	/*
 	 * If we are emulating a GICv3, we do it in an non-GICv2-compatible
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 223+ messages in thread

* [PATCH v3 37/41] KVM: arm/arm64: Handle VGICv2 save/restore from the main VGIC code
  2018-01-12 12:07 ` Christoffer Dall
@ 2018-01-12 12:07   ` Christoffer Dall
  -1 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-01-12 12:07 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel
  Cc: kvm, Marc Zyngier, Shih-Wei Li, Andrew Jones, Christoffer Dall

We can program the GICv2 hypervisor control interface logic directly
from the core vgic code, and can therefore do the save/restore from the
flush/sync functions instead, which opens the door to a number of
future optimizations.
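
The hook points in the generic VGIC code end up looking like this
(condensed from the diff below); since the GICv2 control interface is
accessed through normal MMIO, no hyp call is needed:

/* Called first from kvm_vgic_sync_hwstate(), right after the guest exit */
static inline void vgic_save_state(struct kvm_vcpu *vcpu)
{
        if (!static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif))
                vgic_v2_save_state(vcpu);
}

/* Called last from kvm_vgic_flush_hwstate(), once the LRs are populated
 * (or immediately, when the AP list is empty).
 */
static inline void vgic_restore_state(struct kvm_vcpu *vcpu)
{
        if (!static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif))
                vgic_v2_restore_state(vcpu);
}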

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 arch/arm/kvm/hyp/switch.c        |  4 ---
 arch/arm64/include/asm/kvm_hyp.h |  2 --
 arch/arm64/kvm/hyp/switch.c      |  4 ---
 virt/kvm/arm/hyp/vgic-v2-sr.c    | 65 ----------------------------------------
 virt/kvm/arm/vgic/vgic-v2.c      | 63 ++++++++++++++++++++++++++++++++++++++
 virt/kvm/arm/vgic/vgic.c         | 19 +++++++++++-
 virt/kvm/arm/vgic/vgic.h         |  3 ++
 7 files changed, 84 insertions(+), 76 deletions(-)

diff --git a/arch/arm/kvm/hyp/switch.c b/arch/arm/kvm/hyp/switch.c
index 7b2bd25e3b10..214187446e63 100644
--- a/arch/arm/kvm/hyp/switch.c
+++ b/arch/arm/kvm/hyp/switch.c
@@ -91,16 +91,12 @@ static void __hyp_text __vgic_save_state(struct kvm_vcpu *vcpu)
 {
 	if (static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif))
 		__vgic_v3_save_state(vcpu);
-	else
-		__vgic_v2_save_state(vcpu);
 }
 
 static void __hyp_text __vgic_restore_state(struct kvm_vcpu *vcpu)
 {
 	if (static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif))
 		__vgic_v3_restore_state(vcpu);
-	else
-		__vgic_v2_restore_state(vcpu);
 }
 
 static bool __hyp_text __populate_fault_info(struct kvm_vcpu *vcpu)
diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index ffd62e31f134..16a5342c4821 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -121,8 +121,6 @@ typeof(orig) * __hyp_text fname(void)					\
 	return val;							\
 }
 
-void __vgic_v2_save_state(struct kvm_vcpu *vcpu);
-void __vgic_v2_restore_state(struct kvm_vcpu *vcpu);
 int __vgic_v2_perform_cpuif_access(struct kvm_vcpu *vcpu);
 
 void __vgic_v3_save_state(struct kvm_vcpu *vcpu);
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 53a137821ee9..74b7d7598a51 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -196,16 +196,12 @@ static void __hyp_text __vgic_save_state(struct kvm_vcpu *vcpu)
 {
 	if (static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif))
 		__vgic_v3_save_state(vcpu);
-	else
-		__vgic_v2_save_state(vcpu);
 }
 
 static void __hyp_text __vgic_restore_state(struct kvm_vcpu *vcpu)
 {
 	if (static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif))
 		__vgic_v3_restore_state(vcpu);
-	else
-		__vgic_v2_restore_state(vcpu);
 }
 
 static bool __hyp_text __true_value(void)
diff --git a/virt/kvm/arm/hyp/vgic-v2-sr.c b/virt/kvm/arm/hyp/vgic-v2-sr.c
index c536e3d87942..b433257f4348 100644
--- a/virt/kvm/arm/hyp/vgic-v2-sr.c
+++ b/virt/kvm/arm/hyp/vgic-v2-sr.c
@@ -22,71 +22,6 @@
 #include <asm/kvm_emulate.h>
 #include <asm/kvm_hyp.h>
 
-static void __hyp_text save_lrs(struct kvm_vcpu *vcpu, void __iomem *base)
-{
-	struct vgic_v2_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v2;
-	u64 used_lrs = vcpu->arch.vgic_cpu.used_lrs;
-	u64 elrsr;
-	int i;
-
-	elrsr = readl_relaxed(base + GICH_ELRSR0);
-	if (unlikely(used_lrs > 32))
-		elrsr |= ((u64)readl_relaxed(base + GICH_ELRSR1)) << 32;
-
-	for (i = 0; i < used_lrs; i++) {
-		if (elrsr & (1UL << i))
-			cpu_if->vgic_lr[i] &= ~GICH_LR_STATE;
-		else
-			cpu_if->vgic_lr[i] = readl_relaxed(base + GICH_LR0 + (i * 4));
-
-		writel_relaxed(0, base + GICH_LR0 + (i * 4));
-	}
-}
-
-/* vcpu is already in the HYP VA space */
-void __hyp_text __vgic_v2_save_state(struct kvm_vcpu *vcpu)
-{
-	struct kvm *kvm = kern_hyp_va(vcpu->kvm);
-	struct vgic_v2_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v2;
-	struct vgic_dist *vgic = &kvm->arch.vgic;
-	void __iomem *base = kern_hyp_va(vgic->vctrl_base);
-	u64 used_lrs = vcpu->arch.vgic_cpu.used_lrs;
-
-	if (!base)
-		return;
-
-	if (used_lrs) {
-		cpu_if->vgic_apr = readl_relaxed(base + GICH_APR);
-		save_lrs(vcpu, base);
-		writel_relaxed(0, base + GICH_HCR);
-	} else {
-		cpu_if->vgic_apr = 0;
-	}
-}
-
-/* vcpu is already in the HYP VA space */
-void __hyp_text __vgic_v2_restore_state(struct kvm_vcpu *vcpu)
-{
-	struct kvm *kvm = kern_hyp_va(vcpu->kvm);
-	struct vgic_v2_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v2;
-	struct vgic_dist *vgic = &kvm->arch.vgic;
-	void __iomem *base = kern_hyp_va(vgic->vctrl_base);
-	int i;
-	u64 used_lrs = vcpu->arch.vgic_cpu.used_lrs;
-
-	if (!base)
-		return;
-
-	if (used_lrs) {
-		writel_relaxed(cpu_if->vgic_hcr, base + GICH_HCR);
-		writel_relaxed(cpu_if->vgic_apr, base + GICH_APR);
-		for (i = 0; i < used_lrs; i++) {
-			writel_relaxed(cpu_if->vgic_lr[i],
-				       base + GICH_LR0 + (i * 4));
-		}
-	}
-}
-
 #ifdef CONFIG_ARM64
 /*
  * __vgic_v2_perform_cpuif_access -- perform a GICV access on behalf of the
diff --git a/virt/kvm/arm/vgic/vgic-v2.c b/virt/kvm/arm/vgic/vgic-v2.c
index bb305d49cfdd..1e5f3eb6973d 100644
--- a/virt/kvm/arm/vgic/vgic-v2.c
+++ b/virt/kvm/arm/vgic/vgic-v2.c
@@ -421,6 +421,69 @@ int vgic_v2_probe(const struct gic_kvm_info *info)
 	return ret;
 }
 
+static void save_lrs(struct kvm_vcpu *vcpu, void __iomem *base)
+{
+	struct vgic_v2_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v2;
+	u64 used_lrs = vcpu->arch.vgic_cpu.used_lrs;
+	u64 elrsr;
+	int i;
+
+	elrsr = readl_relaxed(base + GICH_ELRSR0);
+	if (unlikely(used_lrs > 32))
+		elrsr |= ((u64)readl_relaxed(base + GICH_ELRSR1)) << 32;
+
+	for (i = 0; i < used_lrs; i++) {
+		if (elrsr & (1UL << i))
+			cpu_if->vgic_lr[i] &= ~GICH_LR_STATE;
+		else
+			cpu_if->vgic_lr[i] = readl_relaxed(base + GICH_LR0 + (i * 4));
+
+		writel_relaxed(0, base + GICH_LR0 + (i * 4));
+	}
+}
+
+void vgic_v2_save_state(struct kvm_vcpu *vcpu)
+{
+	struct kvm *kvm = vcpu->kvm;
+	struct vgic_dist *vgic = &kvm->arch.vgic;
+	struct vgic_v2_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v2;
+	void __iomem *base = vgic->vctrl_base;
+	u64 used_lrs = vcpu->arch.vgic_cpu.used_lrs;
+
+	if (!base)
+		return;
+
+	if (used_lrs) {
+		cpu_if->vgic_apr = readl_relaxed(base + GICH_APR);
+		save_lrs(vcpu, base);
+		writel_relaxed(0, base + GICH_HCR);
+	} else {
+		cpu_if->vgic_apr = 0;
+	}
+}
+
+void vgic_v2_restore_state(struct kvm_vcpu *vcpu)
+{
+	struct kvm *kvm = vcpu->kvm;
+	struct vgic_dist *vgic = &kvm->arch.vgic;
+	struct vgic_v2_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v2;
+	void __iomem *base = vgic->vctrl_base;
+	u64 used_lrs = vcpu->arch.vgic_cpu.used_lrs;
+	int i;
+
+	if (!base)
+		return;
+
+	if (used_lrs) {
+		writel_relaxed(cpu_if->vgic_hcr, base + GICH_HCR);
+		writel_relaxed(cpu_if->vgic_apr, base + GICH_APR);
+		for (i = 0; i < used_lrs; i++) {
+			writel_relaxed(cpu_if->vgic_lr[i],
+				       base + GICH_LR0 + (i * 4));
+		}
+	}
+}
+
 void vgic_v2_load(struct kvm_vcpu *vcpu)
 {
 	struct vgic_v2_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v2;
diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
index c7c5ef190afa..12e2a28f437e 100644
--- a/virt/kvm/arm/vgic/vgic.c
+++ b/virt/kvm/arm/vgic/vgic.c
@@ -749,11 +749,19 @@ static void vgic_flush_lr_state(struct kvm_vcpu *vcpu)
 		vgic_clear_lr(vcpu, count);
 }
 
+static inline void vgic_save_state(struct kvm_vcpu *vcpu)
+{
+	if (!static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif))
+		vgic_v2_save_state(vcpu);
+}
+
 /* Sync back the hardware VGIC state into our emulation after a guest's run. */
 void kvm_vgic_sync_hwstate(struct kvm_vcpu *vcpu)
 {
 	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
 
+	vgic_save_state(vcpu);
+
 	WARN_ON(vgic_v4_sync_hwstate(vcpu));
 
 	/* An empty ap_list_head implies used_lrs == 0 */
@@ -765,6 +773,12 @@ void kvm_vgic_sync_hwstate(struct kvm_vcpu *vcpu)
 	vgic_prune_ap_list(vcpu);
 }
 
+static inline void vgic_restore_state(struct kvm_vcpu *vcpu)
+{
+	if (!static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif))
+		vgic_v2_restore_state(vcpu);
+}
+
 /* Flush our emulation state into the GIC hardware before entering the guest. */
 void kvm_vgic_flush_hwstate(struct kvm_vcpu *vcpu)
 {
@@ -780,13 +794,16 @@ void kvm_vgic_flush_hwstate(struct kvm_vcpu *vcpu)
 	 * this.
 	 */
 	if (list_empty(&vcpu->arch.vgic_cpu.ap_list_head))
-		return;
+		goto out;
 
 	DEBUG_SPINLOCK_BUG_ON(!irqs_disabled());
 
 	spin_lock(&vcpu->arch.vgic_cpu.ap_list_lock);
 	vgic_flush_lr_state(vcpu);
 	spin_unlock(&vcpu->arch.vgic_cpu.ap_list_lock);
+
+out:
+	vgic_restore_state(vcpu);
 }
 
 void kvm_vgic_load(struct kvm_vcpu *vcpu)
diff --git a/virt/kvm/arm/vgic/vgic.h b/virt/kvm/arm/vgic/vgic.h
index 12c37b89f7a3..89b9547fba27 100644
--- a/virt/kvm/arm/vgic/vgic.h
+++ b/virt/kvm/arm/vgic/vgic.h
@@ -176,6 +176,9 @@ void vgic_v2_init_lrs(void);
 void vgic_v2_load(struct kvm_vcpu *vcpu);
 void vgic_v2_put(struct kvm_vcpu *vcpu);
 
+void vgic_v2_save_state(struct kvm_vcpu *vcpu);
+void vgic_v2_restore_state(struct kvm_vcpu *vcpu);
+
 static inline void vgic_get_irq_kref(struct vgic_irq *irq)
 {
 	if (irq->intid < VGIC_MIN_LPI)
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 223+ messages in thread

* [PATCH v3 38/41] KVM: arm/arm64: Move arm64-only vgic-v2-sr.c file to arm64
  2018-01-12 12:07 ` Christoffer Dall
@ 2018-01-12 12:07   ` Christoffer Dall
  -1 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-01-12 12:07 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel
  Cc: kvm, Marc Zyngier, Shih-Wei Li, Andrew Jones, Christoffer Dall

The vgic-v2-sr.c file now only contains the logic to replay unaligned
accesses to the virtual CPU interface on 16K and 64K page systems, which
is only relevant on 64-bit platforms.  Therefore move this file to the
arm64 KVM tree, remove the compile directive from the 32-bit side
makefile, and remove the ifdef in the C file.

Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 arch/arm/kvm/hyp/Makefile                         | 1 -
 arch/arm64/kvm/hyp/Makefile                       | 2 +-
 {virt/kvm/arm => arch/arm64/kvm}/hyp/vgic-v2-sr.c | 2 --
 3 files changed, 1 insertion(+), 4 deletions(-)
 rename {virt/kvm/arm => arch/arm64/kvm}/hyp/vgic-v2-sr.c (98%)

diff --git a/arch/arm/kvm/hyp/Makefile b/arch/arm/kvm/hyp/Makefile
index 5638ce0c9524..1964111c984a 100644
--- a/arch/arm/kvm/hyp/Makefile
+++ b/arch/arm/kvm/hyp/Makefile
@@ -7,7 +7,6 @@ ccflags-y += -fno-stack-protector -DDISABLE_BRANCH_PROFILING
 
 KVM=../../../../virt/kvm
 
-obj-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/hyp/vgic-v2-sr.o
 obj-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/hyp/vgic-v3-sr.o
 obj-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/hyp/timer-sr.o
 
diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile
index f04400d494b7..7e8d41210288 100644
--- a/arch/arm64/kvm/hyp/Makefile
+++ b/arch/arm64/kvm/hyp/Makefile
@@ -7,10 +7,10 @@ ccflags-y += -fno-stack-protector -DDISABLE_BRANCH_PROFILING
 
 KVM=../../../../virt/kvm
 
-obj-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/hyp/vgic-v2-sr.o
 obj-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/hyp/vgic-v3-sr.o
 obj-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/hyp/timer-sr.o
 
+obj-$(CONFIG_KVM_ARM_HOST) += vgic-v2-sr.o
 obj-$(CONFIG_KVM_ARM_HOST) += sysreg-sr.o
 obj-$(CONFIG_KVM_ARM_HOST) += debug-sr.o
 obj-$(CONFIG_KVM_ARM_HOST) += entry.o
diff --git a/virt/kvm/arm/hyp/vgic-v2-sr.c b/arch/arm64/kvm/hyp/vgic-v2-sr.c
similarity index 98%
rename from virt/kvm/arm/hyp/vgic-v2-sr.c
rename to arch/arm64/kvm/hyp/vgic-v2-sr.c
index b433257f4348..fcd7b4eff927 100644
--- a/virt/kvm/arm/hyp/vgic-v2-sr.c
+++ b/arch/arm64/kvm/hyp/vgic-v2-sr.c
@@ -22,7 +22,6 @@
 #include <asm/kvm_emulate.h>
 #include <asm/kvm_hyp.h>
 
-#ifdef CONFIG_ARM64
 /*
  * __vgic_v2_perform_cpuif_access -- perform a GICV access on behalf of the
  *				     guest.
@@ -76,4 +75,3 @@ int __hyp_text __vgic_v2_perform_cpuif_access(struct kvm_vcpu *vcpu)
 
 	return 1;
 }
-#endif
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 223+ messages in thread

* [PATCH v3 39/41] KVM: arm/arm64: Handle VGICv3 save/restore from the main VGIC code on VHE
  2018-01-12 12:07 ` Christoffer Dall
@ 2018-01-12 12:07   ` Christoffer Dall
  -1 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-01-12 12:07 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel; +Cc: Marc Zyngier, Shih-Wei Li, kvm

Just like we can program the GICv2 hypervisor control interface directly
from the core vgic code, we can do the same for the GICv3 hypervisor
control interface on VHE systems.

We do this by simply calling the save/restore functions when we have VHE,
which then lets us get rid of the save/restore function calls from the
VHE world-switch function.

One caveat is that we now write GICv3 system register state before the
potential early exit path in the run loop.  Because we sync back state in
that early exit path, we have to ensure that the sync path reads a
consistent GIC state, even though we have never actually run the guest
with the newly written GIC state.  We solve this by inserting an ISB in
the early exit path.
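
As a rough sketch of where the barrier sits (simplified from the
virt/kvm/arm/arm.c hunk below; the wrapper function name is made up, the
conditional around the timer sync is elided, and the vgic sync call is
assumed from the description above rather than shown in the hunk):

static void early_exit_sync(struct kvm_vcpu *vcpu)
{
	vcpu->mode = OUTSIDE_GUEST_MODE;

	/*
	 * kvm_vgic_flush_hwstate() may already have written GICv3
	 * system register state on VHE; the ISB makes those writes
	 * take effect before the sync code below reads GIC state back.
	 */
	isb();

	kvm_pmu_sync_hwstate(vcpu);
	kvm_timer_sync_hwstate(vcpu);
	kvm_vgic_sync_hwstate(vcpu);
}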

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 arch/arm64/kvm/hyp/switch.c | 3 ---
 virt/kvm/arm/arm.c          | 1 +
 virt/kvm/arm/vgic/vgic.c    | 5 +++++
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 74b7d7598a51..9187afca181a 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -395,8 +395,6 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
 	activate_traps_vhe(vcpu);
 	__activate_vm(vcpu->kvm);
 
-	__vgic_restore_state(vcpu);
-
 	sysreg_restore_guest_state_vhe(guest_ctxt);
 	__debug_switch_to_guest(vcpu);
 
@@ -408,7 +406,6 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
 	} while (fixup_guest_exit(vcpu, &exit_code));
 
 	sysreg_save_guest_state_vhe(guest_ctxt);
-	__vgic_save_state(vcpu);
 
 	deactivate_traps_vhe(vcpu);
 
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 6bce8f9c55db..7aad3aa43dc9 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -716,6 +716,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm) ||
 		    kvm_request_pending(vcpu)) {
 			vcpu->mode = OUTSIDE_GUEST_MODE;
+			isb(); /* Ensure work in x_flush_hwstate is committed */
 			kvm_pmu_sync_hwstate(vcpu);
 			if (static_branch_unlikely(&userspace_irqchip_in_use))
 				kvm_timer_sync_hwstate(vcpu);
diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
index 12e2a28f437e..d0a19a8c196a 100644
--- a/virt/kvm/arm/vgic/vgic.c
+++ b/virt/kvm/arm/vgic/vgic.c
@@ -19,6 +19,7 @@
 #include <linux/list_sort.h>
 #include <linux/interrupt.h>
 #include <linux/irq.h>
+#include <asm/kvm_hyp.h>
 
 #include "vgic.h"
 
@@ -753,6 +754,8 @@ static inline void vgic_save_state(struct kvm_vcpu *vcpu)
 {
 	if (!static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif))
 		vgic_v2_save_state(vcpu);
+	else if (has_vhe())
+		__vgic_v3_save_state(vcpu);
 }
 
 /* Sync back the hardware VGIC state into our emulation after a guest's run. */
@@ -777,6 +780,8 @@ static inline void vgic_restore_state(struct kvm_vcpu *vcpu)
 {
 	if (!static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif))
 		vgic_v2_restore_state(vcpu);
+	else if (has_vhe())
+		__vgic_v3_restore_state(vcpu);
 }
 
 /* Flush our emulation state into the GIC hardware before entering the guest. */
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 223+ messages in thread

* [PATCH v3 40/41] KVM: arm/arm64: Move VGIC APR save/restore to vgic put/load
  2018-01-12 12:07 ` Christoffer Dall
@ 2018-01-12 12:07   ` Christoffer Dall
  -1 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-01-12 12:07 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel
  Cc: kvm, Marc Zyngier, Shih-Wei Li, Andrew Jones, Christoffer Dall

The APRs can only have bits set when the guest acknowledges an interrupt
in the LR and can only have a bit cleared when the guest EOIs an
interrupt in the LR.  Therefore, if we have no LRs with any
pending/active interrupts, the APR cannot change value and there is no
need to clear it on every exit from the VM (hint: it will have already
been cleared when we exited the guest the last time with the LRs all
EOIed).
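
A minimal userspace model of that invariant (plain C, purely
illustrative; all names here are made up for the example and are not
kernel code): the active-priority bits can only be touched by an
acknowledge or EOI of an interrupt presented in a list register, so with
no in-flight LRs the value cannot change across a guest run.

#include <assert.h>
#include <stdint.h>

struct apr_model {
	uint32_t apr;		/* one bit per active priority group */
	unsigned int used_lrs;	/* LRs holding pending/active interrupts */
};

static void guest_ack(struct apr_model *m, unsigned int grp)
{
	if (m->used_lrs)	/* an ack needs an interrupt in an LR */
		m->apr |= 1u << grp;
}

static void guest_eoi(struct apr_model *m, unsigned int grp)
{
	if (m->used_lrs)	/* so does an EOI */
		m->apr &= ~(1u << grp);
}

int main(void)
{
	struct apr_model m = { .apr = 0, .used_lrs = 0 };

	guest_ack(&m, 3);
	guest_eoi(&m, 3);
	assert(m.apr == 0);	/* nothing in flight -> APR cannot change */
	return 0;
}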

The only cases we need to take care of are when we migrate the VCPU away
from a CPU, migrate a new VCPU onto a CPU, or return to userspace to
capture the state of the VCPU for migration.  To make sure this works,
factor out the APR save/restore functionality into separate functions
called from the VCPU (and by extension VGIC) put/load hooks.
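
For reference, the cascading (intentionally fall-through) switch
statements in the moved __vgic_v3_save_aprs()/__vgic_v3_restore_aprs()
code below encode how many ICH_AP0Rn/ICH_AP1Rn registers exist for a
given number of preemption bits reported by ICH_VTR_EL2.  A standalone
sketch of that mapping (illustrative only; the helper name is made up):

#include <stdio.h>

static int nr_apr_regs(unsigned int nr_pre_bits)
{
	switch (nr_pre_bits) {
	case 7:
		return 4;	/* AP0R0..AP0R3 (and AP1R0..AP1R3) */
	case 6:
		return 2;	/* AP0R0..AP0R1 */
	default:
		return 1;	/* AP0R0 only */
	}
}

int main(void)
{
	unsigned int bits;

	for (bits = 5; bits <= 7; bits++)
		printf("%u preemption bits -> %d APR registers\n",
		       bits, nr_apr_regs(bits));
	return 0;
}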

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 arch/arm/include/asm/kvm_hyp.h   |   2 +
 arch/arm64/include/asm/kvm_hyp.h |   2 +
 virt/kvm/arm/hyp/vgic-v3-sr.c    | 123 +++++++++++++++++++++------------------
 virt/kvm/arm/vgic/vgic-v2.c      |   7 +--
 virt/kvm/arm/vgic/vgic-v3.c      |   5 ++
 5 files changed, 77 insertions(+), 62 deletions(-)

diff --git a/arch/arm/include/asm/kvm_hyp.h b/arch/arm/include/asm/kvm_hyp.h
index ab20ffa8b9e7..b3dd4f4304f5 100644
--- a/arch/arm/include/asm/kvm_hyp.h
+++ b/arch/arm/include/asm/kvm_hyp.h
@@ -109,6 +109,8 @@ void __sysreg_restore_state(struct kvm_cpu_context *ctxt);
 
 void __vgic_v3_save_state(struct kvm_vcpu *vcpu);
 void __vgic_v3_restore_state(struct kvm_vcpu *vcpu);
+void __vgic_v3_save_aprs(struct kvm_vcpu *vcpu);
+void __vgic_v3_restore_aprs(struct kvm_vcpu *vcpu);
 
 asmlinkage void __vfp_save_state(struct vfp_hard_struct *vfp);
 asmlinkage void __vfp_restore_state(struct vfp_hard_struct *vfp);
diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index 16a5342c4821..693d29f0036d 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -125,6 +125,8 @@ int __vgic_v2_perform_cpuif_access(struct kvm_vcpu *vcpu);
 
 void __vgic_v3_save_state(struct kvm_vcpu *vcpu);
 void __vgic_v3_restore_state(struct kvm_vcpu *vcpu);
+void __vgic_v3_save_aprs(struct kvm_vcpu *vcpu);
+void __vgic_v3_restore_aprs(struct kvm_vcpu *vcpu);
 int __vgic_v3_perform_cpuif_access(struct kvm_vcpu *vcpu);
 
 void __timer_enable_traps(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/arm/hyp/vgic-v3-sr.c b/virt/kvm/arm/hyp/vgic-v3-sr.c
index 9abf2f3c12b5..811b42c8441d 100644
--- a/virt/kvm/arm/hyp/vgic-v3-sr.c
+++ b/virt/kvm/arm/hyp/vgic-v3-sr.c
@@ -221,14 +221,11 @@ void __hyp_text __vgic_v3_save_state(struct kvm_vcpu *vcpu)
 
 	if (used_lrs) {
 		int i;
-		u32 nr_pre_bits;
 		u32 elrsr;
 
 		elrsr = read_gicreg(ICH_ELSR_EL2);
 
 		write_gicreg(0, ICH_HCR_EL2);
-		val = read_gicreg(ICH_VTR_EL2);
-		nr_pre_bits = vtr_to_nr_pre_bits(val);
 
 		for (i = 0; i < used_lrs; i++) {
 			if (elrsr & (1 << i))
@@ -238,39 +235,10 @@ void __hyp_text __vgic_v3_save_state(struct kvm_vcpu *vcpu)
 
 			__gic_v3_set_lr(0, i);
 		}
-
-		switch (nr_pre_bits) {
-		case 7:
-			cpu_if->vgic_ap0r[3] = __vgic_v3_read_ap0rn(3);
-			cpu_if->vgic_ap0r[2] = __vgic_v3_read_ap0rn(2);
-		case 6:
-			cpu_if->vgic_ap0r[1] = __vgic_v3_read_ap0rn(1);
-		default:
-			cpu_if->vgic_ap0r[0] = __vgic_v3_read_ap0rn(0);
-		}
-
-		switch (nr_pre_bits) {
-		case 7:
-			cpu_if->vgic_ap1r[3] = __vgic_v3_read_ap1rn(3);
-			cpu_if->vgic_ap1r[2] = __vgic_v3_read_ap1rn(2);
-		case 6:
-			cpu_if->vgic_ap1r[1] = __vgic_v3_read_ap1rn(1);
-		default:
-			cpu_if->vgic_ap1r[0] = __vgic_v3_read_ap1rn(0);
-		}
 	} else {
 		if (static_branch_unlikely(&vgic_v3_cpuif_trap) ||
 		    cpu_if->its_vpe.its_vm)
 			write_gicreg(0, ICH_HCR_EL2);
-
-		cpu_if->vgic_ap0r[0] = 0;
-		cpu_if->vgic_ap0r[1] = 0;
-		cpu_if->vgic_ap0r[2] = 0;
-		cpu_if->vgic_ap0r[3] = 0;
-		cpu_if->vgic_ap1r[0] = 0;
-		cpu_if->vgic_ap1r[1] = 0;
-		cpu_if->vgic_ap1r[2] = 0;
-		cpu_if->vgic_ap1r[3] = 0;
 	}
 
 	val = read_gicreg(ICC_SRE_EL2);
@@ -287,8 +255,6 @@ void __hyp_text __vgic_v3_restore_state(struct kvm_vcpu *vcpu)
 {
 	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
 	u64 used_lrs = vcpu->arch.vgic_cpu.used_lrs;
-	u64 val;
-	u32 nr_pre_bits;
 	int i;
 
 	/*
@@ -306,32 +272,9 @@ void __hyp_text __vgic_v3_restore_state(struct kvm_vcpu *vcpu)
 		write_gicreg(cpu_if->vgic_vmcr, ICH_VMCR_EL2);
 	}
 
-	val = read_gicreg(ICH_VTR_EL2);
-	nr_pre_bits = vtr_to_nr_pre_bits(val);
-
 	if (used_lrs) {
 		write_gicreg(cpu_if->vgic_hcr, ICH_HCR_EL2);
 
-		switch (nr_pre_bits) {
-		case 7:
-			__vgic_v3_write_ap0rn(cpu_if->vgic_ap0r[3], 3);
-			__vgic_v3_write_ap0rn(cpu_if->vgic_ap0r[2], 2);
-		case 6:
-			__vgic_v3_write_ap0rn(cpu_if->vgic_ap0r[1], 1);
-		default:
-			__vgic_v3_write_ap0rn(cpu_if->vgic_ap0r[0], 0);
-		}
-
-		switch (nr_pre_bits) {
-		case 7:
-			__vgic_v3_write_ap1rn(cpu_if->vgic_ap1r[3], 3);
-			__vgic_v3_write_ap1rn(cpu_if->vgic_ap1r[2], 2);
-		case 6:
-			__vgic_v3_write_ap1rn(cpu_if->vgic_ap1r[1], 1);
-		default:
-			__vgic_v3_write_ap1rn(cpu_if->vgic_ap1r[0], 0);
-		}
-
 		for (i = 0; i < used_lrs; i++)
 			__gic_v3_set_lr(cpu_if->vgic_lr[i], i);
 	} else {
@@ -364,6 +307,72 @@ void __hyp_text __vgic_v3_restore_state(struct kvm_vcpu *vcpu)
 		     ICC_SRE_EL2);
 }
 
+void __hyp_text __vgic_v3_save_aprs(struct kvm_vcpu *vcpu)
+{
+	struct vgic_v3_cpu_if *cpu_if;
+	u64 val;
+	u32 nr_pre_bits;
+
+	vcpu = kern_hyp_va(vcpu);
+	cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
+
+	val = read_gicreg(ICH_VTR_EL2);
+	nr_pre_bits = vtr_to_nr_pre_bits(val);
+
+	switch (nr_pre_bits) {
+	case 7:
+		cpu_if->vgic_ap0r[3] = __vgic_v3_read_ap0rn(3);
+		cpu_if->vgic_ap0r[2] = __vgic_v3_read_ap0rn(2);
+	case 6:
+		cpu_if->vgic_ap0r[1] = __vgic_v3_read_ap0rn(1);
+	default:
+		cpu_if->vgic_ap0r[0] = __vgic_v3_read_ap0rn(0);
+	}
+
+	switch (nr_pre_bits) {
+	case 7:
+		cpu_if->vgic_ap1r[3] = __vgic_v3_read_ap1rn(3);
+		cpu_if->vgic_ap1r[2] = __vgic_v3_read_ap1rn(2);
+	case 6:
+		cpu_if->vgic_ap1r[1] = __vgic_v3_read_ap1rn(1);
+	default:
+		cpu_if->vgic_ap1r[0] = __vgic_v3_read_ap1rn(0);
+	}
+}
+
+void __hyp_text __vgic_v3_restore_aprs(struct kvm_vcpu *vcpu)
+{
+	struct vgic_v3_cpu_if *cpu_if;
+	u64 val;
+	u32 nr_pre_bits;
+
+	vcpu = kern_hyp_va(vcpu);
+	cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
+
+	val = read_gicreg(ICH_VTR_EL2);
+	nr_pre_bits = vtr_to_nr_pre_bits(val);
+
+	switch (nr_pre_bits) {
+	case 7:
+		__vgic_v3_write_ap0rn(cpu_if->vgic_ap0r[3], 3);
+		__vgic_v3_write_ap0rn(cpu_if->vgic_ap0r[2], 2);
+	case 6:
+		__vgic_v3_write_ap0rn(cpu_if->vgic_ap0r[1], 1);
+	default:
+		__vgic_v3_write_ap0rn(cpu_if->vgic_ap0r[0], 0);
+	}
+
+	switch (nr_pre_bits) {
+	case 7:
+		__vgic_v3_write_ap1rn(cpu_if->vgic_ap1r[3], 3);
+		__vgic_v3_write_ap1rn(cpu_if->vgic_ap1r[2], 2);
+	case 6:
+		__vgic_v3_write_ap1rn(cpu_if->vgic_ap1r[1], 1);
+	default:
+		__vgic_v3_write_ap1rn(cpu_if->vgic_ap1r[0], 0);
+	}
+}
+
 void __hyp_text __vgic_v3_init_lrs(void)
 {
 	int max_lr_idx = vtr_to_max_lr_idx(read_gicreg(ICH_VTR_EL2));
diff --git a/virt/kvm/arm/vgic/vgic-v2.c b/virt/kvm/arm/vgic/vgic-v2.c
index 1e5f3eb6973d..ca7cfee9f353 100644
--- a/virt/kvm/arm/vgic/vgic-v2.c
+++ b/virt/kvm/arm/vgic/vgic-v2.c
@@ -446,7 +446,6 @@ void vgic_v2_save_state(struct kvm_vcpu *vcpu)
 {
 	struct kvm *kvm = vcpu->kvm;
 	struct vgic_dist *vgic = &kvm->arch.vgic;
-	struct vgic_v2_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v2;
 	void __iomem *base = vgic->vctrl_base;
 	u64 used_lrs = vcpu->arch.vgic_cpu.used_lrs;
 
@@ -454,11 +453,8 @@ void vgic_v2_save_state(struct kvm_vcpu *vcpu)
 		return;
 
 	if (used_lrs) {
-		cpu_if->vgic_apr = readl_relaxed(base + GICH_APR);
 		save_lrs(vcpu, base);
 		writel_relaxed(0, base + GICH_HCR);
-	} else {
-		cpu_if->vgic_apr = 0;
 	}
 }
 
@@ -476,7 +472,6 @@ void vgic_v2_restore_state(struct kvm_vcpu *vcpu)
 
 	if (used_lrs) {
 		writel_relaxed(cpu_if->vgic_hcr, base + GICH_HCR);
-		writel_relaxed(cpu_if->vgic_apr, base + GICH_APR);
 		for (i = 0; i < used_lrs; i++) {
 			writel_relaxed(cpu_if->vgic_lr[i],
 				       base + GICH_LR0 + (i * 4));
@@ -490,6 +485,7 @@ void vgic_v2_load(struct kvm_vcpu *vcpu)
 	struct vgic_dist *vgic = &vcpu->kvm->arch.vgic;
 
 	writel_relaxed(cpu_if->vgic_vmcr, vgic->vctrl_base + GICH_VMCR);
+	writel_relaxed(cpu_if->vgic_apr, vgic->vctrl_base + GICH_APR);
 }
 
 void vgic_v2_put(struct kvm_vcpu *vcpu)
@@ -498,4 +494,5 @@ void vgic_v2_put(struct kvm_vcpu *vcpu)
 	struct vgic_dist *vgic = &vcpu->kvm->arch.vgic;
 
 	cpu_if->vgic_vmcr = readl_relaxed(vgic->vctrl_base + GICH_VMCR);
+	cpu_if->vgic_apr = readl_relaxed(vgic->vctrl_base + GICH_APR);
 }
diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c
index b76e21f3e6bd..4bafcd1e6bb8 100644
--- a/virt/kvm/arm/vgic/vgic-v3.c
+++ b/virt/kvm/arm/vgic/vgic-v3.c
@@ -16,6 +16,7 @@
 #include <linux/kvm.h>
 #include <linux/kvm_host.h>
 #include <kvm/arm_vgic.h>
+#include <asm/kvm_hyp.h>
 #include <asm/kvm_mmu.h>
 #include <asm/kvm_asm.h>
 
@@ -587,6 +588,8 @@ void vgic_v3_load(struct kvm_vcpu *vcpu)
 	 */
 	if (likely(cpu_if->vgic_sre))
 		kvm_call_hyp(__vgic_v3_write_vmcr, cpu_if->vgic_vmcr);
+
+	kvm_call_hyp(__vgic_v3_restore_aprs, vcpu);
 }
 
 void vgic_v3_put(struct kvm_vcpu *vcpu)
@@ -595,4 +598,6 @@ void vgic_v3_put(struct kvm_vcpu *vcpu)
 
 	if (likely(cpu_if->vgic_sre))
 		cpu_if->vgic_vmcr = kvm_call_hyp(__vgic_v3_read_vmcr);
+
+	kvm_call_hyp(__vgic_v3_save_aprs, vcpu);
 }
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 223+ messages in thread

* [PATCH v3 41/41] KVM: arm/arm64: Avoid VGICv3 save/restore on VHE with no IRQs
  2018-01-12 12:07 ` Christoffer Dall
@ 2018-01-12 12:07   ` Christoffer Dall
  -1 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-01-12 12:07 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel
  Cc: kvm, Marc Zyngier, Shih-Wei Li, Andrew Jones, Christoffer Dall

We can finally get rid of all calls to the VGICv3 save/restore functions
when the AP lists are empty on VHE systems.  This requires carefully
factoring out trap configuration from saving and restoring state, and
carefully choosing what to do on the VHE and non-VHE paths.

One of the challenges is that we cannot save/restore the VMCR lazily:
when emulating a GICv2 on GICv3, we can only write the VMCR while
ICC_SRE_EL1.SRE is cleared, since otherwise all Group-0 interrupts end up
being delivered as FIQs.

To solve this problem, and still provide good performance in the fast
path of exiting a VM when no interrupts are pending (which also optimizes
the latency for actually delivering virtual interrupts coming from
physical interrupts), we orchestrate a dance of only activating and
deactivating the traps in vgic load/put for VHE systems (which can have
ICC_SRE_EL1.SRE cleared when running in the host), and doing the trap
configuration on every round-trip on non-VHE systems.
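
A condensed, purely illustrative summary of where the trap configuration
ends up being toggled after this patch (plain C, not kernel code; the
site descriptions are taken from the hunks below, the helper names here
are made up):

#include <stdbool.h>
#include <stdio.h>

static const char *activate_site(bool vhe)
{
	return vhe ? "vgic_v3_load() via vcpu_load()"
		   : "__vgic_restore_state() on every guest entry";
}

static const char *deactivate_site(bool vhe)
{
	return vhe ? "vgic_v3_put() via vcpu_put()"
		   : "__vgic_save_state() on every guest exit";
}

int main(void)
{
	const bool modes[] = { false, true };

	for (int i = 0; i < 2; i++) {
		bool vhe = modes[i];

		printf("%-7s activate: %s, deactivate: %s\n",
		       vhe ? "VHE" : "non-VHE",
		       activate_site(vhe), deactivate_site(vhe));
	}
	return 0;
}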

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 arch/arm/include/asm/kvm_hyp.h   |   2 +
 arch/arm/kvm/hyp/switch.c        |   8 ++-
 arch/arm64/include/asm/kvm_hyp.h |   2 +
 arch/arm64/kvm/hyp/switch.c      |   8 ++-
 virt/kvm/arm/hyp/vgic-v3-sr.c    | 121 +++++++++++++++++++++++++--------------
 virt/kvm/arm/vgic/vgic-v3.c      |   6 ++
 virt/kvm/arm/vgic/vgic.c         |   7 +--
 7 files changed, 103 insertions(+), 51 deletions(-)

diff --git a/arch/arm/include/asm/kvm_hyp.h b/arch/arm/include/asm/kvm_hyp.h
index b3dd4f4304f5..d01676e5b816 100644
--- a/arch/arm/include/asm/kvm_hyp.h
+++ b/arch/arm/include/asm/kvm_hyp.h
@@ -109,6 +109,8 @@ void __sysreg_restore_state(struct kvm_cpu_context *ctxt);
 
 void __vgic_v3_save_state(struct kvm_vcpu *vcpu);
 void __vgic_v3_restore_state(struct kvm_vcpu *vcpu);
+void __vgic_v3_activate_traps(struct kvm_vcpu *vcpu);
+void __vgic_v3_deactivate_traps(struct kvm_vcpu *vcpu);
 void __vgic_v3_save_aprs(struct kvm_vcpu *vcpu);
 void __vgic_v3_restore_aprs(struct kvm_vcpu *vcpu);
 
diff --git a/arch/arm/kvm/hyp/switch.c b/arch/arm/kvm/hyp/switch.c
index 214187446e63..337c76230885 100644
--- a/arch/arm/kvm/hyp/switch.c
+++ b/arch/arm/kvm/hyp/switch.c
@@ -89,14 +89,18 @@ static void __hyp_text __deactivate_vm(struct kvm_vcpu *vcpu)
 
 static void __hyp_text __vgic_save_state(struct kvm_vcpu *vcpu)
 {
-	if (static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif))
+	if (static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif)) {
 		__vgic_v3_save_state(vcpu);
+		__vgic_v3_deactivate_traps(vcpu);
+	}
 }
 
 static void __hyp_text __vgic_restore_state(struct kvm_vcpu *vcpu)
 {
-	if (static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif))
+	if (static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif)) {
+		__vgic_v3_activate_traps(vcpu);
 		__vgic_v3_restore_state(vcpu);
+	}
 }
 
 static bool __hyp_text __populate_fault_info(struct kvm_vcpu *vcpu)
diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index 693d29f0036d..af7cf0faf58f 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -125,6 +125,8 @@ int __vgic_v2_perform_cpuif_access(struct kvm_vcpu *vcpu);
 
 void __vgic_v3_save_state(struct kvm_vcpu *vcpu);
 void __vgic_v3_restore_state(struct kvm_vcpu *vcpu);
+void __vgic_v3_activate_traps(struct kvm_vcpu *vcpu);
+void __vgic_v3_deactivate_traps(struct kvm_vcpu *vcpu);
 void __vgic_v3_save_aprs(struct kvm_vcpu *vcpu);
 void __vgic_v3_restore_aprs(struct kvm_vcpu *vcpu);
 int __vgic_v3_perform_cpuif_access(struct kvm_vcpu *vcpu);
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 9187afca181a..901a111fb509 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -194,14 +194,18 @@ static void __hyp_text __deactivate_vm(struct kvm_vcpu *vcpu)
 
 static void __hyp_text __vgic_save_state(struct kvm_vcpu *vcpu)
 {
-	if (static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif))
+	if (static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif)) {
 		__vgic_v3_save_state(vcpu);
+		__vgic_v3_deactivate_traps(vcpu);
+	}
 }
 
 static void __hyp_text __vgic_restore_state(struct kvm_vcpu *vcpu)
 {
-	if (static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif))
+	if (static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif)) {
+		__vgic_v3_activate_traps(vcpu);
 		__vgic_v3_restore_state(vcpu);
+	}
 }
 
 static bool __hyp_text __true_value(void)
diff --git a/virt/kvm/arm/hyp/vgic-v3-sr.c b/virt/kvm/arm/hyp/vgic-v3-sr.c
index 811b42c8441d..e5f3bc7582b6 100644
--- a/virt/kvm/arm/hyp/vgic-v3-sr.c
+++ b/virt/kvm/arm/hyp/vgic-v3-sr.c
@@ -208,15 +208,15 @@ void __hyp_text __vgic_v3_save_state(struct kvm_vcpu *vcpu)
 {
 	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
 	u64 used_lrs = vcpu->arch.vgic_cpu.used_lrs;
-	u64 val;
 
 	/*
 	 * Make sure stores to the GIC via the memory mapped interface
-	 * are now visible to the system register interface.
+	 * are now visible to the system register interface when reading the
+	 * LRs, and when reading back the VMCR on non-VHE systems.
 	 */
-	if (!cpu_if->vgic_sre) {
-		dsb(st);
-		cpu_if->vgic_vmcr = read_gicreg(ICH_VMCR_EL2);
+	if (used_lrs || !has_vhe()) {
+		if (!cpu_if->vgic_sre)
+			dsb(st);
 	}
 
 	if (used_lrs) {
@@ -225,7 +225,7 @@ void __hyp_text __vgic_v3_save_state(struct kvm_vcpu *vcpu)
 
 		elrsr = read_gicreg(ICH_ELSR_EL2);
 
-		write_gicreg(0, ICH_HCR_EL2);
+		write_gicreg(cpu_if->vgic_hcr & ~ICH_HCR_EN, ICH_HCR_EL2);
 
 		for (i = 0; i < used_lrs; i++) {
 			if (elrsr & (1 << i))
@@ -235,19 +235,6 @@ void __hyp_text __vgic_v3_save_state(struct kvm_vcpu *vcpu)
 
 			__gic_v3_set_lr(0, i);
 		}
-	} else {
-		if (static_branch_unlikely(&vgic_v3_cpuif_trap) ||
-		    cpu_if->its_vpe.its_vm)
-			write_gicreg(0, ICH_HCR_EL2);
-	}
-
-	val = read_gicreg(ICC_SRE_EL2);
-	write_gicreg(val | ICC_SRE_EL2_ENABLE, ICC_SRE_EL2);
-
-	if (!cpu_if->vgic_sre) {
-		/* Make sure ENABLE is set at EL2 before setting SRE at EL1 */
-		isb();
-		write_gicreg(1, ICC_SRE_EL1);
 	}
 }
 
@@ -257,6 +244,31 @@ void __hyp_text __vgic_v3_restore_state(struct kvm_vcpu *vcpu)
 	u64 used_lrs = vcpu->arch.vgic_cpu.used_lrs;
 	int i;
 
+	if (used_lrs) {
+		write_gicreg(cpu_if->vgic_hcr, ICH_HCR_EL2);
+
+		for (i = 0; i < used_lrs; i++)
+			__gic_v3_set_lr(cpu_if->vgic_lr[i], i);
+	}
+
+	/*
+	 * Ensure that writes to the LRs, and on non-VHE systems ensure that
+	 * the write to the VMCR in __vgic_v3_activate_traps(), will have
+	 * reached the (re)distributors. This ensure the guest will read the
+	 * correct values from the memory-mapped interface.
+	 */
+	if (used_lrs || !has_vhe()) {
+		if (!cpu_if->vgic_sre) {
+			isb();
+			dsb(sy);
+		}
+	}
+}
+
+void __hyp_text __vgic_v3_activate_traps(struct kvm_vcpu *vcpu)
+{
+	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
+
 	/*
 	 * VFIQEn is RES1 if ICC_SRE_EL1.SRE is 1. This causes a
 	 * Group0 interrupt (as generated in GICv2 mode) to be
@@ -264,47 +276,70 @@ void __hyp_text __vgic_v3_restore_state(struct kvm_vcpu *vcpu)
 	 * consequences. So we must make sure that ICC_SRE_EL1 has
 	 * been actually programmed with the value we want before
 	 * starting to mess with the rest of the GIC, and VMCR_EL2 in
-	 * particular.
+	 * particular.  This logic must be called before
+	 * __vgic_v3_restore_state().
 	 */
 	if (!cpu_if->vgic_sre) {
 		write_gicreg(0, ICC_SRE_EL1);
 		isb();
 		write_gicreg(cpu_if->vgic_vmcr, ICH_VMCR_EL2);
-	}
 
-	if (used_lrs) {
-		write_gicreg(cpu_if->vgic_hcr, ICH_HCR_EL2);
 
-		for (i = 0; i < used_lrs; i++)
-			__gic_v3_set_lr(cpu_if->vgic_lr[i], i);
-	} else {
-		/*
-		 * If we need to trap system registers, we must write
-		 * ICH_HCR_EL2 anyway, even if no interrupts are being
-		 * injected. Same thing if GICv4 is used, as VLPI
-		 * delivery is gated by ICH_HCR_EL2.En.
-		 */
-		if (static_branch_unlikely(&vgic_v3_cpuif_trap) ||
-		    cpu_if->its_vpe.its_vm)
-			write_gicreg(cpu_if->vgic_hcr, ICH_HCR_EL2);
+		if (has_vhe()) {
+			/*
+			 * Ensure that the write to the VMCR will have reached
+			 * the (re)distributors. This ensure the guest will
+			 * read the correct values from the memory-mapped
+			 * interface.
+			 */
+			isb();
+			dsb(sy);
+		}
 	}
 
 	/*
-	 * Ensures that the above will have reached the
-	 * (re)distributors. This ensure the guest will read the
-	 * correct values from the memory-mapped interface.
+	 * Prevent the guest from touching the GIC system registers if
+	 * SRE isn't enabled for GICv3 emulation.
+	 */
+	write_gicreg(read_gicreg(ICC_SRE_EL2) & ~ICC_SRE_EL2_ENABLE,
+		     ICC_SRE_EL2);
+
+	/*
+	 * If we need to trap system registers, we must write
+	 * ICH_HCR_EL2 anyway, even if no interrupts are being
+	 * injected,
 	 */
+	if (static_branch_unlikely(&vgic_v3_cpuif_trap) ||
+	    cpu_if->its_vpe.its_vm)
+		write_gicreg(cpu_if->vgic_hcr, ICH_HCR_EL2);
+}
+
+void __hyp_text __vgic_v3_deactivate_traps(struct kvm_vcpu *vcpu)
+{
+	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
+	u64 val;
+
 	if (!cpu_if->vgic_sre) {
+		cpu_if->vgic_vmcr = read_gicreg(ICH_VMCR_EL2);
+	}
+
+
+	val = read_gicreg(ICC_SRE_EL2);
+	write_gicreg(val | ICC_SRE_EL2_ENABLE, ICC_SRE_EL2);
+
+	if (!cpu_if->vgic_sre) {
+		/* Make sure ENABLE is set at EL2 before setting SRE at EL1 */
 		isb();
-		dsb(sy);
+		write_gicreg(1, ICC_SRE_EL1);
 	}
 
 	/*
-	 * Prevent the guest from touching the GIC system registers if
-	 * SRE isn't enabled for GICv3 emulation.
+	 * If we were trapping system registers, we enabled the VGIC even if
+	 * no interrupts were being injected, and we disable it again here.
 	 */
-	write_gicreg(read_gicreg(ICC_SRE_EL2) & ~ICC_SRE_EL2_ENABLE,
-		     ICC_SRE_EL2);
+	if (static_branch_unlikely(&vgic_v3_cpuif_trap) ||
+	    cpu_if->its_vpe.its_vm)
+		write_gicreg(0, ICH_HCR_EL2);
 }
 
 void __hyp_text __vgic_v3_save_aprs(struct kvm_vcpu *vcpu)
diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c
index 4bafcd1e6bb8..4200657694f0 100644
--- a/virt/kvm/arm/vgic/vgic-v3.c
+++ b/virt/kvm/arm/vgic/vgic-v3.c
@@ -590,6 +590,9 @@ void vgic_v3_load(struct kvm_vcpu *vcpu)
 		kvm_call_hyp(__vgic_v3_write_vmcr, cpu_if->vgic_vmcr);
 
 	kvm_call_hyp(__vgic_v3_restore_aprs, vcpu);
+
+	if (has_vhe())
+		__vgic_v3_activate_traps(vcpu);
 }
 
 void vgic_v3_put(struct kvm_vcpu *vcpu)
@@ -600,4 +603,7 @@ void vgic_v3_put(struct kvm_vcpu *vcpu)
 		cpu_if->vgic_vmcr = kvm_call_hyp(__vgic_v3_read_vmcr);
 
 	kvm_call_hyp(__vgic_v3_save_aprs, vcpu);
+
+	if (has_vhe())
+		__vgic_v3_deactivate_traps(vcpu);
 }
diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
index d0a19a8c196a..0d95d7b55567 100644
--- a/virt/kvm/arm/vgic/vgic.c
+++ b/virt/kvm/arm/vgic/vgic.c
@@ -763,14 +763,14 @@ void kvm_vgic_sync_hwstate(struct kvm_vcpu *vcpu)
 {
 	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
 
-	vgic_save_state(vcpu);
-
 	WARN_ON(vgic_v4_sync_hwstate(vcpu));
 
 	/* An empty ap_list_head implies used_lrs == 0 */
 	if (list_empty(&vcpu->arch.vgic_cpu.ap_list_head))
 		return;
 
+	vgic_save_state(vcpu);
+
 	if (vgic_cpu->used_lrs)
 		vgic_fold_lr_state(vcpu);
 	vgic_prune_ap_list(vcpu);
@@ -799,7 +799,7 @@ void kvm_vgic_flush_hwstate(struct kvm_vcpu *vcpu)
 	 * this.
 	 */
 	if (list_empty(&vcpu->arch.vgic_cpu.ap_list_head))
-		goto out;
+		return;
 
 	DEBUG_SPINLOCK_BUG_ON(!irqs_disabled());
 
@@ -807,7 +807,6 @@ void kvm_vgic_flush_hwstate(struct kvm_vcpu *vcpu)
 	vgic_flush_lr_state(vcpu);
 	spin_unlock(&vcpu->arch.vgic_cpu.ap_list_lock);
 
-out:
 	vgic_restore_state(vcpu);
 }
 
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 223+ messages in thread

* [PATCH v3 41/41] KVM: arm/arm64: Avoid VGICv3 save/restore on VHE with no IRQs
@ 2018-01-12 12:07   ` Christoffer Dall
  0 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-01-12 12:07 UTC (permalink / raw)
  To: linux-arm-kernel

We can finally get completely rid of any calls to the VGICv3
save/restore functions when the AP lists are empty on VHE systems.  This
requires carefully factoring out trap configuration from saving and
restoring state, and carefully choosing what to do on the VHE and
non-VHE path.

One of the challenges is that we cannot save/restore the VMCR lazily
because we can only write the VMCR when ICC_SRE_EL1.SRE is cleared when
emulating a GICv2-on-GICv3, since otherwise all Group-0 interrupts end
up being delivered as FIQ.

To solve this problem, and still provide fast performance in the fast
path of exiting a VM when no interrupts are pending (which also
optimized the latency for actually delivering virtual interrupts coming
from physical interrupts), we orchestrate a dance of only doing the
activate/deactivate traps in vgic load/put for VHE systems (which can
have ICC_SRE_EL1.SRE cleared when running in the host), and doing the
configuration on every round-trip on non-VHE systems.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 arch/arm/include/asm/kvm_hyp.h   |   2 +
 arch/arm/kvm/hyp/switch.c        |   8 ++-
 arch/arm64/include/asm/kvm_hyp.h |   2 +
 arch/arm64/kvm/hyp/switch.c      |   8 ++-
 virt/kvm/arm/hyp/vgic-v3-sr.c    | 121 +++++++++++++++++++++++++--------------
 virt/kvm/arm/vgic/vgic-v3.c      |   6 ++
 virt/kvm/arm/vgic/vgic.c         |   7 +--
 7 files changed, 103 insertions(+), 51 deletions(-)

diff --git a/arch/arm/include/asm/kvm_hyp.h b/arch/arm/include/asm/kvm_hyp.h
index b3dd4f4304f5..d01676e5b816 100644
--- a/arch/arm/include/asm/kvm_hyp.h
+++ b/arch/arm/include/asm/kvm_hyp.h
@@ -109,6 +109,8 @@ void __sysreg_restore_state(struct kvm_cpu_context *ctxt);
 
 void __vgic_v3_save_state(struct kvm_vcpu *vcpu);
 void __vgic_v3_restore_state(struct kvm_vcpu *vcpu);
+void __vgic_v3_activate_traps(struct kvm_vcpu *vcpu);
+void __vgic_v3_deactivate_traps(struct kvm_vcpu *vcpu);
 void __vgic_v3_save_aprs(struct kvm_vcpu *vcpu);
 void __vgic_v3_restore_aprs(struct kvm_vcpu *vcpu);
 
diff --git a/arch/arm/kvm/hyp/switch.c b/arch/arm/kvm/hyp/switch.c
index 214187446e63..337c76230885 100644
--- a/arch/arm/kvm/hyp/switch.c
+++ b/arch/arm/kvm/hyp/switch.c
@@ -89,14 +89,18 @@ static void __hyp_text __deactivate_vm(struct kvm_vcpu *vcpu)
 
 static void __hyp_text __vgic_save_state(struct kvm_vcpu *vcpu)
 {
-	if (static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif))
+	if (static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif)) {
 		__vgic_v3_save_state(vcpu);
+		__vgic_v3_deactivate_traps(vcpu);
+	}
 }
 
 static void __hyp_text __vgic_restore_state(struct kvm_vcpu *vcpu)
 {
-	if (static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif))
+	if (static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif)) {
+		__vgic_v3_activate_traps(vcpu);
 		__vgic_v3_restore_state(vcpu);
+	}
 }
 
 static bool __hyp_text __populate_fault_info(struct kvm_vcpu *vcpu)
diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index 693d29f0036d..af7cf0faf58f 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -125,6 +125,8 @@ int __vgic_v2_perform_cpuif_access(struct kvm_vcpu *vcpu);
 
 void __vgic_v3_save_state(struct kvm_vcpu *vcpu);
 void __vgic_v3_restore_state(struct kvm_vcpu *vcpu);
+void __vgic_v3_activate_traps(struct kvm_vcpu *vcpu);
+void __vgic_v3_deactivate_traps(struct kvm_vcpu *vcpu);
 void __vgic_v3_save_aprs(struct kvm_vcpu *vcpu);
 void __vgic_v3_restore_aprs(struct kvm_vcpu *vcpu);
 int __vgic_v3_perform_cpuif_access(struct kvm_vcpu *vcpu);
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 9187afca181a..901a111fb509 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -194,14 +194,18 @@ static void __hyp_text __deactivate_vm(struct kvm_vcpu *vcpu)
 
 static void __hyp_text __vgic_save_state(struct kvm_vcpu *vcpu)
 {
-	if (static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif))
+	if (static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif)) {
 		__vgic_v3_save_state(vcpu);
+		__vgic_v3_deactivate_traps(vcpu);
+	}
 }
 
 static void __hyp_text __vgic_restore_state(struct kvm_vcpu *vcpu)
 {
-	if (static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif))
+	if (static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif)) {
+		__vgic_v3_activate_traps(vcpu);
 		__vgic_v3_restore_state(vcpu);
+	}
 }
 
 static bool __hyp_text __true_value(void)
diff --git a/virt/kvm/arm/hyp/vgic-v3-sr.c b/virt/kvm/arm/hyp/vgic-v3-sr.c
index 811b42c8441d..e5f3bc7582b6 100644
--- a/virt/kvm/arm/hyp/vgic-v3-sr.c
+++ b/virt/kvm/arm/hyp/vgic-v3-sr.c
@@ -208,15 +208,15 @@ void __hyp_text __vgic_v3_save_state(struct kvm_vcpu *vcpu)
 {
 	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
 	u64 used_lrs = vcpu->arch.vgic_cpu.used_lrs;
-	u64 val;
 
 	/*
 	 * Make sure stores to the GIC via the memory mapped interface
-	 * are now visible to the system register interface.
+	 * are now visible to the system register interface when reading the
+	 * LRs, and when reading back the VMCR on non-VHE systems.
 	 */
-	if (!cpu_if->vgic_sre) {
-		dsb(st);
-		cpu_if->vgic_vmcr = read_gicreg(ICH_VMCR_EL2);
+	if (used_lrs || !has_vhe()) {
+		if (!cpu_if->vgic_sre)
+			dsb(st);
 	}
 
 	if (used_lrs) {
@@ -225,7 +225,7 @@ void __hyp_text __vgic_v3_save_state(struct kvm_vcpu *vcpu)
 
 		elrsr = read_gicreg(ICH_ELSR_EL2);
 
-		write_gicreg(0, ICH_HCR_EL2);
+		write_gicreg(cpu_if->vgic_hcr & ~ICH_HCR_EN, ICH_HCR_EL2);
 
 		for (i = 0; i < used_lrs; i++) {
 			if (elrsr & (1 << i))
@@ -235,19 +235,6 @@ void __hyp_text __vgic_v3_save_state(struct kvm_vcpu *vcpu)
 
 			__gic_v3_set_lr(0, i);
 		}
-	} else {
-		if (static_branch_unlikely(&vgic_v3_cpuif_trap) ||
-		    cpu_if->its_vpe.its_vm)
-			write_gicreg(0, ICH_HCR_EL2);
-	}
-
-	val = read_gicreg(ICC_SRE_EL2);
-	write_gicreg(val | ICC_SRE_EL2_ENABLE, ICC_SRE_EL2);
-
-	if (!cpu_if->vgic_sre) {
-		/* Make sure ENABLE is set at EL2 before setting SRE at EL1 */
-		isb();
-		write_gicreg(1, ICC_SRE_EL1);
 	}
 }
 
@@ -257,6 +244,31 @@ void __hyp_text __vgic_v3_restore_state(struct kvm_vcpu *vcpu)
 	u64 used_lrs = vcpu->arch.vgic_cpu.used_lrs;
 	int i;
 
+	if (used_lrs) {
+		write_gicreg(cpu_if->vgic_hcr, ICH_HCR_EL2);
+
+		for (i = 0; i < used_lrs; i++)
+			__gic_v3_set_lr(cpu_if->vgic_lr[i], i);
+	}
+
+	/*
+	 * Ensure that writes to the LRs, and on non-VHE systems ensure that
+	 * the write to the VMCR in __vgic_v3_activate_traps(), will have
+	 * reached the (re)distributors. This ensures the guest will read the
+	 * correct values from the memory-mapped interface.
+	 */
+	if (used_lrs || !has_vhe()) {
+		if (!cpu_if->vgic_sre) {
+			isb();
+			dsb(sy);
+		}
+	}
+}
+
+void __hyp_text __vgic_v3_activate_traps(struct kvm_vcpu *vcpu)
+{
+	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
+
 	/*
 	 * VFIQEn is RES1 if ICC_SRE_EL1.SRE is 1. This causes a
 	 * Group0 interrupt (as generated in GICv2 mode) to be
@@ -264,47 +276,70 @@ void __hyp_text __vgic_v3_restore_state(struct kvm_vcpu *vcpu)
 	 * consequences. So we must make sure that ICC_SRE_EL1 has
 	 * been actually programmed with the value we want before
 	 * starting to mess with the rest of the GIC, and VMCR_EL2 in
-	 * particular.
+	 * particular.  This logic must be called before
+	 * __vgic_v3_restore_state().
 	 */
 	if (!cpu_if->vgic_sre) {
 		write_gicreg(0, ICC_SRE_EL1);
 		isb();
 		write_gicreg(cpu_if->vgic_vmcr, ICH_VMCR_EL2);
-	}
 
-	if (used_lrs) {
-		write_gicreg(cpu_if->vgic_hcr, ICH_HCR_EL2);
 
-		for (i = 0; i < used_lrs; i++)
-			__gic_v3_set_lr(cpu_if->vgic_lr[i], i);
-	} else {
-		/*
-		 * If we need to trap system registers, we must write
-		 * ICH_HCR_EL2 anyway, even if no interrupts are being
-		 * injected. Same thing if GICv4 is used, as VLPI
-		 * delivery is gated by ICH_HCR_EL2.En.
-		 */
-		if (static_branch_unlikely(&vgic_v3_cpuif_trap) ||
-		    cpu_if->its_vpe.its_vm)
-			write_gicreg(cpu_if->vgic_hcr, ICH_HCR_EL2);
+		if (has_vhe()) {
+			/*
+			 * Ensure that the write to the VMCR will have reached
+			 * the (re)distributors. This ensures the guest will
+			 * read the correct values from the memory-mapped
+			 * interface.
+			 */
+			isb();
+			dsb(sy);
+		}
 	}
 
 	/*
-	 * Ensures that the above will have reached the
-	 * (re)distributors. This ensure the guest will read the
-	 * correct values from the memory-mapped interface.
+	 * Prevent the guest from touching the GIC system registers if
+	 * SRE isn't enabled for GICv3 emulation.
+	 */
+	write_gicreg(read_gicreg(ICC_SRE_EL2) & ~ICC_SRE_EL2_ENABLE,
+		     ICC_SRE_EL2);
+
+	/*
+	 * If we need to trap system registers, we must write
+	 * ICH_HCR_EL2 anyway, even if no interrupts are being
+	 * injected.
 	 */
+	if (static_branch_unlikely(&vgic_v3_cpuif_trap) ||
+	    cpu_if->its_vpe.its_vm)
+		write_gicreg(cpu_if->vgic_hcr, ICH_HCR_EL2);
+}
+
+void __hyp_text __vgic_v3_deactivate_traps(struct kvm_vcpu *vcpu)
+{
+	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
+	u64 val;
+
 	if (!cpu_if->vgic_sre) {
+		cpu_if->vgic_vmcr = read_gicreg(ICH_VMCR_EL2);
+	}
+
+
+	val = read_gicreg(ICC_SRE_EL2);
+	write_gicreg(val | ICC_SRE_EL2_ENABLE, ICC_SRE_EL2);
+
+	if (!cpu_if->vgic_sre) {
+		/* Make sure ENABLE is set at EL2 before setting SRE at EL1 */
 		isb();
-		dsb(sy);
+		write_gicreg(1, ICC_SRE_EL1);
 	}
 
 	/*
-	 * Prevent the guest from touching the GIC system registers if
-	 * SRE isn't enabled for GICv3 emulation.
+	 * If we were trapping system registers, we enabled the VGIC even if
+	 * no interrupts were being injected, and we disable it again here.
 	 */
-	write_gicreg(read_gicreg(ICC_SRE_EL2) & ~ICC_SRE_EL2_ENABLE,
-		     ICC_SRE_EL2);
+	if (static_branch_unlikely(&vgic_v3_cpuif_trap) ||
+	    cpu_if->its_vpe.its_vm)
+		write_gicreg(0, ICH_HCR_EL2);
 }
 
 void __hyp_text __vgic_v3_save_aprs(struct kvm_vcpu *vcpu)
diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c
index 4bafcd1e6bb8..4200657694f0 100644
--- a/virt/kvm/arm/vgic/vgic-v3.c
+++ b/virt/kvm/arm/vgic/vgic-v3.c
@@ -590,6 +590,9 @@ void vgic_v3_load(struct kvm_vcpu *vcpu)
 		kvm_call_hyp(__vgic_v3_write_vmcr, cpu_if->vgic_vmcr);
 
 	kvm_call_hyp(__vgic_v3_restore_aprs, vcpu);
+
+	if (has_vhe())
+		__vgic_v3_activate_traps(vcpu);
 }
 
 void vgic_v3_put(struct kvm_vcpu *vcpu)
@@ -600,4 +603,7 @@ void vgic_v3_put(struct kvm_vcpu *vcpu)
 		cpu_if->vgic_vmcr = kvm_call_hyp(__vgic_v3_read_vmcr);
 
 	kvm_call_hyp(__vgic_v3_save_aprs, vcpu);
+
+	if (has_vhe())
+		__vgic_v3_deactivate_traps(vcpu);
 }
diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
index d0a19a8c196a..0d95d7b55567 100644
--- a/virt/kvm/arm/vgic/vgic.c
+++ b/virt/kvm/arm/vgic/vgic.c
@@ -763,14 +763,14 @@ void kvm_vgic_sync_hwstate(struct kvm_vcpu *vcpu)
 {
 	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
 
-	vgic_save_state(vcpu);
-
 	WARN_ON(vgic_v4_sync_hwstate(vcpu));
 
 	/* An empty ap_list_head implies used_lrs == 0 */
 	if (list_empty(&vcpu->arch.vgic_cpu.ap_list_head))
 		return;
 
+	vgic_save_state(vcpu);
+
 	if (vgic_cpu->used_lrs)
 		vgic_fold_lr_state(vcpu);
 	vgic_prune_ap_list(vcpu);
@@ -799,7 +799,7 @@ void kvm_vgic_flush_hwstate(struct kvm_vcpu *vcpu)
 	 * this.
 	 */
 	if (list_empty(&vcpu->arch.vgic_cpu.ap_list_head))
-		goto out;
+		return;
 
 	DEBUG_SPINLOCK_BUG_ON(!irqs_disabled());
 
@@ -807,7 +807,6 @@ void kvm_vgic_flush_hwstate(struct kvm_vcpu *vcpu)
 	vgic_flush_lr_state(vcpu);
 	spin_unlock(&vcpu->arch.vgic_cpu.ap_list_lock);
 
-out:
 	vgic_restore_state(vcpu);
 }
 
-- 
2.14.2

* Re: [PATCH v3 00/41] Optimize KVM/ARM for VHE systems
  2018-01-12 12:07 ` Christoffer Dall
@ 2018-01-15 14:14   ` Yury Norov
  -1 siblings, 0 replies; 223+ messages in thread
From: Yury Norov @ 2018-01-15 14:14 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: Sunil Goutham, kvm, Marc Zyngier, linux-arm-kernel, kvmarm, Shih-Wei Li

Hi Christoffer,

[CC Sunil Goutham <Sunil.Goutham@cavium.com>]

On Fri, Jan 12, 2018 at 01:07:06PM +0100, Christoffer Dall wrote:
> This series redesigns parts of KVM/ARM to optimize the performance on
> VHE systems.  The general approach is to try to do as little work as
> possible when transitioning between the VM and the hypervisor.  This has
> the benefit of lower latency when waiting for interrupts and delivering
> virtual interrupts, and reduces the overhead of emulating behavior and
> I/O in the host kernel.
> 
> Patches 01 through 06 are not VHE specific, but rework parts of KVM/ARM
> that can be generally improved.  We then add infrastructure to move more
> logic into vcpu_load and vcpu_put, we improve handling of VFP and debug
> registers.
> 
> We then introduce a new world-switch function for VHE systems, which we
> can tweak and optimize for VHE systems.  To do that, we rework a lot of
> the system register save/restore handling and emulation code that may
> need access to system registers, so that we can defer as many system
> register save/restore operations to vcpu_load and vcpu_put, and move
> this logic out of the VHE world switch function.
> 
> We then optimize the configuration of traps.  On non-VHE systems, both
> the host and VM kernels run in EL1, but because the host kernel should
> have full access to the underlying hardware, but the VM kernel should
> not, we essentially make the host kernel more privileged than the VM
> kernel despite them both running at the same privilege level by enabling
> VE traps when entering the VM and disabling those traps when exiting the
> VM.  On VHE systems, the host kernel runs in EL2 and has full access to
> the hardware (as much as allowed by secure side software), and is
> unaffected by the trap configuration.  That means we can configure the
> traps for VMs running in EL1 once, and don't have to switch them on and
> off for every entry/exit to/from the VM.
> 
> Finally, we improve our VGIC handling by moving all save/restore logic
> out of the VHE world-switch, and we make it possible to truly only
> evaluate if the AP list is empty and not do *any* VGIC work if that is
> the case, and only do the minimal amount of work required in the course
> of the VGIC processing when we have virtual interrupts in flight.
> 
> The patches are based on v4.15-rc3, v9 of the level-triggered mapped
> interrupts support series [1], and the first five patches of James' SDEI
> series [2].
> 
> I've given the patches a fair amount of testing on Thunder-X, Mustang,
> Seattle, and TC2 (32-bit) for non-VHE testing, and tested VHE
> functionality on the Foundation model, running both 64-bit VMs and
> 32-bit VMs side-by-side and using both GICv3-on-GICv3 and
> GICv2-on-GICv3.
> 
> The patches are also available in the vhe-optimize-v3 branch on my
> kernel.org repository [3].  The vhe-optimize-v3-base branch contains
> prerequisites of this series.
> 
> Changes since v2:
>  - Rebased on v4.15-rc3.
>  - Includes two additional patches that only does vcpu_load after
>    kvm_vcpu_first_run_init and only for KVM_RUN.
>  - Addressed review comments from v2 (detailed changelogs are in the
>    individual patches).
> 
> Thanks,
> -Christoffer
> 
> [1]: git://git.kernel.org/pub/scm/linux/kernel/git/cdall/linux.git level-mapped-v9
> [2]: git://linux-arm.org/linux-jm.git sdei/v5/base
> [3]: git://git.kernel.org/pub/scm/linux/kernel/git/cdall/linux.git vhe-optimize-v3

I tested this v3 series on ThunderX2 with the IPI benchmark:
https://lkml.org/lkml/2017/12/11/364

I tried to address your comments from the discussion of v2, like
pinning the module to a specific CPU (with taskset), increasing the
number of iterations, and tuning the governor to max performance.  The
results didn't change much, and are pretty stable.

Compared to the vanilla guest, Normal IPI delivery for v3 is 20%
slower.  For v2 it was 27% slower, and for v1 it was 42% faster.
What's interesting is that the acknowledge time is much faster for v3,
so the overall time to deliver and acknowledge an IPI (2nd column) is
less than with the vanilla 4.15-rc3 kernel.
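
(For reference, assuming these percentages refer to the Normal IPI
delivery column in the tables below: v3 is 347 vs. 289, i.e. ~20%
slower; v2 was 370 vs. 291, i.e. ~27% slower; v1 was 176 vs. 305, i.e.
~42% faster.)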

The test setup is unchanged since v2: ThunderX2, 112 online CPUs,
guest running under qemu-kvm, emulating GIC version 3.

Below are the test results for v1-v3, normalized to the host vanilla
kernel dry-run time.

Yury

Host, v4.14:
Dry-run:          0         1
Self-IPI:         9        18
Normal IPI:      81       110
Broadcast IPI:    0      2106

Guest, v4.14:
Dry-run:          0         1
Self-IPI:        10        18
Normal IPI:     305       525
Broadcast IPI:    0      9729

Guest, v4.14 + VHE:
Dry-run:          0         1
Self-IPI:         9        18
Normal IPI:     176       343
Broadcast IPI:    0      9885

And for v2.

Host, v4.15:
Dry-run:          0         1
Self-IPI:         9        18
Normal IPI:      79       108
Broadcast IPI:    0      2102

Guest, v4.15-rc:
Dry-run:          0         1
Self-IPI:         9        18
Normal IPI:     291       526
Broadcast IPI:    0     10439

Guest, v4.15-rc + VHE:
Dry-run:          0         2
Self-IPI:        14        28
Normal IPI:     370       569
Broadcast IPI:    0     11688

And for v3.

Host, 4.15-rc3:
Dry-run:          0         1
Self-IPI:         9        18
Normal IPI:      80       110
Broadcast IPI:    0      2088

Guest, 4.15-rc3:
Dry-run:          0         1
Self-IPI:         9        18
Normal IPI:     289       497
Broadcast IPI:    0      9999

Guest, 4.15-rc3 + VHE:
Dry-run:          0         2
Self-IPI:        12        24
Normal IPI:     347       490
Broadcast IPI:    0     11906

* Re: [PATCH v3 00/41] Optimize KVM/ARM for VHE systems
  2018-01-15 14:14   ` Yury Norov
@ 2018-01-15 15:50     ` Christoffer Dall
  -1 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-01-15 15:50 UTC (permalink / raw)
  To: Yury Norov
  Cc: Sunil Goutham, kvm, Marc Zyngier, linux-arm-kernel, kvmarm, Shih-Wei Li

Hi Yury,

On Mon, Jan 15, 2018 at 05:14:23PM +0300, Yury Norov wrote:
> On Fri, Jan 12, 2018 at 01:07:06PM +0100, Christoffer Dall wrote:
> > This series redesigns parts of KVM/ARM to optimize the performance on
> > VHE systems.  The general approach is to try to do as little work as
> > possible when transitioning between the VM and the hypervisor.  This has
> > the benefit of lower latency when waiting for interrupts and delivering
> > virtual interrupts, and reduces the overhead of emulating behavior and
> > I/O in the host kernel.
> > 
> > Patches 01 through 06 are not VHE specific, but rework parts of KVM/ARM
> > that can be generally improved.  We then add infrastructure to move more
> > logic into vcpu_load and vcpu_put, we improve handling of VFP and debug
> > registers.
> > 
> > We then introduce a new world-switch function for VHE systems, which we
> > can tweak and optimize for VHE systems.  To do that, we rework a lot of
> > the system register save/restore handling and emulation code that may
> > need access to system registers, so that we can defer as many system
> > register save/restore operations to vcpu_load and vcpu_put, and move
> > this logic out of the VHE world switch function.
> > 
> > We then optimize the configuration of traps.  On non-VHE systems, both
> > the host and VM kernels run in EL1, but because the host kernel should
> > have full access to the underlying hardware, but the VM kernel should
> > not, we essentially make the host kernel more privileged than the VM
> > kernel despite them both running at the same privilege level by enabling
> > VE traps when entering the VM and disabling those traps when exiting the
> > VM.  On VHE systems, the host kernel runs in EL2 and has full access to
> > the hardware (as much as allowed by secure side software), and is
> > unaffected by the trap configuration.  That means we can configure the
> > traps for VMs running in EL1 once, and don't have to switch them on and
> > off for every entry/exit to/from the VM.
> > 
> > Finally, we improve our VGIC handling by moving all save/restore logic
> > out of the VHE world-switch, and we make it possible to truly only
> > evaluate if the AP list is empty and not do *any* VGIC work if that is
> > the case, and only do the minimal amount of work required in the course
> > of the VGIC processing when we have virtual interrupts in flight.
> > 
> > The patches are based on v4.15-rc3, v9 of the level-triggered mapped
> > interrupts support series [1], and the first five patches of James' SDEI
> > series [2].
> > 
> > I've given the patches a fair amount of testing on Thunder-X, Mustang,
> > Seattle, and TC2 (32-bit) for non-VHE testing, and tested VHE
> > functionality on the Foundation model, running both 64-bit VMs and
> > 32-bit VMs side-by-side and using both GICv3-on-GICv3 and
> > GICv2-on-GICv3.
> > 
> > The patches are also available in the vhe-optimize-v3 branch on my
> > kernel.org repository [3].  The vhe-optimize-v3-base branch contains
> > prerequisites of this series.
> > 
> > Changes since v2:
> >  - Rebased on v4.15-rc3.
> >  - Includes two additional patches that only does vcpu_load after
> >    kvm_vcpu_first_run_init and only for KVM_RUN.
> >  - Addressed review comments from v2 (detailed changelogs are in the
> >    individual patches).
> > 
> > Thanks,
> > -Christoffer
> > 
> > [1]: git://git.kernel.org/pub/scm/linux/kernel/git/cdall/linux.git level-mapped-v9
> > [2]: git://linux-arm.org/linux-jm.git sdei/v5/base
> > [3]: git://git.kernel.org/pub/scm/linux/kernel/git/cdall/linux.git vhe-optimize-v3
> 
> I tested this v3 series on ThunderX2 with IPI benchmark:
> https://lkml.org/lkml/2017/12/11/364
> 
> I tried to address your comments in discussion to v2, like pinning
> the module to specific CPU (with taskset), increasing the number of
> iterations, tuning governor to max performance. Results didn't change
> much, and are pretty stable.

Thanks for testing this.
> 
> Comparing to vanilla guest, Norml IPI delivery for v3 is 20% slower.
> For v2 it was 27% slower, and for v1 - 42% faster. What's interesting,
> the acknowledge time is much faster for v3, so overall time to
> deliver and acknowledge IPI (2nd column) is less than vanilla
> 4.15-rc3 kernel.

I don't see this from your results.  It looks like the IPI cost
increases from 289 to 347?

Also, acknowledging the IPI should be a constant cost (handled directly
by hardware), so that's definitely an indication something is wrong.

> 
> Test setup is not changed since v2: ThunderX2, 112 online CPUs,
> guest is running under qemu-kvm, emulating gic version 3.
> 
> Below is test results for v1-3 normalized to host vanilla kernel
> dry-run time.

There must be some bug in this series, but I'm unsure where it is, as I
cannot observe it on the hardware I have at hand.

Perhaps we mistakenly enable GICv3 CPU interface trapping with this
series, or there is some other flow around the GIC which is broken.
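
One quick, hypothetical way to check the trapping theory would be to
dump the ICH_HCR_EL2 trap bits we program, e.g. from vgic_v3_load():

	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;

	/* Show which ICH_HCR_EL2 traps are requested for this vcpu */
	pr_info("vgic_hcr=%x TALL0=%d TALL1=%d TC=%d\n",
		cpu_if->vgic_hcr,
		!!(cpu_if->vgic_hcr & ICH_HCR_TALL0),
		!!(cpu_if->vgic_hcr & ICH_HCR_TALL1),
		!!(cpu_if->vgic_hcr & ICH_HCR_TC));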

It would be interesting if you could measure the base cost of an exit
from the VM to the hypervisor, using the cycle counter, on the two
platforms.  That does require changing the host kernel to clear
MDCR_EL2.TPM when running a guest (unsafe), ensuring the cycle counter
runs across EL2/1/0 (for example by running KVM under perf), and
running a micro test that exits using a hypercall that does nothing
(like getting the PSCI version).
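
Something like the following guest-side sketch could work (hypothetical
and untested; it assumes the host tweaks above so that PMCCNTR_EL0
counts across EL2/1/0 and is readable from the guest, and that PSCI
uses the HVC conduit):

	#include <linux/arm-smccc.h>
	#include <linux/psci.h>
	#include <linux/types.h>
	#include <asm/sysreg.h>

	/* Average cycles for a do-nothing hypercall (PSCI_VERSION) round trip */
	static u64 measure_null_exit(unsigned int iters)
	{
		struct arm_smccc_res res;
		u64 start, end;
		unsigned int i;

		start = read_sysreg(pmccntr_el0);
		for (i = 0; i < iters; i++)
			arm_smccc_hvc(PSCI_0_2_FN_PSCI_VERSION,
				      0, 0, 0, 0, 0, 0, 0, &res);
		end = read_sysreg(pmccntr_el0);

		return (end - start) / iters;
	}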

I'll investigate this some more later in the week.


> 
> Yury
> 
> Host, v4.14:
> Dry-run:          0         1
> Self-IPI:         9        18
> Normal IPI:      81       110
> Broadcast IPI:    0      2106
> 
> Guest, v4.14:
> Dry-run:          0         1
> Self-IPI:        10        18
> Normal IPI:     305       525
> Broadcast IPI:    0      9729
> 
> Guest, v4.14 + VHE:
> Dry-run:          0         1
> Self-IPI:         9        18
> Normal IPI:     176       343
> Broadcast IPI:    0      9885
> 
> And for v2.
> 
> Host, v4.15:                   
> Dry-run:          0         1
> Self-IPI:         9        18
> Normal IPI:      79       108
> Broadcast IPI:    0      2102
>                         
> Guest, v4.15-rc:
> Dry-run:          0         1
> Self-IPI:         9        18
> Normal IPI:     291       526
> Broadcast IPI:    0     10439
> 
> Guest, v4.15-rc + VHE:
> Dry-run:          0         2
> Self-IPI:        14        28
> Normal IPI:     370       569
> Broadcast IPI:    0     11688
> 
> And for v3.
> 
> Host 4.15-rc3					
> Dry-run:	  0	    1
> Self-IPI:	  9	   18
> Normal IPI:	 80	  110
> Broadcast IPI:	  0	 2088
> 		
> Guest, 4.15-rc3	
> Dry-run:	  0	    1
> Self-IPI:	  9	   18
> Normal IPI:	289	  497
> Broadcast IPI:	  0	 9999
> 		
> Guest, 4.15-rc3	+ VHE
> Dry-run:	  0	    2
> Self-IPI:	 12	   24
> Normal IPI:	347	  490
> Broadcast IPI:	  0	11906

Thanks,
-Christoffer

* Re: [PATCH v3 00/41] Optimize KVM/ARM for VHE systems
  2018-01-15 15:50     ` Christoffer Dall
@ 2018-01-17  8:34       ` Yury Norov
  -1 siblings, 0 replies; 223+ messages in thread
From: Yury Norov @ 2018-01-17  8:34 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: kvmarm, linux-arm-kernel, kvm, Marc Zyngier, Shih-Wei Li,
	Andrew Jones, Sunil Goutham

On Mon, Jan 15, 2018 at 04:50:36PM +0100, Christoffer Dall wrote:
> Hi Yury,
> 
> On Mon, Jan 15, 2018 at 05:14:23PM +0300, Yury Norov wrote:
> > On Fri, Jan 12, 2018 at 01:07:06PM +0100, Christoffer Dall wrote:
> > > This series redesigns parts of KVM/ARM to optimize the performance on
> > > VHE systems.  The general approach is to try to do as little work as
> > > possible when transitioning between the VM and the hypervisor.  This has
> > > the benefit of lower latency when waiting for interrupts and delivering
> > > virtual interrupts, and reduces the overhead of emulating behavior and
> > > I/O in the host kernel.
> > > 
> > > Patches 01 through 06 are not VHE specific, but rework parts of KVM/ARM
> > > that can be generally improved.  We then add infrastructure to move more
> > > logic into vcpu_load and vcpu_put, we improve handling of VFP and debug
> > > registers.
> > > 
> > > We then introduce a new world-switch function for VHE systems, which we
> > > can tweak and optimize for VHE systems.  To do that, we rework a lot of
> > > the system register save/restore handling and emulation code that may
> > > need access to system registers, so that we can defer as many system
> > > register save/restore operations to vcpu_load and vcpu_put, and move
> > > this logic out of the VHE world switch function.
> > > 
> > > We then optimize the configuration of traps.  On non-VHE systems, both
> > > the host and VM kernels run in EL1, but because the host kernel should
> > > have full access to the underlying hardware, but the VM kernel should
> > > not, we essentially make the host kernel more privileged than the VM
> > > kernel despite them both running at the same privilege level by enabling
> > > VE traps when entering the VM and disabling those traps when exiting the
> > > VM.  On VHE systems, the host kernel runs in EL2 and has full access to
> > > the hardware (as much as allowed by secure side software), and is
> > > unaffected by the trap configuration.  That means we can configure the
> > > traps for VMs running in EL1 once, and don't have to switch them on and
> > > off for every entry/exit to/from the VM.
> > > 
> > > Finally, we improve our VGIC handling by moving all save/restore logic
> > > out of the VHE world-switch, and we make it possible to truly only
> > > evaluate if the AP list is empty and not do *any* VGIC work if that is
> > > the case, and only do the minimal amount of work required in the course
> > > of the VGIC processing when we have virtual interrupts in flight.
> > > 
> > > The patches are based on v4.15-rc3, v9 of the level-triggered mapped
> > > interrupts support series [1], and the first five patches of James' SDEI
> > > series [2].
> > > 
> > > I've given the patches a fair amount of testing on Thunder-X, Mustang,
> > > Seattle, and TC2 (32-bit) for non-VHE testing, and tested VHE
> > > functionality on the Foundation model, running both 64-bit VMs and
> > > 32-bit VMs side-by-side and using both GICv3-on-GICv3 and
> > > GICv2-on-GICv3.
> > > 
> > > The patches are also available in the vhe-optimize-v3 branch on my
> > > kernel.org repository [3].  The vhe-optimize-v3-base branch contains
> > > prerequisites of this series.
> > > 
> > > Changes since v2:
> > >  - Rebased on v4.15-rc3.
> > >  - Includes two additional patches that only does vcpu_load after
> > >    kvm_vcpu_first_run_init and only for KVM_RUN.
> > >  - Addressed review comments from v2 (detailed changelogs are in the
> > >    individual patches).
> > > 
> > > Thanks,
> > > -Christoffer
> > > 
> > > [1]: git://git.kernel.org/pub/scm/linux/kernel/git/cdall/linux.git level-mapped-v9
> > > [2]: git://linux-arm.org/linux-jm.git sdei/v5/base
> > > [3]: git://git.kernel.org/pub/scm/linux/kernel/git/cdall/linux.git vhe-optimize-v3
> > 
> > I tested this v3 series on ThunderX2 with IPI benchmark:
> > https://lkml.org/lkml/2017/12/11/364
> > 
> > I tried to address your comments in discussion to v2, like pinning
> > the module to specific CPU (with taskset), increasing the number of
> > iterations, tuning governor to max performance. Results didn't change
> > much, and are pretty stable.
> 
> Thanks for testing this.
> > 
> > Comparing to vanilla guest, Norml IPI delivery for v3 is 20% slower.
> > For v2 it was 27% slower, and for v1 - 42% faster. What's interesting,
> > the acknowledge time is much faster for v3, so overall time to
> > deliver and acknowledge IPI (2nd column) is less than vanilla
> > 4.15-rc3 kernel.
> 
> I don't see this from your results.  It looks like an IPI cost increases
> from 289 to 347?

I mean the turnaround time - 497 without your patches and 490 with them.

> Also, acknowledging the IPI should be a constant cost (handled directly
> by hardware), so that's definitely an indication something is wrong.
> 
> > 
> > Test setup is not changed since v2: ThunderX2, 112 online CPUs,
> > guest is running under qemu-kvm, emulating gic version 3.
> > 
> > Below is test results for v1-3 normalized to host vanilla kernel
> > dry-run time.
> 
> There must be some bug in this series, but I'm unsure where it is, as I
> cannot observe it on the hardware I have at hand.
> 
> Perhaps we mistakenly enable the GICv3 CPU interface trapping with this
> series or there is some other flow around the GIC which is broken.
> 
> It would be interesting if you could measure the base exit cost using
> the cycle counter from the VM to the hypervisor between the two
> platforms.  That does require changing the host kernel to clear
> MDCR_EL2.TPM when running a guest (unsafe), and ensuring the cycle
> counter runs across EL2/1/0 (for example by running KVM under perf) and
> running a micro test that exits using a hypercall that does nothing
> (like getting the PSCI version).


I can do this later this week, OK?

Yury
 
> I'll investigate this some more later in the week.
> 
> 
> > 
> > Yury
> > 
> > Host, v4.14:
> > Dry-run:          0         1
> > Self-IPI:         9        18
> > Normal IPI:      81       110
> > Broadcast IPI:    0      2106
> > 
> > Guest, v4.14:
> > Dry-run:          0         1
> > Self-IPI:        10        18
> > Normal IPI:     305       525
> > Broadcast IPI:    0      9729
> > 
> > Guest, v4.14 + VHE:
> > Dry-run:          0         1
> > Self-IPI:         9        18
> > Normal IPI:     176       343
> > Broadcast IPI:    0      9885
> > 
> > And for v2.
> > 
> > Host, v4.15:                   
> > Dry-run:          0         1
> > Self-IPI:         9        18
> > Normal IPI:      79       108
> > Broadcast IPI:    0      2102
> >                         
> > Guest, v4.15-rc:
> > Dry-run:          0         1
> > Self-IPI:         9        18
> > Normal IPI:     291       526
> > Broadcast IPI:    0     10439
> > 
> > Guest, v4.15-rc + VHE:
> > Dry-run:          0         2
> > Self-IPI:        14        28
> > Normal IPI:     370       569
> > Broadcast IPI:    0     11688
> > 
> > And for v3.
> > 
> > Host 4.15-rc3					
> > Dry-run:	  0	    1
> > Self-IPI:	  9	   18
> > Normal IPI:	 80	  110
> > Broadcast IPI:	  0	 2088
> > 		
> > Guest, 4.15-rc3	
> > Dry-run:	  0	    1
> > Self-IPI:	  9	   18
> > Normal IPI:	289	  497
> > Broadcast IPI:	  0	 9999
> > 		
> > Guest, 4.15-rc3	+ VHE
> > Dry-run:	  0	    2
> > Self-IPI:	 12	   24
> > Normal IPI:	347	  490
> > Broadcast IPI:	  0	11906
> 
> Thanks,
> -Christoffer

* Re: [PATCH v3 00/41] Optimize KVM/ARM for VHE systems
  2018-01-17  8:34       ` Yury Norov
@ 2018-01-17 10:48         ` Christoffer Dall
  -1 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-01-17 10:48 UTC (permalink / raw)
  To: Yury Norov
  Cc: Sunil Goutham, kvm, Marc Zyngier, linux-arm-kernel, kvmarm, Shih-Wei Li

On Wed, Jan 17, 2018 at 11:34:54AM +0300, Yury Norov wrote:
> On Mon, Jan 15, 2018 at 04:50:36PM +0100, Christoffer Dall wrote:
> > Hi Yury,
> > 
> > On Mon, Jan 15, 2018 at 05:14:23PM +0300, Yury Norov wrote:
> > > On Fri, Jan 12, 2018 at 01:07:06PM +0100, Christoffer Dall wrote:
> > > > This series redesigns parts of KVM/ARM to optimize the performance on
> > > > VHE systems.  The general approach is to try to do as little work as
> > > > possible when transitioning between the VM and the hypervisor.  This has
> > > > the benefit of lower latency when waiting for interrupts and delivering
> > > > virtual interrupts, and reduces the overhead of emulating behavior and
> > > > I/O in the host kernel.
> > > > 
> > > > Patches 01 through 06 are not VHE specific, but rework parts of KVM/ARM
> > > > that can be generally improved.  We then add infrastructure to move more
> > > > logic into vcpu_load and vcpu_put, we improve handling of VFP and debug
> > > > registers.
> > > > 
> > > > We then introduce a new world-switch function for VHE systems, which we
> > > > can tweak and optimize for VHE systems.  To do that, we rework a lot of
> > > > the system register save/restore handling and emulation code that may
> > > > need access to system registers, so that we can defer as many system
> > > > register save/restore operations to vcpu_load and vcpu_put, and move
> > > > this logic out of the VHE world switch function.
> > > > 
> > > > We then optimize the configuration of traps.  On non-VHE systems, both
> > > > the host and VM kernels run in EL1, but because the host kernel should
> > > > have full access to the underlying hardware, but the VM kernel should
> > > > not, we essentially make the host kernel more privileged than the VM
> > > > kernel despite them both running at the same privilege level by enabling
> > > > VE traps when entering the VM and disabling those traps when exiting the
> > > > VM.  On VHE systems, the host kernel runs in EL2 and has full access to
> > > > the hardware (as much as allowed by secure side software), and is
> > > > unaffected by the trap configuration.  That means we can configure the
> > > > traps for VMs running in EL1 once, and don't have to switch them on and
> > > > off for every entry/exit to/from the VM.
> > > > 
> > > > Finally, we improve our VGIC handling by moving all save/restore logic
> > > > out of the VHE world-switch, and we make it possible to truly only
> > > > evaluate if the AP list is empty and not do *any* VGIC work if that is
> > > > the case, and only do the minimal amount of work required in the course
> > > > of the VGIC processing when we have virtual interrupts in flight.
> > > > 
> > > > The patches are based on v4.15-rc3, v9 of the level-triggered mapped
> > > > interrupts support series [1], and the first five patches of James' SDEI
> > > > series [2].
> > > > 
> > > > I've given the patches a fair amount of testing on Thunder-X, Mustang,
> > > > Seattle, and TC2 (32-bit) for non-VHE testing, and tested VHE
> > > > functionality on the Foundation model, running both 64-bit VMs and
> > > > 32-bit VMs side-by-side and using both GICv3-on-GICv3 and
> > > > GICv2-on-GICv3.
> > > > 
> > > > The patches are also available in the vhe-optimize-v3 branch on my
> > > > kernel.org repository [3].  The vhe-optimize-v3-base branch contains
> > > > prerequisites of this series.
> > > > 
> > > > Changes since v2:
> > > >  - Rebased on v4.15-rc3.
> > > >  - Includes two additional patches that only does vcpu_load after
> > > >    kvm_vcpu_first_run_init and only for KVM_RUN.
> > > >  - Addressed review comments from v2 (detailed changelogs are in the
> > > >    individual patches).
> > > > 
> > > > Thanks,
> > > > -Christoffer
> > > > 
> > > > [1]: git://git.kernel.org/pub/scm/linux/kernel/git/cdall/linux.git level-mapped-v9
> > > > [2]: git://linux-arm.org/linux-jm.git sdei/v5/base
> > > > [3]: git://git.kernel.org/pub/scm/linux/kernel/git/cdall/linux.git vhe-optimize-v3
> > > 
> > > I tested this v3 series on ThunderX2 with IPI benchmark:
> > > https://lkml.org/lkml/2017/12/11/364
> > > 
> > > I tried to address your comments in discussion to v2, like pinning
> > > the module to specific CPU (with taskset), increasing the number of
> > > iterations, tuning governor to max performance. Results didn't change
> > > much, and are pretty stable.
> > 
> > Thanks for testing this.
> > > 
> > > Comparing to vanilla guest, Norml IPI delivery for v3 is 20% slower.
> > > For v2 it was 27% slower, and for v1 - 42% faster. What's interesting,
> > > the acknowledge time is much faster for v3, so overall time to
> > > deliver and acknowledge IPI (2nd column) is less than vanilla
> > > 4.15-rc3 kernel.
> > 
> > I don't see this from your results.  It looks like an IPI cost increases
> > from 289 to 347?
> 
> I mean turnaround time - 497 without your patches and 490 with them.
> 

I have a hard time making sense of this; it would indicate either that
something in your IPI workload that used to be slow (a loop that
shouldn't trap, or ktime_get() before it actually returns the time)
has now become faster (much faster, given the increase in send/receive
IPI time), or that the timers have become messed up and show
inconsistent time counts across the sending and receiving CPUs.  Hmm.

> > Also, acknowledging the IPI should be a constant cost (handled directly
> > by hardware), so that's definitely an indication something is wrong.
> > 
> > > 
> > > Test setup is not changed since v2: ThunderX2, 112 online CPUs,
> > > guest is running under qemu-kvm, emulating gic version 3.
> > > 
> > > Below are the test results for v1-3, normalized to the host vanilla
> > > kernel dry-run time.
> > 
> > There must be some bug in this series, but I'm unsure where it is, as I
> > cannot observe it on the hardware I have at hand.
> > 
> > Perhaps we mistakenly enable the GICv3 CPU interface trapping with this
> > series or there is some other flow around the GIC which is broken.
> > 
> > It would be interesting if you could measure the base exit cost using
> > the cycle counter from the VM to the hypervisor between the two
> > platforms.  That does require changing the host kernel to clear
> > MDCR_EL2.TPM when running a guest (unsafe), and ensuring the cycle
> > counter runs across EL2/1/0 (for example by running KVM under perf) and
> > running a micro test that exits using a hypercall that does nothing
> > (like getting the PSCI version).
> 
> 
> I can do this, later this week, OK?
> 

That would be helpful indeed.
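
Something along these lines in the guest would do as a starting point
(illustrative only, names made up; it assumes the host has been hacked
to clear MDCR_EL2.TPM so that reading PMCCNTR_EL0 doesn't trap, and
that the cycle counter is enabled and counting across EL2/1/0, e.g. by
running KVM under perf):

	#include <linux/module.h>

	static inline unsigned long read_cycles(void)
	{
		unsigned long cycles;

		/* isb so the counter read isn't reordered around the hypercalls */
		asm volatile("isb; mrs %0, pmccntr_el0" : "=r" (cycles));
		return cycles;
	}

	static inline void do_null_hvc(void)
	{
		/* PSCI_VERSION (0x84000000): no arguments, essentially a no-op */
		register unsigned long x0 asm("x0") = 0x84000000;

		asm volatile("hvc #0" : "+r" (x0) : : "memory");
	}

	static int __init exit_cost_init(void)
	{
		unsigned long start, end, i, iters = 1000000;

		start = read_cycles();
		for (i = 0; i < iters; i++)
			do_null_hvc();
		end = read_cycles();

		pr_info("avg guest exit+entry: %lu cycles\n", (end - start) / iters);
		return 0;
	}
	module_init(exit_cost_init);

	MODULE_LICENSE("GPL");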

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 223+ messages in thread

* Re: [PATCH v3 08/41] KVM: arm/arm64: Introduce vcpu_el1_is_32bit
  2018-01-12 12:07   ` Christoffer Dall
@ 2018-01-17 14:44     ` Julien Thierry
  -1 siblings, 0 replies; 223+ messages in thread
From: Julien Thierry @ 2018-01-17 14:44 UTC (permalink / raw)
  To: Christoffer Dall, kvmarm, linux-arm-kernel
  Cc: Marc Zyngier, Andrew Jones, Shih-Wei Li, kvm

Hi Christoffer,

On 12/01/18 12:07, Christoffer Dall wrote:
> We have numerous checks around that check if the HCR_EL2 has the RW bit
> set to figure out if we're running an AArch64 or AArch32 VM.  In some
> cases, directly checking the RW bit (given its unintuitive name), is a
> bit confusing, and that's not going to improve as we move logic around
> for the following patches that optimize KVM on AArch64 hosts with VHE.
> 
> Therefore, introduce a helper, vcpu_el1_is_32bit, and replace existing
> direct checks of HCR_EL2.RW with the helper.
> 
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> ---
>   arch/arm64/include/asm/kvm_emulate.h | 7 ++++++-
>   arch/arm64/kvm/hyp/switch.c          | 8 ++------
>   arch/arm64/kvm/hyp/sysreg-sr.c       | 5 +++--
>   arch/arm64/kvm/inject_fault.c        | 6 +++---
>   4 files changed, 14 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> index b36aaa1fe332..e07bf463ac58 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -45,6 +45,11 @@ void kvm_inject_undef32(struct kvm_vcpu *vcpu);
>   void kvm_inject_dabt32(struct kvm_vcpu *vcpu, unsigned long addr);
>   void kvm_inject_pabt32(struct kvm_vcpu *vcpu, unsigned long addr);
>   
> +static inline bool vcpu_el1_is_32bit(struct kvm_vcpu *vcpu)
> +{
> +	return !(vcpu->arch.hcr_el2 & HCR_RW);
> +}
> +

Just so I understand, the difference between this and vcpu_mode_is_32bit 
is that vcpu_mode_is_32bit might return true because an 
interrupt/exception occurred while the guest was executing 32-bit EL0 but 
guest EL1 is still 64-bit, is that correct?

Also, it seems the process controlling KVM is supposed to provide the 
information of whether the vcpu runs a 32-bit EL1; would it be better to do:

	return test_bit(KVM_ARM_VCPU_EL1_32BIT, vcpu->arch.features);

instead of looking at the hcr? Or is there a case where those might differ?
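
i.e. something like this (only a sketch, and only equivalent to the
HCR_EL2.RW check if userspace always sets the feature bit consistently
with how RW gets programmed):

	static inline bool vcpu_el1_is_32bit(struct kvm_vcpu *vcpu)
	{
		return test_bit(KVM_ARM_VCPU_EL1_32BIT, vcpu->arch.features);
	}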

Otherwise:

Reviewed-by: Julien Thierry <julien.thierry@arm.com>

Cheers,

-- 
Julien Thierry

^ permalink raw reply	[flat|nested] 223+ messages in thread

* Re: [PATCH v3 10/41] KVM: arm64: Move debug dirty flag calculation out of world switch
  2018-01-12 12:07   ` Christoffer Dall
@ 2018-01-17 15:11     ` Julien Thierry
  -1 siblings, 0 replies; 223+ messages in thread
From: Julien Thierry @ 2018-01-17 15:11 UTC (permalink / raw)
  To: Christoffer Dall, kvmarm, linux-arm-kernel
  Cc: Marc Zyngier, Andrew Jones, Shih-Wei Li, kvm



On 12/01/18 12:07, Christoffer Dall wrote:
> There is no need to figure out inside the world-switch if we should
> save/restore the debug registers or not, we can might as well do that in

Nit: -can*

> the higher level debug setup code, making it easier to optimize down the
> line.
> 
> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>

Reviewed-by: Julien Thierry <julien.thierry@arm.com>

> ---
>   arch/arm64/kvm/debug.c        | 5 +++++
>   arch/arm64/kvm/hyp/debug-sr.c | 6 ------
>   2 files changed, 5 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
> index fa63b28c65e0..feedb877cff8 100644
> --- a/arch/arm64/kvm/debug.c
> +++ b/arch/arm64/kvm/debug.c
> @@ -193,6 +193,11 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu)
>   	if (trap_debug)
>   		vcpu->arch.mdcr_el2 |= MDCR_EL2_TDA;
>   
> +	/* If KDE or MDE are set, perform a full save/restore cycle. */
> +	if ((vcpu_sys_reg(vcpu, MDSCR_EL1) & DBG_MDSCR_KDE) ||
> +	    (vcpu_sys_reg(vcpu, MDSCR_EL1) & DBG_MDSCR_MDE))
> +		vcpu->arch.debug_flags |= KVM_ARM64_DEBUG_DIRTY;
> +
>   	trace_kvm_arm_set_dreg32("MDCR_EL2", vcpu->arch.mdcr_el2);
>   	trace_kvm_arm_set_dreg32("MDSCR_EL1", vcpu_sys_reg(vcpu, MDSCR_EL1));
>   }
> diff --git a/arch/arm64/kvm/hyp/debug-sr.c b/arch/arm64/kvm/hyp/debug-sr.c
> index 321c9c05dd9e..406829b6a43e 100644
> --- a/arch/arm64/kvm/hyp/debug-sr.c
> +++ b/arch/arm64/kvm/hyp/debug-sr.c
> @@ -162,12 +162,6 @@ void __hyp_text __debug_restore_state(struct kvm_vcpu *vcpu,
>   
>   void __hyp_text __debug_cond_save_host_state(struct kvm_vcpu *vcpu)
>   {
> -	/* If any of KDE, MDE or KVM_ARM64_DEBUG_DIRTY is set, perform
> -	 * a full save/restore cycle. */
> -	if ((vcpu->arch.ctxt.sys_regs[MDSCR_EL1] & DBG_MDSCR_KDE) ||
> -	    (vcpu->arch.ctxt.sys_regs[MDSCR_EL1] & DBG_MDSCR_MDE))
> -		vcpu->arch.debug_flags |= KVM_ARM64_DEBUG_DIRTY;
> -
>   	__debug_save_state(vcpu, &vcpu->arch.host_debug_state.regs,
>   			   kern_hyp_va(vcpu->arch.host_cpu_context));
>   	__debug_save_spe()(&vcpu->arch.host_debug_state.pmscr_el1);
> 

-- 
Julien Thierry

^ permalink raw reply	[flat|nested] 223+ messages in thread

* [PATCH v3 13/41] KVM: arm64: Factor out fault info population and gic workarounds
  2018-01-12 12:07   ` Christoffer Dall
  (?)
@ 2018-01-17 15:35   ` Julien Thierry
  -1 siblings, 0 replies; 223+ messages in thread
From: Julien Thierry @ 2018-01-17 15:35 UTC (permalink / raw)
  To: linux-arm-kernel



On 12/01/18 12:07, Christoffer Dall wrote:
> The current world-switch function has functionality to detect a number
> of cases where we need to fixup some part of the exit condition and
> possibly run the guest again, before having restored the host state.
> 
> This includes populating missing fault info, emulating GICv2 CPU
> interface accesses when mapped at unaligned addresses, and emulating
> the GICv3 CPU interface on systems that need it.
> 
> As we are about to have an alternative switch function for VHE systems,
> but VHE systems still need the same early fixup logic, factor out this
> logic into a separate function that can be shared by both switch
> functions.
> 
> No functional change.
> 
> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>

Reviewed-by: Julien Thierry <julien.thierry@arm.com>

> ---
>   arch/arm64/kvm/hyp/switch.c | 99 ++++++++++++++++++++++++---------------------
>   1 file changed, 54 insertions(+), 45 deletions(-)
> 
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index 63284647ed11..55ca2e3d42eb 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -270,50 +270,24 @@ static bool __hyp_text __skip_instr(struct kvm_vcpu *vcpu)
>   	}
>   }
>   
> -int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
> +/*
> + * Return true when we were able to fixup the guest exit and should return to
> + * the guest, false when we should restore the host state and return to the
> + * main run loop.
> + */
> +static bool __hyp_text fixup_guest_exit(struct kvm_vcpu *vcpu, u64 *exit_code)
>   {
> -	struct kvm_cpu_context *host_ctxt;
> -	struct kvm_cpu_context *guest_ctxt;
> -	u64 exit_code;
> -
> -	vcpu = kern_hyp_va(vcpu);
> -
> -	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> -	host_ctxt->__hyp_running_vcpu = vcpu;
> -	guest_ctxt = &vcpu->arch.ctxt;
> -
> -	__sysreg_save_host_state(host_ctxt);
> -
> -	__activate_traps(vcpu);
> -	__activate_vm(vcpu);
> -
> -	__vgic_restore_state(vcpu);
> -	__timer_enable_traps(vcpu);
> -
> -	/*
> -	 * We must restore the 32-bit state before the sysregs, thanks
> -	 * to erratum #852523 (Cortex-A57) or #853709 (Cortex-A72).
> -	 */
> -	__sysreg32_restore_state(vcpu);
> -	__sysreg_restore_guest_state(guest_ctxt);
> -	__debug_switch_to_guest(vcpu);
> -
> -	/* Jump in the fire! */
> -again:
> -	exit_code = __guest_enter(vcpu, host_ctxt);
> -	/* And we're baaack! */
> -
>   	/*
>   	 * We're using the raw exception code in order to only process
>   	 * the trap if no SError is pending. We will come back to the
>   	 * same PC once the SError has been injected, and replay the
>   	 * trapping instruction.
>   	 */
> -	if (exit_code == ARM_EXCEPTION_TRAP && !__populate_fault_info(vcpu))
> -		goto again;
> +	if (*exit_code == ARM_EXCEPTION_TRAP && !__populate_fault_info(vcpu))
> +		return true;
>   
>   	if (static_branch_unlikely(&vgic_v2_cpuif_trap) &&
> -	    exit_code == ARM_EXCEPTION_TRAP) {
> +	    *exit_code == ARM_EXCEPTION_TRAP) {
>   		bool valid;
>   
>   		valid = kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_DABT_LOW &&
> @@ -327,9 +301,9 @@ int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
>   
>   			if (ret == 1) {
>   				if (__skip_instr(vcpu))
> -					goto again;
> +					return true;
>   				else
> -					exit_code = ARM_EXCEPTION_TRAP;
> +					*exit_code = ARM_EXCEPTION_TRAP;
>   			}
>   
>   			if (ret == -1) {
> @@ -341,29 +315,64 @@ int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
>   				 */
>   				if (!__skip_instr(vcpu))
>   					*vcpu_cpsr(vcpu) &= ~DBG_SPSR_SS;
> -				exit_code = ARM_EXCEPTION_EL1_SERROR;
> +				*exit_code = ARM_EXCEPTION_EL1_SERROR;
>   			}
> -
> -			/* 0 falls through to be handler out of EL2 */
>   		}
>   	}
>   
>   	if (static_branch_unlikely(&vgic_v3_cpuif_trap) &&
> -	    exit_code == ARM_EXCEPTION_TRAP &&
> +	    *exit_code == ARM_EXCEPTION_TRAP &&
>   	    (kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_SYS64 ||
>   	     kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_CP15_32)) {
>   		int ret = __vgic_v3_perform_cpuif_access(vcpu);
>   
>   		if (ret == 1) {
>   			if (__skip_instr(vcpu))
> -				goto again;
> +				return true;
>   			else
> -				exit_code = ARM_EXCEPTION_TRAP;
> +				*exit_code = ARM_EXCEPTION_TRAP;
>   		}
> -
> -		/* 0 falls through to be handled out of EL2 */
>   	}
>   
> +	/* Return to the host kernel and handle the exit */
> +	return false;
> +}
> +
> +int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm_cpu_context *host_ctxt;
> +	struct kvm_cpu_context *guest_ctxt;
> +	u64 exit_code;
> +
> +	vcpu = kern_hyp_va(vcpu);
> +
> +	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> +	host_ctxt->__hyp_running_vcpu = vcpu;
> +	guest_ctxt = &vcpu->arch.ctxt;
> +
> +	__sysreg_save_host_state(host_ctxt);
> +
> +	__activate_traps(vcpu);
> +	__activate_vm(vcpu);
> +
> +	__vgic_restore_state(vcpu);
> +	__timer_enable_traps(vcpu);
> +
> +	/*
> +	 * We must restore the 32-bit state before the sysregs, thanks
> +	 * to erratum #852523 (Cortex-A57) or #853709 (Cortex-A72).
> +	 */
> +	__sysreg32_restore_state(vcpu);
> +	__sysreg_restore_guest_state(guest_ctxt);
> +	__debug_switch_to_guest(vcpu);
> +
> +	do {
> +		/* Jump in the fire! */
> +		exit_code = __guest_enter(vcpu, host_ctxt);
> +
> +		/* And we're baaack! */
> +	} while (fixup_guest_exit(vcpu, &exit_code));
> +
>   	__sysreg_save_guest_state(guest_ctxt);
>   	__sysreg32_save_state(vcpu);
>   	__timer_disable_traps(vcpu);
> 

-- 
Julien Thierry

^ permalink raw reply	[flat|nested] 223+ messages in thread

* Re: [PATCH v3 26/41] KVM: arm64: Introduce framework for accessing deferred sysregs
  2018-01-12 12:07   ` Christoffer Dall
@ 2018-01-17 17:52     ` Julien Thierry
  -1 siblings, 0 replies; 223+ messages in thread
From: Julien Thierry @ 2018-01-17 17:52 UTC (permalink / raw)
  To: Christoffer Dall, kvmarm, linux-arm-kernel; +Cc: Marc Zyngier, kvm, Shih-Wei Li



On 12/01/18 12:07, Christoffer Dall wrote:
> We are about to defer saving and restoring some groups of system
> registers to vcpu_put and vcpu_load on supported systems.  This means
> that we need some infrastructure to access system registers which
> supports either accessing the memory backing of the register or directly
> accessing the system registers, depending on the state of the system
> when we access the register.
> 
> We do this by defining a set of read/write accessors for each system
> register, and letting each system register be defined as "immediate" or
> "deferrable".  Immediate registers are always saved/restored in the
> world-switch path, but deferrable registers are only saved/restored in
> vcpu_put/vcpu_load when supported and sysregs_loaded_on_cpu will be set
> in that case.
> 

The patch is fine, however I'd suggest adding a comment in the code 
pointing out that IMMEDIATE/DEFERRABLE apply to the save/restore to the 
vcpu struct.  Instinctively I would expect deferrable/immediate to apply 
to the actual hardware register access, so a comment would prevent 
people like me from getting on the wrong track.
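
Something along these lines above the accessor macros, for example
(wording is just a suggestion):

	/*
	 * "Immediate" and "deferrable" describe when a register is
	 * saved/restored to/from the in-memory vcpu context, not how the
	 * hardware register itself is accessed: immediate registers are
	 * always synced to memory in the world-switch path, while
	 * deferrable registers may live in the real system register for
	 * as long as sysregs_loaded_on_cpu is true.
	 */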

> Note that we don't use the deferred mechanism yet in this patch, but only
> introduce the infrastructure.  This is to improve convenience of review in
> the subsequent patches where it is clear which registers become
> deferred.
> 
>   [ Most of this logic was contributed by Marc Zyngier ]
> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>

Reviewed-by: Julien Thierry <julien.thierry@arm.com>

> ---
>   arch/arm64/include/asm/kvm_host.h |   8 +-
>   arch/arm64/kvm/sys_regs.c         | 160 ++++++++++++++++++++++++++++++++++++++
>   2 files changed, 166 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 91272c35cc36..4b5ef82f6bdb 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -281,6 +281,10 @@ struct kvm_vcpu_arch {
>   
>   	/* Detect first run of a vcpu */
>   	bool has_run_once;
> +
> +	/* True when deferrable sysregs are loaded on the physical CPU,
> +	 * see kvm_vcpu_load_sysregs and kvm_vcpu_put_sysregs. */
> +	bool sysregs_loaded_on_cpu;
>   };
>   
>   #define vcpu_gp_regs(v)		(&(v)->arch.ctxt.gp_regs)
> @@ -293,8 +297,8 @@ struct kvm_vcpu_arch {
>    */
>   #define __vcpu_sys_reg(v,r)	((v)->arch.ctxt.sys_regs[(r)])
>   
> -#define vcpu_read_sys_reg(v,r)	__vcpu_sys_reg(v,r)
> -#define vcpu_write_sys_reg(v,r,n)	do { __vcpu_sys_reg(v,r) = n; } while (0)
> +u64 vcpu_read_sys_reg(struct kvm_vcpu *vcpu, int reg);
> +void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, int reg, u64 val);
>   
>   /*
>    * CP14 and CP15 live in the same array, as they are backed by the
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 96398d53b462..9d353a6a55c9 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -35,6 +35,7 @@
>   #include <asm/kvm_coproc.h>
>   #include <asm/kvm_emulate.h>
>   #include <asm/kvm_host.h>
> +#include <asm/kvm_hyp.h>
>   #include <asm/kvm_mmu.h>
>   #include <asm/perf_event.h>
>   #include <asm/sysreg.h>
> @@ -76,6 +77,165 @@ static bool write_to_read_only(struct kvm_vcpu *vcpu,
>   	return false;
>   }
>   
> +struct sys_reg_accessor {
> +	u64	(*rdsr)(struct kvm_vcpu *, int);
> +	void	(*wrsr)(struct kvm_vcpu *, int, u64);

Nit:

Why use a signed integer for the register index argument?

> +};
> +
> +#define DECLARE_IMMEDIATE_SR(i)						\
> +	static u64 __##i##_read(struct kvm_vcpu *vcpu, int r)		\
> +	{								\
> +		return __vcpu_sys_reg(vcpu, r);				\
> +	}								\
> +									\
> +	static void __##i##_write(struct kvm_vcpu *vcpu, int r, u64 v)	\
> +	{								\
> +		__vcpu_sys_reg(vcpu, r) = v;				\
> +	}								\
> +
> +#define DECLARE_DEFERRABLE_SR(i, s)					\
> +	static u64 __##i##_read(struct kvm_vcpu *vcpu, int r)		\
> +	{								\
> +		if (vcpu->arch.sysregs_loaded_on_cpu) {			\
> +			WARN_ON(kvm_arm_get_running_vcpu() != vcpu);	\
> +			return read_sysreg_s((s));			\
> +		}							\
> +		return __vcpu_sys_reg(vcpu, r);				\
> +	}								\
> +									\
> +	static void __##i##_write(struct kvm_vcpu *vcpu, int r, u64 v)	\
> +	{								\
> +		if (vcpu->arch.sysregs_loaded_on_cpu) {			\
> +			WARN_ON(kvm_arm_get_running_vcpu() != vcpu);	\
> +			write_sysreg_s(v, (s));				\
> +		} else {						\
> +			__vcpu_sys_reg(vcpu, r) = v;			\
> +		}							\
> +	}								\
> +
> +
> +#define SR_HANDLER_RANGE(i,e)						\
> +	[i ... e] =  (struct sys_reg_accessor) {			\
> +		.rdsr = __##i##_read,					\
> +		.wrsr = __##i##_write,					\

Nit:
Could we have __vcpu_##i##_read and __vcpu_##i##_write?

> +	}
> +
> +#define SR_HANDLER(i)	SR_HANDLER_RANGE(i, i)
> +
> +static void bad_sys_reg(int reg)
> +{
> +	WARN_ONCE(1, "Bad system register access %d\n", reg);
> +}
> +
> +static u64 __default_read_sys_reg(struct kvm_vcpu *vcpu, int reg)
> +{
> +	bad_sys_reg(reg);
> +	return 0;
> +}
> +
> +static void __default_write_sys_reg(struct kvm_vcpu *vcpu, int reg, u64 val)
> +{
> +	bad_sys_reg(reg);
> +}
> +
> +/* Ordered as in enum vcpu_sysreg */
> +DECLARE_IMMEDIATE_SR(MPIDR_EL1);
> +DECLARE_IMMEDIATE_SR(CSSELR_EL1);
> +DECLARE_IMMEDIATE_SR(SCTLR_EL1);
> +DECLARE_IMMEDIATE_SR(ACTLR_EL1);
> +DECLARE_IMMEDIATE_SR(CPACR_EL1);
> +DECLARE_IMMEDIATE_SR(TTBR0_EL1);
> +DECLARE_IMMEDIATE_SR(TTBR1_EL1);
> +DECLARE_IMMEDIATE_SR(TCR_EL1);
> +DECLARE_IMMEDIATE_SR(ESR_EL1);
> +DECLARE_IMMEDIATE_SR(AFSR0_EL1);
> +DECLARE_IMMEDIATE_SR(AFSR1_EL1);
> +DECLARE_IMMEDIATE_SR(FAR_EL1);
> +DECLARE_IMMEDIATE_SR(MAIR_EL1);
> +DECLARE_IMMEDIATE_SR(VBAR_EL1);
> +DECLARE_IMMEDIATE_SR(CONTEXTIDR_EL1);
> +DECLARE_IMMEDIATE_SR(TPIDR_EL0);
> +DECLARE_IMMEDIATE_SR(TPIDRRO_EL0);
> +DECLARE_IMMEDIATE_SR(TPIDR_EL1);
> +DECLARE_IMMEDIATE_SR(AMAIR_EL1);
> +DECLARE_IMMEDIATE_SR(CNTKCTL_EL1);
> +DECLARE_IMMEDIATE_SR(PAR_EL1);
> +DECLARE_IMMEDIATE_SR(MDSCR_EL1);
> +DECLARE_IMMEDIATE_SR(MDCCINT_EL1);
> +DECLARE_IMMEDIATE_SR(PMCR_EL0);
> +DECLARE_IMMEDIATE_SR(PMSELR_EL0);
> +DECLARE_IMMEDIATE_SR(PMEVCNTR0_EL0);
> +/* PMEVCNTR30_EL0 */
> +DECLARE_IMMEDIATE_SR(PMCCNTR_EL0);
> +DECLARE_IMMEDIATE_SR(PMEVTYPER0_EL0);
> +/* PMEVTYPER30_EL0 */
> +DECLARE_IMMEDIATE_SR(PMCCFILTR_EL0);
> +DECLARE_IMMEDIATE_SR(PMCNTENSET_EL0);
> +DECLARE_IMMEDIATE_SR(PMINTENSET_EL1);
> +DECLARE_IMMEDIATE_SR(PMOVSSET_EL0);
> +DECLARE_IMMEDIATE_SR(PMSWINC_EL0);
> +DECLARE_IMMEDIATE_SR(PMUSERENR_EL0);
> +DECLARE_IMMEDIATE_SR(DACR32_EL2);
> +DECLARE_IMMEDIATE_SR(IFSR32_EL2);
> +DECLARE_IMMEDIATE_SR(FPEXC32_EL2);
> +DECLARE_IMMEDIATE_SR(DBGVCR32_EL2);
> +
> +static const struct sys_reg_accessor sys_reg_accessors[NR_SYS_REGS] = {
> +	[0 ... NR_SYS_REGS - 1] = {
> +		.rdsr = __default_read_sys_reg,
> +		.wrsr = __default_write_sys_reg,
> +	},
> +
> +	SR_HANDLER(MPIDR_EL1),
> +	SR_HANDLER(CSSELR_EL1),
> +	SR_HANDLER(SCTLR_EL1),
> +	SR_HANDLER(ACTLR_EL1),
> +	SR_HANDLER(CPACR_EL1),
> +	SR_HANDLER(TTBR0_EL1),
> +	SR_HANDLER(TTBR1_EL1),
> +	SR_HANDLER(TCR_EL1),
> +	SR_HANDLER(ESR_EL1),
> +	SR_HANDLER(AFSR0_EL1),
> +	SR_HANDLER(AFSR1_EL1),
> +	SR_HANDLER(FAR_EL1),
> +	SR_HANDLER(MAIR_EL1),
> +	SR_HANDLER(VBAR_EL1),
> +	SR_HANDLER(CONTEXTIDR_EL1),
> +	SR_HANDLER(TPIDR_EL0),
> +	SR_HANDLER(TPIDRRO_EL0),
> +	SR_HANDLER(TPIDR_EL1),
> +	SR_HANDLER(AMAIR_EL1),
> +	SR_HANDLER(CNTKCTL_EL1),
> +	SR_HANDLER(PAR_EL1),
> +	SR_HANDLER(MDSCR_EL1),
> +	SR_HANDLER(MDCCINT_EL1),
> +	SR_HANDLER(PMCR_EL0),
> +	SR_HANDLER(PMSELR_EL0),
> +	SR_HANDLER_RANGE(PMEVCNTR0_EL0, PMEVCNTR30_EL0),
> +	SR_HANDLER(PMCCNTR_EL0),
> +	SR_HANDLER_RANGE(PMEVTYPER0_EL0, PMEVTYPER30_EL0),
> +	SR_HANDLER(PMCCFILTR_EL0),
> +	SR_HANDLER(PMCNTENSET_EL0),
> +	SR_HANDLER(PMINTENSET_EL1),
> +	SR_HANDLER(PMOVSSET_EL0),
> +	SR_HANDLER(PMSWINC_EL0),
> +	SR_HANDLER(PMUSERENR_EL0),
> +	SR_HANDLER(DACR32_EL2),
> +	SR_HANDLER(IFSR32_EL2),
> +	SR_HANDLER(FPEXC32_EL2),
> +	SR_HANDLER(DBGVCR32_EL2),
> +};
> +
> +u64 vcpu_read_sys_reg(struct kvm_vcpu *vcpu, int reg)
> +{
> +	return sys_reg_accessors[reg].rdsr(vcpu, reg);
> +}
> +
> +void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, int reg, u64 val)
> +{
> +	sys_reg_accessors[reg].wrsr(vcpu, reg, val);
> +}
> +
>   /* 3 bits per cache level, as per CLIDR, but non-existent caches always 0 */
>   static u32 cache_levels;
>   
> 

-- 
Julien Thierry

^ permalink raw reply	[flat|nested] 223+ messages in thread

* Re: [PATCH v3 30/41] KVM: arm64: Prepare to handle deferred save/restore of 32-bit registers
  2018-01-12 12:07   ` Christoffer Dall
@ 2018-01-17 18:22     ` Julien Thierry
  -1 siblings, 0 replies; 223+ messages in thread
From: Julien Thierry @ 2018-01-17 18:22 UTC (permalink / raw)
  To: Christoffer Dall, kvmarm, linux-arm-kernel; +Cc: Marc Zyngier, kvm, Shih-Wei Li

Hi,

On 12/01/18 12:07, Christoffer Dall wrote:
> 32-bit registers are not used by a 64-bit host kernel and can be
> deferred, but we need to rework the accesses to these registers to access
> the latest value depending on whether or not guest system registers are
> loaded on the CPU or only reside in memory.
> 
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>

Reviewed-by: Julien Thierry <julien.thierry@arm.com>

> ---
>   arch/arm64/include/asm/kvm_emulate.h | 32 +++++-------------
>   arch/arm64/kvm/regmap.c              | 65 ++++++++++++++++++++++++++----------
>   arch/arm64/kvm/sys_regs.c            |  6 ++--
>   3 files changed, 60 insertions(+), 43 deletions(-)
> 

[...]

> diff --git a/arch/arm64/kvm/regmap.c b/arch/arm64/kvm/regmap.c
> index bbc6ae32e4af..3f65098aff8d 100644
> --- a/arch/arm64/kvm/regmap.c
> +++ b/arch/arm64/kvm/regmap.c
> @@ -141,28 +141,59 @@ unsigned long *vcpu_reg32(const struct kvm_vcpu *vcpu, u8 reg_num)
>   /*
>    * Return the SPSR for the current mode of the virtual CPU.
>    */
> -unsigned long *vcpu_spsr32(const struct kvm_vcpu *vcpu)
> +static int vcpu_spsr32_mode(const struct kvm_vcpu *vcpu)
>   {
>   	unsigned long mode = *vcpu_cpsr(vcpu) & COMPAT_PSR_MODE_MASK;
>   	switch (mode) {
> -	case COMPAT_PSR_MODE_SVC:
> -		mode = KVM_SPSR_SVC;
> -		break;
> -	case COMPAT_PSR_MODE_ABT:
> -		mode = KVM_SPSR_ABT;
> -		break;
> -	case COMPAT_PSR_MODE_UND:
> -		mode = KVM_SPSR_UND;
> -		break;
> -	case COMPAT_PSR_MODE_IRQ:
> -		mode = KVM_SPSR_IRQ;
> -		break;
> -	case COMPAT_PSR_MODE_FIQ:
> -		mode = KVM_SPSR_FIQ;
> -		break;
> +	case COMPAT_PSR_MODE_SVC: return KVM_SPSR_SVC;
> +	case COMPAT_PSR_MODE_ABT: return KVM_SPSR_ABT;
> +	case COMPAT_PSR_MODE_UND: return KVM_SPSR_UND;
> +	case COMPAT_PSR_MODE_IRQ: return KVM_SPSR_IRQ;
> +	case COMPAT_PSR_MODE_FIQ: return KVM_SPSR_FIQ;
> +	default: BUG();
> +	}
> +}
> +
> +unsigned long vcpu_read_spsr32(const struct kvm_vcpu *vcpu)
> +{
> +	int spsr_idx = vcpu_spsr32_mode(vcpu);
> +
> +	if (!vcpu->arch.sysregs_loaded_on_cpu)
> +		return vcpu_gp_regs(vcpu)->spsr[spsr_idx];
> +
> +	switch (spsr_idx) {
> +	case KVM_SPSR_SVC:
> +		return read_sysreg_el1(spsr);
> +	case KVM_SPSR_ABT:
> +		return read_sysreg(spsr_abt);
> +	case KVM_SPSR_UND:
> +		return read_sysreg(spsr_und);
> +	case KVM_SPSR_IRQ:
> +		return read_sysreg(spsr_irq);
> +	case KVM_SPSR_FIQ:
> +		return read_sysreg(spsr_fiq);
>   	default:
>   		BUG();

Nit:

Since the BUG() is in vcpu_spsr32_mode now, you can probably remove it 
here (or add it to vcpu_write_spsr32 for consistency, see the sketch 
below).

>   	}
> +}
>   
> -	return (unsigned long *)&vcpu_gp_regs(vcpu)->spsr[mode];
> +void vcpu_write_spsr32(struct kvm_vcpu *vcpu, unsigned long v)
> +{
> +	int spsr_idx = vcpu_spsr32_mode(vcpu);
> +
> +	if (!vcpu->arch.sysregs_loaded_on_cpu)
> +		vcpu_gp_regs(vcpu)->spsr[spsr_idx] = v;
> +
> +	switch (spsr_idx) {
> +	case KVM_SPSR_SVC:
> +		write_sysreg_el1(v, spsr);
> +	case KVM_SPSR_ABT:
> +		write_sysreg(v, spsr_abt);
> +	case KVM_SPSR_UND:
> +		write_sysreg(v, spsr_und);
> +	case KVM_SPSR_IRQ:
> +		write_sysreg(v, spsr_irq);
> +	case KVM_SPSR_FIQ:
> +		write_sysreg(v, spsr_fiq);
> +	}
>   }
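
For the consistency option, a rough sketch could look like the below;
note it also returns early after the in-memory write and adds break
statements, which the hunk above appears to be missing:

	void vcpu_write_spsr32(struct kvm_vcpu *vcpu, unsigned long v)
	{
		int spsr_idx = vcpu_spsr32_mode(vcpu);

		if (!vcpu->arch.sysregs_loaded_on_cpu) {
			vcpu_gp_regs(vcpu)->spsr[spsr_idx] = v;
			return;
		}

		switch (spsr_idx) {
		case KVM_SPSR_SVC:
			write_sysreg_el1(v, spsr);
			break;
		case KVM_SPSR_ABT:
			write_sysreg(v, spsr_abt);
			break;
		case KVM_SPSR_UND:
			write_sysreg(v, spsr_und);
			break;
		case KVM_SPSR_IRQ:
			write_sysreg(v, spsr_irq);
			break;
		case KVM_SPSR_FIQ:
			write_sysreg(v, spsr_fiq);
			break;
		default:
			BUG();
		}
	}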

Cheers,

-- 
Julien Thierry

^ permalink raw reply	[flat|nested] 223+ messages in thread

* Re: [PATCH v3 33/41] KVM: arm64: Configure FPSIMD traps on vcpu load/put
  2018-01-12 12:07   ` Christoffer Dall
@ 2018-01-18  9:31     ` Julien Thierry
  -1 siblings, 0 replies; 223+ messages in thread
From: Julien Thierry @ 2018-01-18  9:31 UTC (permalink / raw)
  To: Christoffer Dall, kvmarm, linux-arm-kernel; +Cc: Marc Zyngier, kvm, Shih-Wei Li



On 12/01/18 12:07, Christoffer Dall wrote:
> There is no need to enable/disable traps to FP registers on every switch
> to/from the VM, because the host kernel does not use this resource
> without calling vcpu_put.  We can therefore move things around enough
> that we still always write FPEXC32_EL2 before programming CPTR_EL2 but
> only program these during vcpu load/put.
> 
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>

Reviewed-by: Julien Thierry <julien.thierry@arm.com>

> ---
>   arch/arm64/include/asm/kvm_hyp.h |  6 +++++
>   arch/arm64/kvm/hyp/switch.c      | 51 +++++++++++++++++++++++++++++-----------
>   arch/arm64/kvm/hyp/sysreg-sr.c   | 12 ++++++++--
>   3 files changed, 53 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
> index 3f54c55f77a1..ffd62e31f134 100644
> --- a/arch/arm64/include/asm/kvm_hyp.h
> +++ b/arch/arm64/include/asm/kvm_hyp.h
> @@ -148,6 +148,12 @@ void __fpsimd_save_state(struct user_fpsimd_state *fp_regs);
>   void __fpsimd_restore_state(struct user_fpsimd_state *fp_regs);
>   bool __fpsimd_enabled(void);
>   
> +void __activate_traps_nvhe_load(struct kvm_vcpu *vcpu);
> +void __deactivate_traps_nvhe_put(void);
> +
> +void activate_traps_vhe_load(struct kvm_vcpu *vcpu);
> +void deactivate_traps_vhe_put(void);
> +
>   u64 __guest_enter(struct kvm_vcpu *vcpu, struct kvm_cpu_context *host_ctxt);
>   void __noreturn __hyp_do_panic(unsigned long, ...);
>   
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index c01bcfc3fb52..d14ab9650f81 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -24,22 +24,25 @@
>   #include <asm/fpsimd.h>
>   #include <asm/debug-monitors.h>
>   
> -static void __hyp_text __activate_traps_common(struct kvm_vcpu *vcpu)
> +static void __hyp_text __activate_traps_fpsimd32(struct kvm_vcpu *vcpu)
>   {
>   	/*
> -	 * We are about to set CPTR_EL2.TFP to trap all floating point
> -	 * register accesses to EL2, however, the ARM ARM clearly states that
> -	 * traps are only taken to EL2 if the operation would not otherwise
> -	 * trap to EL1.  Therefore, always make sure that for 32-bit guests,
> -	 * we set FPEXC.EN to prevent traps to EL1, when setting the TFP bit.
> -	 * If FP/ASIMD is not implemented, FPEXC is UNDEFINED and any access to
> -	 * it will cause an exception.
> +	 * We are about to trap all floating point register accesses to EL2,
> +	 * however, traps are only taken to EL2 if the operation would not
> +	 * otherwise trap to EL1.  Therefore, always make sure that for 32-bit
> +	 * guests, we set FPEXC.EN to prevent traps to EL1, when setting the
> +	 * TFP bit.  If FP/ASIMD is not implemented, FPEXC is UNDEFINED and
> +	 * any access to it will cause an exception.
>   	 */
>   	if (vcpu_el1_is_32bit(vcpu) && system_supports_fpsimd() &&
>   	    !vcpu->arch.guest_vfp_loaded) {
>   		write_sysreg(1 << 30, fpexc32_el2);
>   		isb();
>   	}
> +}
> +
> +static void __hyp_text __activate_traps_common(struct kvm_vcpu *vcpu)
> +{
>   	write_sysreg(vcpu->arch.hcr_el2, hcr_el2);
>   
>   	/* Trap on AArch32 cp15 c15 (impdef sysregs) accesses (EL1 or EL0) */
> @@ -61,10 +64,12 @@ static void __hyp_text __deactivate_traps_common(void)
>   	write_sysreg(0, pmuserenr_el0);
>   }
>   
> -static void __hyp_text __activate_traps_vhe(struct kvm_vcpu *vcpu)
> +void activate_traps_vhe_load(struct kvm_vcpu *vcpu)
>   {
>   	u64 val;
>   
> +	__activate_traps_fpsimd32(vcpu);
> +
>   	val = read_sysreg(cpacr_el1);
>   	val |= CPACR_EL1_TTA;
>   	val &= ~CPACR_EL1_ZEN;
> @@ -73,14 +78,26 @@ static void __hyp_text __activate_traps_vhe(struct kvm_vcpu *vcpu)
>   	else
>   		val &= ~CPACR_EL1_FPEN;
>   	write_sysreg(val, cpacr_el1);
> +}
>   
> +void deactivate_traps_vhe_put(void)
> +{
> +	write_sysreg(CPACR_EL1_DEFAULT, cpacr_el1);
> +}
> +
> +static void __hyp_text __activate_traps_vhe(struct kvm_vcpu *vcpu)
> +{
>   	write_sysreg(__kvm_hyp_vector, vbar_el1);
>   }
>   
> -static void __hyp_text __activate_traps_nvhe(struct kvm_vcpu *vcpu)
> +void __hyp_text __activate_traps_nvhe_load(struct kvm_vcpu *vcpu)
>   {
>   	u64 val;
>   
> +	vcpu = kern_hyp_va(vcpu);
> +
> +	__activate_traps_fpsimd32(vcpu);
> +
>   	val = CPTR_EL2_DEFAULT;
>   	val |= CPTR_EL2_TTA | CPTR_EL2_TZ;
>   	if (vcpu->arch.guest_vfp_loaded)
> @@ -90,6 +107,15 @@ static void __hyp_text __activate_traps_nvhe(struct kvm_vcpu *vcpu)
>   	write_sysreg(val, cptr_el2);
>   }
>   
> +void __hyp_text __deactivate_traps_nvhe_put(void)
> +{
> +	write_sysreg(CPTR_EL2_DEFAULT, cptr_el2);
> +}
> +
> +static void __hyp_text __activate_traps_nvhe(struct kvm_vcpu *vcpu)
> +{
> +}
> +
>   static hyp_alternate_select(__activate_traps_arch,
>   			    __activate_traps_nvhe, __activate_traps_vhe,
>   			    ARM64_HAS_VIRT_HOST_EXTN);
> @@ -111,12 +137,10 @@ static void __hyp_text __deactivate_traps_vhe(void)
>   
>   	write_sysreg(mdcr_el2, mdcr_el2);
>   	write_sysreg(HCR_HOST_VHE_FLAGS, hcr_el2);
> -	write_sysreg(CPACR_EL1_DEFAULT, cpacr_el1);
>   	write_sysreg(vectors, vbar_el1);
>   }
>   
> -static void __hyp_text __deactivate_traps_nvhe(void)
> -{
> +static void __hyp_text __deactivate_traps_nvhe(void) {
>   	u64 mdcr_el2 = read_sysreg(mdcr_el2);
>   
>   	mdcr_el2 &= MDCR_EL2_HPMN_MASK;
> @@ -124,7 +148,6 @@ static void __hyp_text __deactivate_traps_nvhe(void)
>   
>   	write_sysreg(mdcr_el2, mdcr_el2);
>   	write_sysreg(HCR_RW, hcr_el2);
> -	write_sysreg(CPTR_EL2_DEFAULT, cptr_el2);
>   }
>   
>   static hyp_alternate_select(__deactivate_traps_arch,
> diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
> index d225f5797651..7943d5b4dbcb 100644
> --- a/arch/arm64/kvm/hyp/sysreg-sr.c
> +++ b/arch/arm64/kvm/hyp/sysreg-sr.c
> @@ -237,8 +237,10 @@ void kvm_vcpu_load_sysregs(struct kvm_vcpu *vcpu)
>   	struct kvm_cpu_context *host_ctxt = vcpu->arch.host_cpu_context;
>   	struct kvm_cpu_context *guest_ctxt = &vcpu->arch.ctxt;
>   
> -	if (!has_vhe())
> +	if (!has_vhe()) {
> +		kvm_call_hyp(__activate_traps_nvhe_load, vcpu);
>   		return;
> +	}
>   
>   	__sysreg_save_user_state(host_ctxt);
>   
> @@ -253,6 +255,8 @@ void kvm_vcpu_load_sysregs(struct kvm_vcpu *vcpu)
>   	__sysreg_restore_el1_state(guest_ctxt);
>   
>   	vcpu->arch.sysregs_loaded_on_cpu = true;
> +
> +	activate_traps_vhe_load(vcpu);
>   }
>   
>   /**
> @@ -282,8 +286,12 @@ void kvm_vcpu_put_sysregs(struct kvm_vcpu *vcpu)
>   		vcpu->arch.guest_vfp_loaded = 0;
>   	}
>   
> -	if (!has_vhe())
> +	if (!has_vhe()) {
> +		kvm_call_hyp(__deactivate_traps_nvhe_put);
>   		return;
> +	}
> +
> +	deactivate_traps_vhe_put();
>   
>   	__sysreg_save_el1_state(guest_ctxt);
>   	__sysreg_save_user_state(guest_ctxt);
> 

-- 
Julien Thierry

^ permalink raw reply	[flat|nested] 223+ messages in thread

>   }
>   
> +void __hyp_text __deactivate_traps_nvhe_put(void)
> +{
> +	write_sysreg(CPTR_EL2_DEFAULT, cptr_el2);
> +}
> +
> +static void __hyp_text __activate_traps_nvhe(struct kvm_vcpu *vcpu)
> +{
> +}
> +
>   static hyp_alternate_select(__activate_traps_arch,
>   			    __activate_traps_nvhe, __activate_traps_vhe,
>   			    ARM64_HAS_VIRT_HOST_EXTN);
> @@ -111,12 +137,10 @@ static void __hyp_text __deactivate_traps_vhe(void)
>   
>   	write_sysreg(mdcr_el2, mdcr_el2);
>   	write_sysreg(HCR_HOST_VHE_FLAGS, hcr_el2);
> -	write_sysreg(CPACR_EL1_DEFAULT, cpacr_el1);
>   	write_sysreg(vectors, vbar_el1);
>   }
>   
> -static void __hyp_text __deactivate_traps_nvhe(void)
> -{
> +static void __hyp_text __deactivate_traps_nvhe(void) {
>   	u64 mdcr_el2 = read_sysreg(mdcr_el2);
>   
>   	mdcr_el2 &= MDCR_EL2_HPMN_MASK;
> @@ -124,7 +148,6 @@ static void __hyp_text __deactivate_traps_nvhe(void)
>   
>   	write_sysreg(mdcr_el2, mdcr_el2);
>   	write_sysreg(HCR_RW, hcr_el2);
> -	write_sysreg(CPTR_EL2_DEFAULT, cptr_el2);
>   }
>   
>   static hyp_alternate_select(__deactivate_traps_arch,
> diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
> index d225f5797651..7943d5b4dbcb 100644
> --- a/arch/arm64/kvm/hyp/sysreg-sr.c
> +++ b/arch/arm64/kvm/hyp/sysreg-sr.c
> @@ -237,8 +237,10 @@ void kvm_vcpu_load_sysregs(struct kvm_vcpu *vcpu)
>   	struct kvm_cpu_context *host_ctxt = vcpu->arch.host_cpu_context;
>   	struct kvm_cpu_context *guest_ctxt = &vcpu->arch.ctxt;
>   
> -	if (!has_vhe())
> +	if (!has_vhe()) {
> +		kvm_call_hyp(__activate_traps_nvhe_load, vcpu);
>   		return;
> +	}
>   
>   	__sysreg_save_user_state(host_ctxt);
>   
> @@ -253,6 +255,8 @@ void kvm_vcpu_load_sysregs(struct kvm_vcpu *vcpu)
>   	__sysreg_restore_el1_state(guest_ctxt);
>   
>   	vcpu->arch.sysregs_loaded_on_cpu = true;
> +
> +	activate_traps_vhe_load(vcpu);
>   }
>   
>   /**
> @@ -282,8 +286,12 @@ void kvm_vcpu_put_sysregs(struct kvm_vcpu *vcpu)
>   		vcpu->arch.guest_vfp_loaded = 0;
>   	}
>   
> -	if (!has_vhe())
> +	if (!has_vhe()) {
> +		kvm_call_hyp(__deactivate_traps_nvhe_put);
>   		return;
> +	}
> +
> +	deactivate_traps_vhe_put();
>   
>   	__sysreg_save_el1_state(guest_ctxt);
>   	__sysreg_save_user_state(guest_ctxt);
> 

-- 
Julien Thierry

^ permalink raw reply	[flat|nested] 223+ messages in thread

* Re: [PATCH v3 00/41] Optimize KVM/ARM for VHE systems
  2018-01-15 14:14   ` Yury Norov
@ 2018-01-18 11:16     ` Christoffer Dall
  -1 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-01-18 11:16 UTC (permalink / raw)
  To: Yury Norov
  Cc: kvmarm, linux-arm-kernel, kvm, Marc Zyngier, Shih-Wei Li,
	Andrew Jones, Sunil Goutham, Alex Bennee

Hi Yury,

[cc'ing Alex Bennee who had some thoughts on this]

On Mon, Jan 15, 2018 at 05:14:23PM +0300, Yury Norov wrote:
> On Fri, Jan 12, 2018 at 01:07:06PM +0100, Christoffer Dall wrote:
> > This series redesigns parts of KVM/ARM to optimize the performance on
> > VHE systems.  The general approach is to try to do as little work as
> > possible when transitioning between the VM and the hypervisor.  This has
> > the benefit of lower latency when waiting for interrupts and delivering
> > virtual interrupts, and reduces the overhead of emulating behavior and
> > I/O in the host kernel.
> > 
> > Patches 01 through 06 are not VHE specific, but rework parts of KVM/ARM
> > that can be generally improved.  We then add infrastructure to move more
> > logic into vcpu_load and vcpu_put, we improve handling of VFP and debug
> > registers.
> > 
> > We then introduce a new world-switch function for VHE systems, which we
> > can tweak and optimize for VHE systems.  To do that, we rework a lot of
> > the system register save/restore handling and emulation code that may
> > need access to system registers, so that we can defer as many system
> > register save/restore operations to vcpu_load and vcpu_put, and move
> > this logic out of the VHE world switch function.
> > 
> > We then optimize the configuration of traps.  On non-VHE systems, both
> > the host and VM kernels run in EL1, but because the host kernel should
> > have full access to the underlying hardware, but the VM kernel should
> > not, we essentially make the host kernel more privileged than the VM
> > kernel despite them both running at the same privilege level by enabling
> > VE traps when entering the VM and disabling those traps when exiting the
> > VM.  On VHE systems, the host kernel runs in EL2 and has full access to
> > the hardware (as much as allowed by secure side software), and is
> > unaffected by the trap configuration.  That means we can configure the
> > traps for VMs running in EL1 once, and don't have to switch them on and
> > off for every entry/exit to/from the VM.
> > 
> > Finally, we improve our VGIC handling by moving all save/restore logic
> > out of the VHE world-switch, and we make it possible to truly only
> > evaluate if the AP list is empty and not do *any* VGIC work if that is
> > the case, and only do the minimal amount of work required in the course
> > of the VGIC processing when we have virtual interrupts in flight.
> > 
> > The patches are based on v4.15-rc3, v9 of the level-triggered mapped
> > interrupts support series [1], and the first five patches of James' SDEI
> > series [2].
> > 
> > I've given the patches a fair amount of testing on Thunder-X, Mustang,
> > Seattle, and TC2 (32-bit) for non-VHE testing, and tested VHE
> > functionality on the Foundation model, running both 64-bit VMs and
> > 32-bit VMs side-by-side and using both GICv3-on-GICv3 and
> > GICv2-on-GICv3.
> > 
> > The patches are also available in the vhe-optimize-v3 branch on my
> > kernel.org repository [3].  The vhe-optimize-v3-base branch contains
> > prerequisites of this series.
> > 
> > Changes since v2:
> >  - Rebased on v4.15-rc3.
> >  - Includes two additional patches that only does vcpu_load after
> >    kvm_vcpu_first_run_init and only for KVM_RUN.
> >  - Addressed review comments from v2 (detailed changelogs are in the
> >    individual patches).
> > 
> > Thanks,
> > -Christoffer
> > 
> > [1]: git://git.kernel.org/pub/scm/linux/kernel/git/cdall/linux.git level-mapped-v9
> > [2]: git://linux-arm.org/linux-jm.git sdei/v5/base
> > [3]: git://git.kernel.org/pub/scm/linux/kernel/git/cdall/linux.git vhe-optimize-v3
> 
> I tested this v3 series on ThunderX2 with IPI benchmark:
> https://lkml.org/lkml/2017/12/11/364
> 
> I tried to address your comments in discussion to v2, like pinning
> the module to specific CPU (with taskset), increasing the number of
> iterations, tuning governor to max performance. Results didn't change
> much, and are pretty stable.
> 
> Comparing to vanilla guest, Normal IPI delivery for v3 is 20% slower.
> For v2 it was 27% slower, and for v1 - 42% faster. What's interesting,
> the acknowledge time is much faster for v3, so overall time to
> deliver and acknowledge IPI (2nd column) is less than vanilla
> 4.15-rc3 kernel.
> 
> Test setup is not changed since v2: ThunderX2, 112 online CPUs,
> guest is running under qemu-kvm, emulating gic version 3.
> 
> Below is test results for v1-3 normalized to host vanilla kernel
> dry-run time.
> 
> Yury
> 
> Host, v4.14:
> Dry-run:          0         1
> Self-IPI:         9        18
> Normal IPI:      81       110
> Broadcast IPI:    0      2106
> 
> Guest, v4.14:
> Dry-run:          0         1
> Self-IPI:        10        18
> Normal IPI:     305       525
> Broadcast IPI:    0      9729
> 
> Guest, v4.14 + VHE:
> Dry-run:          0         1
> Self-IPI:         9        18
> Normal IPI:     176       343
> Broadcast IPI:    0      9885
> 
> And for v2.
> 
> Host, v4.15:                   
> Dry-run:          0         1
> Self-IPI:         9        18
> Normal IPI:      79       108
> Broadcast IPI:    0      2102
>                         
> Guest, v4.15-rc:
> Dry-run:          0         1
> Self-IPI:         9        18
> Normal IPI:     291       526
> Broadcast IPI:    0     10439
> 
> Guest, v4.15-rc + VHE:
> Dry-run:          0         2
> Self-IPI:        14        28
> Normal IPI:     370       569
> Broadcast IPI:    0     11688
> 
> And for v3.
> 
> Host 4.15-rc3					
> Dry-run:	  0	    1
> Self-IPI:	  9	   18
> Normal IPI:	 80	  110
> Broadcast IPI:	  0	 2088
> 		
> Guest, 4.15-rc3	
> Dry-run:	  0	    1
> Self-IPI:	  9	   18
> Normal IPI:	289	  497
> Broadcast IPI:	  0	 9999
> 		
> Guest, 4.15-rc3	+ VHE
> Dry-run:	  0	    2
> Self-IPI:	 12	   24
> Normal IPI:	347	  490
> Broadcast IPI:	  0	11906

So, I had a look at your measurement code, and just want to make a
sanity check that I understand the measurements correctly.

Firstly, if we execute something 100,000 times and sum up the result
of each run, and get anything less than 100,000 (in this case ~300),
without scaling the value, doesn't that mean that in the vast majority
of cases, you are getting 0 as your measurement?
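
(Back-of-the-envelope, assuming the reported number really is a plain
sum of the per-iteration deltas in timer ticks:

	100,000 iterations, sum ~= 300 ticks
	=> average ~= 0.003 ticks per IPI
	=> at least 99,700 of the samples must have read back as exactly 0

i.e. the counter resolution would completely dominate the result.)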

Secondly, are we sure all the required memory barriers are in place?
I know that the IPI send contains an smp_wmb(), but when you read back
the value in the caller, do you have the necessary smp_wmb() on the
handler side and a corresponding smp_rmb() on the sending side?  I'm not
sure what kind of effect missing barriers for a measurement framework
like this would have, but it's worth making sure we're not chasing red
herrings here.
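
Concretely, the barrier pairing I have in mind is something like the
following (purely a hypothetical sketch -- the struct, field and
function names are made up and probably don't match the benchmark
module):

	struct ipi_sample {
		u64 t_start;
		u64 t_ack;
		int done;
	};

	/* handler side */
	static void ipi_bench_handler(void *info)
	{
		struct ipi_sample *s = info;

		s->t_ack = ktime_get_ns();	/* value the sender reads back */
		smp_wmb();			/* publish t_ack before the flag */
		WRITE_ONCE(s->done, 1);
	}

	/* sender side, after firing the IPI described by 'sample' */
	while (!READ_ONCE(sample->done))
		cpu_relax();
	smp_rmb();				/* pairs with the handler's smp_wmb() */
	delta = sample->t_ack - sample->t_start;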

That obviously doesn't change that the overall turnaround time is
improved more in the v1 case than in the v3 case, which I'd like to
explore/bisect in any case.

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 223+ messages in thread

* [PATCH v3 00/41] Optimize KVM/ARM for VHE systems
@ 2018-01-18 11:16     ` Christoffer Dall
  0 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-01-18 11:16 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Yury,

[cc'ing Alex Bennee who had some thoughts on this]

On Mon, Jan 15, 2018 at 05:14:23PM +0300, Yury Norov wrote:
> On Fri, Jan 12, 2018 at 01:07:06PM +0100, Christoffer Dall wrote:
> > This series redesigns parts of KVM/ARM to optimize the performance on
> > VHE systems.  The general approach is to try to do as little work as
> > possible when transitioning between the VM and the hypervisor.  This has
> > the benefit of lower latency when waiting for interrupts and delivering
> > virtual interrupts, and reduces the overhead of emulating behavior and
> > I/O in the host kernel.
> > 
> > Patches 01 through 06 are not VHE specific, but rework parts of KVM/ARM
> > that can be generally improved.  We then add infrastructure to move more
> > logic into vcpu_load and vcpu_put, we improve handling of VFP and debug
> > registers.
> > 
> > We then introduce a new world-switch function for VHE systems, which we
> > can tweak and optimize for VHE systems.  To do that, we rework a lot of
> > the system register save/restore handling and emulation code that may
> > need access to system registers, so that we can defer as many system
> > register save/restore operations to vcpu_load and vcpu_put, and move
> > this logic out of the VHE world switch function.
> > 
> > We then optimize the configuration of traps.  On non-VHE systems, both
> > the host and VM kernels run in EL1, but because the host kernel should
> > have full access to the underlying hardware, but the VM kernel should
> > not, we essentially make the host kernel more privileged than the VM
> > kernel despite them both running at the same privilege level by enabling
> > VE traps when entering the VM and disabling those traps when exiting the
> > VM.  On VHE systems, the host kernel runs in EL2 and has full access to
> > the hardware (as much as allowed by secure side software), and is
> > unaffected by the trap configuration.  That means we can configure the
> > traps for VMs running in EL1 once, and don't have to switch them on and
> > off for every entry/exit to/from the VM.
> > 
> > Finally, we improve our VGIC handling by moving all save/restore logic
> > out of the VHE world-switch, and we make it possible to truly only
> > evaluate if the AP list is empty and not do *any* VGIC work if that is
> > the case, and only do the minimal amount of work required in the course
> > of the VGIC processing when we have virtual interrupts in flight.
> > 
> > The patches are based on v4.15-rc3, v9 of the level-triggered mapped
> > interrupts support series [1], and the first five patches of James' SDEI
> > series [2].
> > 
> > I've given the patches a fair amount of testing on Thunder-X, Mustang,
> > Seattle, and TC2 (32-bit) for non-VHE testing, and tested VHE
> > functionality on the Foundation model, running both 64-bit VMs and
> > 32-bit VMs side-by-side and using both GICv3-on-GICv3 and
> > GICv2-on-GICv3.
> > 
> > The patches are also available in the vhe-optimize-v3 branch on my
> > kernel.org repository [3].  The vhe-optimize-v3-base branch contains
> > prerequisites of this series.
> > 
> > Changes since v2:
> >  - Rebased on v4.15-rc3.
> >  - Includes two additional patches that only does vcpu_load after
> >    kvm_vcpu_first_run_init and only for KVM_RUN.
> >  - Addressed review comments from v2 (detailed changelogs are in the
> >    individual patches).
> > 
> > Thanks,
> > -Christoffer
> > 
> > [1]: git://git.kernel.org/pub/scm/linux/kernel/git/cdall/linux.git level-mapped-v9
> > [2]: git://linux-arm.org/linux-jm.git sdei/v5/base
> > [3]: git://git.kernel.org/pub/scm/linux/kernel/git/cdall/linux.git vhe-optimize-v3
> 
> I tested this v3 series on ThunderX2 with IPI benchmark:
> https://lkml.org/lkml/2017/12/11/364
> 
> I tried to address your comments in discussion to v2, like pinning
> the module to specific CPU (with taskset), increasing the number of
> iterations, tuning governor to max performance. Results didn't change
> much, and are pretty stable.
> 
> Comparing to vanilla guest, Normal IPI delivery for v3 is 20% slower.
> For v2 it was 27% slower, and for v1 - 42% faster. What's interesting,
> the acknowledge time is much faster for v3, so overall time to
> deliver and acknowledge IPI (2nd column) is less than vanilla
> 4.15-rc3 kernel.
> 
> Test setup is not changed since v2: ThunderX2, 112 online CPUs,
> guest is running under qemu-kvm, emulating gic version 3.
> 
> Below is test results for v1-3 normalized to host vanilla kernel
> dry-run time.
> 
> Yury
> 
> Host, v4.14:
> Dry-run:          0         1
> Self-IPI:         9        18
> Normal IPI:      81       110
> Broadcast IPI:    0      2106
> 
> Guest, v4.14:
> Dry-run:          0         1
> Self-IPI:        10        18
> Normal IPI:     305       525
> Broadcast IPI:    0      9729
> 
> Guest, v4.14 + VHE:
> Dry-run:          0         1
> Self-IPI:         9        18
> Normal IPI:     176       343
> Broadcast IPI:    0      9885
> 
> And for v2.
> 
> Host, v4.15:                   
> Dry-run:          0         1
> Self-IPI:         9        18
> Normal IPI:      79       108
> Broadcast IPI:    0      2102
>                         
> Guest, v4.15-rc:
> Dry-run:          0         1
> Self-IPI:         9        18
> Normal IPI:     291       526
> Broadcast IPI:    0     10439
> 
> Guest, v4.15-rc + VHE:
> Dry-run:          0         2
> Self-IPI:        14        28
> Normal IPI:     370       569
> Broadcast IPI:    0     11688
> 
> And for v3.
> 
> Host 4.15-rc3					
> Dry-run:	  0	    1
> Self-IPI:	  9	   18
> Normal IPI:	 80	  110
> Broadcast IPI:	  0	 2088
> 		
> Guest, 4.15-rc3	
> Dry-run:	  0	    1
> Self-IPI:	  9	   18
> Normal IPI:	289	  497
> Broadcast IPI:	  0	 9999
> 		
> Guest, 4.15-rc3	+ VHE
> Dry-run:	  0	    2
> Self-IPI:	 12	   24
> Normal IPI:	347	  490
> Broadcast IPI:	  0	11906

So, I had a look at your measurement code, and just want to make a
sanity check that I understand the measurements correctly.

Firstly, if we execute something 100,000 times and sum up the result
of each run, and get anything less than 100,000 (in this case ~300),
without scaling the value, doesn't that mean that in the vast majority
of cases, you are getting 0 as your measurement?

Secondly, are we sure all the required memory barriers are in place?
I know that the IPI send contains an smp_wmb(), but when you read back
the value in the caller, do you have the necessary smp_wmb() on the
handler side and a corresponding smp_rmb() on the sending side?  I'm not
sure what kind of effect missing barriers for a measurement framework
like this would have, but it's worth making sure we're not chasing red
herrings here.

That obviously doesn't change that the overall turnaround time is
improved more in the v1 case than in the v3 case, which I'd like to
explore/bisect in any case.

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 223+ messages in thread

* Re: [PATCH v3 00/41] Optimize KVM/ARM for VHE systems
  2018-01-18 11:16     ` Christoffer Dall
@ 2018-01-18 12:18       ` Yury Norov
  -1 siblings, 0 replies; 223+ messages in thread
From: Yury Norov @ 2018-01-18 12:18 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: Sunil Goutham, kvm, Marc Zyngier, linux-arm-kernel, kvmarm, Shih-Wei Li

On Thu, Jan 18, 2018 at 12:16:32PM +0100, Christoffer Dall wrote:
> Hi Yury,
> 
> [cc'ing Alex Bennee who had some thoughts on this]
> 
> On Mon, Jan 15, 2018 at 05:14:23PM +0300, Yury Norov wrote:
> > On Fri, Jan 12, 2018 at 01:07:06PM +0100, Christoffer Dall wrote:
> > > This series redesigns parts of KVM/ARM to optimize the performance on
> > > VHE systems.  The general approach is to try to do as little work as
> > > possible when transitioning between the VM and the hypervisor.  This has
> > > the benefit of lower latency when waiting for interrupts and delivering
> > > virtual interrupts, and reduces the overhead of emulating behavior and
> > > I/O in the host kernel.
> > > 
> > > Patches 01 through 06 are not VHE specific, but rework parts of KVM/ARM
> > > that can be generally improved.  We then add infrastructure to move more
> > > logic into vcpu_load and vcpu_put, we improve handling of VFP and debug
> > > registers.
> > > 
> > > We then introduce a new world-switch function for VHE systems, which we
> > > can tweak and optimize for VHE systems.  To do that, we rework a lot of
> > > the system register save/restore handling and emulation code that may
> > > need access to system registers, so that we can defer as many system
> > > register save/restore operations to vcpu_load and vcpu_put, and move
> > > this logic out of the VHE world switch function.
> > > 
> > > We then optimize the configuration of traps.  On non-VHE systems, both
> > > the host and VM kernels run in EL1, but because the host kernel should
> > > have full access to the underlying hardware, but the VM kernel should
> > > not, we essentially make the host kernel more privileged than the VM
> > > kernel despite them both running at the same privilege level by enabling
> > > VE traps when entering the VM and disabling those traps when exiting the
> > > VM.  On VHE systems, the host kernel runs in EL2 and has full access to
> > > the hardware (as much as allowed by secure side software), and is
> > > unaffected by the trap configuration.  That means we can configure the
> > > traps for VMs running in EL1 once, and don't have to switch them on and
> > > off for every entry/exit to/from the VM.
> > > 
> > > Finally, we improve our VGIC handling by moving all save/restore logic
> > > out of the VHE world-switch, and we make it possible to truly only
> > > evaluate if the AP list is empty and not do *any* VGIC work if that is
> > > the case, and only do the minimal amount of work required in the course
> > > of the VGIC processing when we have virtual interrupts in flight.
> > > 
> > > The patches are based on v4.15-rc3, v9 of the level-triggered mapped
> > > interrupts support series [1], and the first five patches of James' SDEI
> > > series [2].
> > > 
> > > I've given the patches a fair amount of testing on Thunder-X, Mustang,
> > > Seattle, and TC2 (32-bit) for non-VHE testing, and tested VHE
> > > functionality on the Foundation model, running both 64-bit VMs and
> > > 32-bit VMs side-by-side and using both GICv3-on-GICv3 and
> > > GICv2-on-GICv3.
> > > 
> > > The patches are also available in the vhe-optimize-v3 branch on my
> > > kernel.org repository [3].  The vhe-optimize-v3-base branch contains
> > > prerequisites of this series.
> > > 
> > > Changes since v2:
> > >  - Rebased on v4.15-rc3.
> > >  - Includes two additional patches that only does vcpu_load after
> > >    kvm_vcpu_first_run_init and only for KVM_RUN.
> > >  - Addressed review comments from v2 (detailed changelogs are in the
> > >    individual patches).
> > > 
> > > Thanks,
> > > -Christoffer
> > > 
> > > [1]: git://git.kernel.org/pub/scm/linux/kernel/git/cdall/linux.git level-mapped-v9
> > > [2]: git://linux-arm.org/linux-jm.git sdei/v5/base
> > > [3]: git://git.kernel.org/pub/scm/linux/kernel/git/cdall/linux.git vhe-optimize-v3
> > 
> > I tested this v3 series on ThunderX2 with IPI benchmark:
> > https://lkml.org/lkml/2017/12/11/364
> > 
> > I tried to address your comments in discussion to v2, like pinning
> > the module to specific CPU (with taskset), increasing the number of
> > iterations, tuning governor to max performance. Results didn't change
> > much, and are pretty stable.
> > 
> > Comparing to vanilla guest, Normal IPI delivery for v3 is 20% slower.
> > For v2 it was 27% slower, and for v1 - 42% faster. What's interesting,
> > the acknowledge time is much faster for v3, so overall time to
> > deliver and acknowledge IPI (2nd column) is less than vanilla
> > 4.15-rc3 kernel.
> > 
> > Test setup is not changed since v2: ThunderX2, 112 online CPUs,
> > guest is running under qemu-kvm, emulating gic version 3.
> > 
> > Below is test results for v1-3 normalized to host vanilla kernel
> > dry-run time.
> > 
> > Yury
> > 
> > Host, v4.14:
> > Dry-run:          0         1
> > Self-IPI:         9        18
> > Normal IPI:      81       110
> > Broadcast IPI:    0      2106
> > 
> > Guest, v4.14:
> > Dry-run:          0         1
> > Self-IPI:        10        18
> > Normal IPI:     305       525
> > Broadcast IPI:    0      9729
> > 
> > Guest, v4.14 + VHE:
> > Dry-run:          0         1
> > Self-IPI:         9        18
> > Normal IPI:     176       343
> > Broadcast IPI:    0      9885
> > 
> > And for v2.
> > 
> > Host, v4.15:                   
> > Dry-run:          0         1
> > Self-IPI:         9        18
> > Normal IPI:      79       108
> > Broadcast IPI:    0      2102
> >                         
> > Guest, v4.15-rc:
> > Dry-run:          0         1
> > Self-IPI:         9        18
> > Normal IPI:     291       526
> > Broadcast IPI:    0     10439
> > 
> > Guest, v4.15-rc + VHE:
> > Dry-run:          0         2
> > Self-IPI:        14        28
> > Normal IPI:     370       569
> > Broadcast IPI:    0     11688
> > 
> > And for v3.
> > 
> > Host 4.15-rc3					
> > Dry-run:	  0	    1
> > Self-IPI:	  9	   18
> > Normal IPI:	 80	  110
> > Broadcast IPI:	  0	 2088
> > 		
> > Guest, 4.15-rc3	
> > Dry-run:	  0	    1
> > Self-IPI:	  9	   18
> > Normal IPI:	289	  497
> > Broadcast IPI:	  0	 9999
> > 		
> > Guest, 4.15-rc3	+ VHE
> > Dry-run:	  0	    2
> > Self-IPI:	 12	   24
> > Normal IPI:	347	  490
> > Broadcast IPI:	  0	11906
> 
> So, I had a look at your measurement code, and just want to make a
> sanity check that I understand the measurements correctly.
> 
> Firstly, if we execute something 100,000 times and sum up the result
> of each run, and get anything less than 100,000 (in this case ~300),
> without scaling the value, doesn't that mean that in the vast majority
> of cases, you are getting 0 as your measurement?

I cannot report absolute numbers, so I posted values normalized to the
dry-run case. 300 for IPI delivery means that it is 300 times slower
than a no-op (the dry-run case). The absolute numbers look quite
reasonable, a few microseconds for a normal IPI.
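
(For scale, and purely as an assumed example: with a dry-run baseline on
the order of 10 ns, a normalized value of ~300 would correspond to ~3 us
per IPI, which matches "a few microseconds".)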

Let me know if you need absolute numbers.
https://lkml.org/lkml/2017/12/13/301
 
> Secondly, are we sure all the required memory barriers are in place?
> I know that the IPI send contains an smp_wmb(), but when you read back
> the value in the caller, do you have the necessary smp_wmb() on the
> handler side and a corresponding smp_rmb() on the sending side?  I'm not
> sure what kind of effect missing barriers for a measurement framework
> like this would have, but it's worth making sure we're not chasing red
> herrings here.

I don't share memory between CPUs. Instead I completely rely on
smp_call_function_single(), which takes an *info parameter to share
data.

Looking at the generic_exec_single() code that makes this work, for
self-IPI things are trivial; and for normal IPI, there's a detailed
comment on cache visibility of *info. So I hope everything is right
there.

/*  
 * The list addition should be visible before sending the IPI
 * handler locks the list to pull the entry off it because of
 * normal cache coherency rules implied by spinlocks.
 *
 * If IPIs can go out of order to the cache coherency protocol
 * in an architecture, sufficient synchronisation should be added
 * to arch code to make it appear to obey cache coherency WRT
 * locking and barrier primitives.  Generic code isn't really
 * equipped to do the right thing...
 */
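
In other words, the pattern is essentially the following (a sketch with
made-up names -- the real benchmark module may structure it
differently):

	struct ipi_data {
		ktime_t t_start;
		ktime_t t_ack;
	};

	static void ipi_fn(void *info)
	{
		struct ipi_data *d = info;

		d->t_ack = ktime_get();		/* acknowledge timestamp */
	}

	/* sender, pinned to one CPU */
	data.t_start = ktime_get();
	/* wait == 1: only returns after ipi_fn has run on target_cpu */
	smp_call_function_single(target_cpu, ipi_fn, &data, 1);
	delivery_ns = ktime_to_ns(ktime_sub(data.t_ack, data.t_start));

so the only data shared across CPUs is *info, and its visibility is
handled by smp_call_function_single() itself.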
 
> That obviously doesn't change that the overall turnaround time is
> improved more in the v1 case than in the v3 case, which I'd like to
> explore/bisect in any case.

So would I. If you have any ideas, let me know and I'll check them.

Yury

^ permalink raw reply	[flat|nested] 223+ messages in thread

* [PATCH v3 00/41] Optimize KVM/ARM for VHE systems
@ 2018-01-18 12:18       ` Yury Norov
  0 siblings, 0 replies; 223+ messages in thread
From: Yury Norov @ 2018-01-18 12:18 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Jan 18, 2018 at 12:16:32PM +0100, Christoffer Dall wrote:
> Hi Yury,
> 
> [cc'ing Alex Bennee who had some thoughts on this]
> 
> On Mon, Jan 15, 2018 at 05:14:23PM +0300, Yury Norov wrote:
> > On Fri, Jan 12, 2018 at 01:07:06PM +0100, Christoffer Dall wrote:
> > > This series redesigns parts of KVM/ARM to optimize the performance on
> > > VHE systems.  The general approach is to try to do as little work as
> > > possible when transitioning between the VM and the hypervisor.  This has
> > > the benefit of lower latency when waiting for interrupts and delivering
> > > virtual interrupts, and reduces the overhead of emulating behavior and
> > > I/O in the host kernel.
> > > 
> > > Patches 01 through 06 are not VHE specific, but rework parts of KVM/ARM
> > > that can be generally improved.  We then add infrastructure to move more
> > > logic into vcpu_load and vcpu_put, we improve handling of VFP and debug
> > > registers.
> > > 
> > > We then introduce a new world-switch function for VHE systems, which we
> > > can tweak and optimize for VHE systems.  To do that, we rework a lot of
> > > the system register save/restore handling and emulation code that may
> > > need access to system registers, so that we can defer as many system
> > > register save/restore operations to vcpu_load and vcpu_put, and move
> > > this logic out of the VHE world switch function.
> > > 
> > > We then optimize the configuration of traps.  On non-VHE systems, both
> > > the host and VM kernels run in EL1, but because the host kernel should
> > > have full access to the underlying hardware, but the VM kernel should
> > > not, we essentially make the host kernel more privileged than the VM
> > > kernel despite them both running at the same privilege level by enabling
> > > VE traps when entering the VM and disabling those traps when exiting the
> > > VM.  On VHE systems, the host kernel runs in EL2 and has full access to
> > > the hardware (as much as allowed by secure side software), and is
> > > unaffected by the trap configuration.  That means we can configure the
> > > traps for VMs running in EL1 once, and don't have to switch them on and
> > > off for every entry/exit to/from the VM.
> > > 
> > > Finally, we improve our VGIC handling by moving all save/restore logic
> > > out of the VHE world-switch, and we make it possible to truly only
> > > evaluate if the AP list is empty and not do *any* VGIC work if that is
> > > the case, and only do the minimal amount of work required in the course
> > > of the VGIC processing when we have virtual interrupts in flight.
> > > 
> > > The patches are based on v4.15-rc3, v9 of the level-triggered mapped
> > > interrupts support series [1], and the first five patches of James' SDEI
> > > series [2].
> > > 
> > > I've given the patches a fair amount of testing on Thunder-X, Mustang,
> > > Seattle, and TC2 (32-bit) for non-VHE testing, and tested VHE
> > > functionality on the Foundation model, running both 64-bit VMs and
> > > 32-bit VMs side-by-side and using both GICv3-on-GICv3 and
> > > GICv2-on-GICv3.
> > > 
> > > The patches are also available in the vhe-optimize-v3 branch on my
> > > kernel.org repository [3].  The vhe-optimize-v3-base branch contains
> > > prerequisites of this series.
> > > 
> > > Changes since v2:
> > >  - Rebased on v4.15-rc3.
> > >  - Includes two additional patches that only does vcpu_load after
> > >    kvm_vcpu_first_run_init and only for KVM_RUN.
> > >  - Addressed review comments from v2 (detailed changelogs are in the
> > >    individual patches).
> > > 
> > > Thanks,
> > > -Christoffer
> > > 
> > > [1]: git://git.kernel.org/pub/scm/linux/kernel/git/cdall/linux.git level-mapped-v9
> > > [2]: git://linux-arm.org/linux-jm.git sdei/v5/base
> > > [3]: git://git.kernel.org/pub/scm/linux/kernel/git/cdall/linux.git vhe-optimize-v3
> > 
> > I tested this v3 series on ThunderX2 with IPI benchmark:
> > https://lkml.org/lkml/2017/12/11/364
> > 
> > I tried to address your comments in discussion to v2, like pinning
> > the module to specific CPU (with taskset), increasing the number of
> > iterations, tuning governor to max performance. Results didn't change
> > much, and are pretty stable.
> > 
> > Comparing to vanilla guest, Normal IPI delivery for v3 is 20% slower.
> > For v2 it was 27% slower, and for v1 - 42% faster. What's interesting,
> > the acknowledge time is much faster for v3, so overall time to
> > deliver and acknowledge IPI (2nd column) is less than vanilla
> > 4.15-rc3 kernel.
> > 
> > Test setup is not changed since v2: ThunderX2, 112 online CPUs,
> > guest is running under qemu-kvm, emulating gic version 3.
> > 
> > Below is test results for v1-3 normalized to host vanilla kernel
> > dry-run time.
> > 
> > Yury
> > 
> > Host, v4.14:
> > Dry-run:          0         1
> > Self-IPI:         9        18
> > Normal IPI:      81       110
> > Broadcast IPI:    0      2106
> > 
> > Guest, v4.14:
> > Dry-run:          0         1
> > Self-IPI:        10        18
> > Normal IPI:     305       525
> > Broadcast IPI:    0      9729
> > 
> > Guest, v4.14 + VHE:
> > Dry-run:          0         1
> > Self-IPI:         9        18
> > Normal IPI:     176       343
> > Broadcast IPI:    0      9885
> > 
> > And for v2.
> > 
> > Host, v4.15:                   
> > Dry-run:          0         1
> > Self-IPI:         9        18
> > Normal IPI:      79       108
> > Broadcast IPI:    0      2102
> >                         
> > Guest, v4.15-rc:
> > Dry-run:          0         1
> > Self-IPI:         9        18
> > Normal IPI:     291       526
> > Broadcast IPI:    0     10439
> > 
> > Guest, v4.15-rc + VHE:
> > Dry-run:          0         2
> > Self-IPI:        14        28
> > Normal IPI:     370       569
> > Broadcast IPI:    0     11688
> > 
> > And for v3.
> > 
> > Host 4.15-rc3					
> > Dry-run:	  0	    1
> > Self-IPI:	  9	   18
> > Normal IPI:	 80	  110
> > Broadcast IPI:	  0	 2088
> > 		
> > Guest, 4.15-rc3	
> > Dry-run:	  0	    1
> > Self-IPI:	  9	   18
> > Normal IPI:	289	  497
> > Broadcast IPI:	  0	 9999
> > 		
> > Guest, 4.15-rc3	+ VHE
> > Dry-run:	  0	    2
> > Self-IPI:	 12	   24
> > Normal IPI:	347	  490
> > Broadcast IPI:	  0	11906
> 
> So, I had a look at your measurement code, and just want to make a
> sanity check that I understand the measurements correctly.
> 
> Firstly, if we execute something 100,000 times and sum up the result
> of each run, and get anything less than 100,000 (in this case ~300),
> without scaling the value, doesn't that mean that in the vast majority
> of cases, you are getting 0 as your measurement?

I cannot report absolute numbers, so I posted values normalized to the
dry-run case. 300 for IPI delivery means that it is 300 times slower
than a no-op (the dry-run case). The absolute numbers look quite
reasonable, a few microseconds for a normal IPI.

Let me know if you need absolute numbers.
https://lkml.org/lkml/2017/12/13/301
 
> Secondly, are we sure all the required memory barriers are in place?
> I know that the IPI send contains an smp_wmb(), but when you read back
> the value in the caller, do you have the necessary smp_wmb() on the
> handler side and a corresponding smp_rmb() on the sending side?  I'm not
> sure what kind of effect missing barriers for a measurement framework
> like this would have, but it's worth making sure we're not chasing red
> herrings here.

I don't share memory between CPUs. Instead I completely rely on
smp_call_function_single(), which takes an *info parameter to share
data.

Looking at the generic_exec_single() code that makes this work, for
self-IPI things are trivial; and for normal IPI, there's a detailed
comment on cache visibility of *info. So I hope everything is right
there.

/*  
 * The list addition should be visible before sending the IPI
 * handler locks the list to pull the entry off it because of
 * normal cache coherency rules implied by spinlocks.
 *
 * If IPIs can go out of order to the cache coherency protocol
 * in an architecture, sufficient synchronisation should be added
 * to arch code to make it appear to obey cache coherency WRT
 * locking and barrier primitives.  Generic code isn't really
 * equipped to do the right thing...
 */
 
> That obviously doesn't change that the overall turnaround time is
> improved more in the v1 case than in the v3 case, which I'd like to
> explore/bisect in any case.

So would I. If you have any ideas, let me know and I'll check them.

Yury

^ permalink raw reply	[flat|nested] 223+ messages in thread

* Re: [PATCH v3 08/41] KVM: arm/arm64: Introduce vcpu_el1_is_32bit
  2018-01-17 14:44     ` Julien Thierry
@ 2018-01-18 12:57       ` Christoffer Dall
  -1 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-01-18 12:57 UTC (permalink / raw)
  To: Julien Thierry
  Cc: kvmarm, linux-arm-kernel, Marc Zyngier, Andrew Jones, Shih-Wei Li, kvm

On Wed, Jan 17, 2018 at 02:44:32PM +0000, Julien Thierry wrote:
> Hi Christoffer,
> 
> On 12/01/18 12:07, Christoffer Dall wrote:
> >We have numerous checks around that checks if the HCR_EL2 has the RW bit
> >set to figure out if we're running an AArch64 or AArch32 VM.  In some
> >cases, directly checking the RW bit (given its unintuitive name), is a
> >bit confusing, and that's not going to improve as we move logic around
> >for the following patches that optimize KVM on AArch64 hosts with VHE.
> >
> >Therefore, introduce a helper, vcpu_el1_is_32bit, and replace existing
> >direct checks of HCR_EL2.RW with the helper.
> >
> >Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> >---
> >  arch/arm64/include/asm/kvm_emulate.h | 7 ++++++-
> >  arch/arm64/kvm/hyp/switch.c          | 8 ++------
> >  arch/arm64/kvm/hyp/sysreg-sr.c       | 5 +++--
> >  arch/arm64/kvm/inject_fault.c        | 6 +++---
> >  4 files changed, 14 insertions(+), 12 deletions(-)
> >
> >diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> >index b36aaa1fe332..e07bf463ac58 100644
> >--- a/arch/arm64/include/asm/kvm_emulate.h
> >+++ b/arch/arm64/include/asm/kvm_emulate.h
> >@@ -45,6 +45,11 @@ void kvm_inject_undef32(struct kvm_vcpu *vcpu);
> >  void kvm_inject_dabt32(struct kvm_vcpu *vcpu, unsigned long addr);
> >  void kvm_inject_pabt32(struct kvm_vcpu *vcpu, unsigned long addr);
> >+static inline bool vcpu_el1_is_32bit(struct kvm_vcpu *vcpu)
> >+{
> >+	return !(vcpu->arch.hcr_el2 & HCR_RW);
> >+}
> >+
> 
> Just so I understand, the difference between this and vcpu_mode_is_32bit is
> that vcpu_mode_is_32bit might return true because an interrupt/exception
> occurred while guest was executing 32bit EL0 but guest EL1 is still 64bits,
> is that correct?

Yes.
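
For comparison, the two helpers boil down to something like this
(vcpu_mode_is_32bit quoted from memory from kvm_emulate.h, so treat the
exact form as approximate):

	/* PSTATE at the time of the exit: can indicate 32-bit EL0 even
	 * when the guest's EL1 is AArch64 */
	static inline bool vcpu_mode_is_32bit(const struct kvm_vcpu *vcpu)
	{
		return !!(*vcpu_cpsr(vcpu) & PSR_MODE32_BIT);
	}

	/* configured register width of the guest's EL1, fixed for the
	 * lifetime of the vcpu */
	static inline bool vcpu_el1_is_32bit(struct kvm_vcpu *vcpu)
	{
		return !(vcpu->arch.hcr_el2 & HCR_RW);
	}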

> 
> Also, it seems the process controlling KVM is supposed to provide the
> information of whether the vcpu runs a 32bit el1, would it be better to do:
> 
> 	return test_bit(KVM_ARM_VCPU_EL1_32BIT, vcpu->arch.features);
> 
> instead of looking at the hcr? Or is there a case where those might differ?

I think in the current implementation, both would work fine, and they
shouldn't differ.  I prefer checking the HCR, because we then know we'll
be consistent with what the hardware does, and the feature array is
mostly there to negotiate between userspace and the kernel.  Also, we
were already using the HCR.

If there's an argument for checking the feature bits instead, I'm open
to that idea, potentially as a separate patch explaining the rationale.

> 
> Otherwise:
> 
> Reviewed-by: Julien Thierry <julien.thierry@arm.com>
> 
Thanks!
-Christoffer

^ permalink raw reply	[flat|nested] 223+ messages in thread

* [PATCH v3 08/41] KVM: arm/arm64: Introduce vcpu_el1_is_32bit
@ 2018-01-18 12:57       ` Christoffer Dall
  0 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-01-18 12:57 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jan 17, 2018 at 02:44:32PM +0000, Julien Thierry wrote:
> Hi Christoffer,
> 
> On 12/01/18 12:07, Christoffer Dall wrote:
> >We have numerous checks around that checks if the HCR_EL2 has the RW bit
> >set to figure out if we're running an AArch64 or AArch32 VM.  In some
> >cases, directly checking the RW bit (given its unintuitive name), is a
> >bit confusing, and that's not going to improve as we move logic around
> >for the following patches that optimize KVM on AArch64 hosts with VHE.
> >
> >Therefore, introduce a helper, vcpu_el1_is_32bit, and replace existing
> >direct checks of HCR_EL2.RW with the helper.
> >
> >Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> >---
> >  arch/arm64/include/asm/kvm_emulate.h | 7 ++++++-
> >  arch/arm64/kvm/hyp/switch.c          | 8 ++------
> >  arch/arm64/kvm/hyp/sysreg-sr.c       | 5 +++--
> >  arch/arm64/kvm/inject_fault.c        | 6 +++---
> >  4 files changed, 14 insertions(+), 12 deletions(-)
> >
> >diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> >index b36aaa1fe332..e07bf463ac58 100644
> >--- a/arch/arm64/include/asm/kvm_emulate.h
> >+++ b/arch/arm64/include/asm/kvm_emulate.h
> >@@ -45,6 +45,11 @@ void kvm_inject_undef32(struct kvm_vcpu *vcpu);
> >  void kvm_inject_dabt32(struct kvm_vcpu *vcpu, unsigned long addr);
> >  void kvm_inject_pabt32(struct kvm_vcpu *vcpu, unsigned long addr);
> >+static inline bool vcpu_el1_is_32bit(struct kvm_vcpu *vcpu)
> >+{
> >+	return !(vcpu->arch.hcr_el2 & HCR_RW);
> >+}
> >+
> 
> Just so I understand, the difference between this and vcpu_mode_is_32bit is
> that vcpu_mode_is_32bit might return true because an interrupt/exception
> occurred while guest was executing 32bit EL0 but guest EL1 is still 64bits,
> is that correct?

Yes.

> 
> Also, it seems the process controlling KVM is supposed to provide the
> information of whether the vcpu runs a 32bit el1, would it be better to do:
> 
> 	return test_bit(KVM_ARM_VCPU_EL1_32BIT, vcpu->arch.features);
> 
> instead of looking at the hcr? Or is there a case where those might differ?

I think in the current implementation, both would work fine, and they
shouldn't differ.  I prefer checking the HCR, because we then know we'll
be consistent with what the hardware does, and the feature array is
mostly there to negotiate between userspace and the kernel.  Also, we
were already using the HCR.

If there's an argument for checking the feature bits instead, I'm open
to that idea, potentially as a separate patch explaining the rationale.

> 
> Otherwise:
> 
> Reviewed-by: Julien Thierry <julien.thierry@arm.com>
> 
Thanks!
-Christoffer

^ permalink raw reply	[flat|nested] 223+ messages in thread

* Re: [PATCH v3 26/41] KVM: arm64: Introduce framework for accessing deferred sysregs
  2018-01-17 17:52     ` Julien Thierry
@ 2018-01-18 13:08       ` Christoffer Dall
  -1 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-01-18 13:08 UTC (permalink / raw)
  To: Julien Thierry; +Cc: kvm, Marc Zyngier, linux-arm-kernel, kvmarm, Shih-Wei Li

On Wed, Jan 17, 2018 at 05:52:21PM +0000, Julien Thierry wrote:
> 
> 
> On 12/01/18 12:07, Christoffer Dall wrote:
> >We are about to defer saving and restoring some groups of system
> >registers to vcpu_put and vcpu_load on supported systems.  This means
> >that we need some infrastructure to access system registers which
> >supports either accessing the memory backing of the register or directly
> >accessing the system registers, depending on the state of the system
> >when we access the register.
> >
> >We do this by defining a set of read/write accessors for each system
> >register, and letting each system register be defined as "immediate" or
> >"deferrable".  Immediate registers are always saved/restored in the
> >world-switch path, but deferrable registers are only saved/restored in
> >vcpu_put/vcpu_load when supported and sysregs_loaded_on_cpu will be set
> >in that case.
> >
> 
> The patch is fine, however I'd suggest adding a comment in the code pointing
> out that the IMMEDIATE/DEFERRABLE apply to save/restore to the vcpu struct.
> Instinctively I would expect the deferrable/immediate to apply to the actual
> hardware register access, so a comment would prevent people like me from
> getting on the wrong track.
> 

I tried to explain that a bit in the first sentence of the commit
message, but I can try to make it more clear that we introduce
terminology.

> >Note that we don't use the deferred mechanism yet in this patch, but only
> >introduce infrastructure.  This is to improve convenience of review in
> >the subsequent patches where it is clear which registers become
> >deferred.
> >
> >  [ Most of this logic was contributed by Marc Zyngier ]
> >
> >Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> >Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> 
> Reviewed-by: Julien Thierry <julien.thierry@arm.com>
> 
> >---
> >  arch/arm64/include/asm/kvm_host.h |   8 +-
> >  arch/arm64/kvm/sys_regs.c         | 160 ++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 166 insertions(+), 2 deletions(-)
> >
> >diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> >index 91272c35cc36..4b5ef82f6bdb 100644
> >--- a/arch/arm64/include/asm/kvm_host.h
> >+++ b/arch/arm64/include/asm/kvm_host.h
> >@@ -281,6 +281,10 @@ struct kvm_vcpu_arch {
> >  	/* Detect first run of a vcpu */
> >  	bool has_run_once;
> >+
> >+	/* True when deferrable sysregs are loaded on the physical CPU,
> >+	 * see kvm_vcpu_load_sysregs and kvm_vcpu_put_sysregs. */
> >+	bool sysregs_loaded_on_cpu;
> >  };
> >  #define vcpu_gp_regs(v)		(&(v)->arch.ctxt.gp_regs)
> >@@ -293,8 +297,8 @@ struct kvm_vcpu_arch {
> >   */
> >  #define __vcpu_sys_reg(v,r)	((v)->arch.ctxt.sys_regs[(r)])
> >-#define vcpu_read_sys_reg(v,r)	__vcpu_sys_reg(v,r)
> >-#define vcpu_write_sys_reg(v,r,n)	do { __vcpu_sys_reg(v,r) = n; } while (0)
> >+u64 vcpu_read_sys_reg(struct kvm_vcpu *vcpu, int reg);
> >+void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, int reg, u64 val);
> >  /*
> >   * CP14 and CP15 live in the same array, as they are backed by the
> >diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> >index 96398d53b462..9d353a6a55c9 100644
> >--- a/arch/arm64/kvm/sys_regs.c
> >+++ b/arch/arm64/kvm/sys_regs.c
> >@@ -35,6 +35,7 @@
> >  #include <asm/kvm_coproc.h>
> >  #include <asm/kvm_emulate.h>
> >  #include <asm/kvm_host.h>
> >+#include <asm/kvm_hyp.h>
> >  #include <asm/kvm_mmu.h>
> >  #include <asm/perf_event.h>
> >  #include <asm/sysreg.h>
> >@@ -76,6 +77,165 @@ static bool write_to_read_only(struct kvm_vcpu *vcpu,
> >  	return false;
> >  }
> >+struct sys_reg_accessor {
> >+	u64	(*rdsr)(struct kvm_vcpu *, int);
> >+	void	(*wrsr)(struct kvm_vcpu *, int, u64);
> 
> Nit:
> 
> Why use a signed integer for the register index argument?
> 

The type name is short? ;)  No particular reason, could be an unsigned
int, but I don't think it matters here does it?

> >+};
> >+
> >+#define DECLARE_IMMEDIATE_SR(i)						\
> >+	static u64 __##i##_read(struct kvm_vcpu *vcpu, int r)		\
> >+	{								\
> >+		return __vcpu_sys_reg(vcpu, r);				\
> >+	}								\
> >+									\
> >+	static void __##i##_write(struct kvm_vcpu *vcpu, int r, u64 v)	\
> >+	{								\
> >+		__vcpu_sys_reg(vcpu, r) = v;				\
> >+	}								\
> >+
> >+#define DECLARE_DEFERRABLE_SR(i, s)					\
> >+	static u64 __##i##_read(struct kvm_vcpu *vcpu, int r)		\
> >+	{								\
> >+		if (vcpu->arch.sysregs_loaded_on_cpu) {			\
> >+			WARN_ON(kvm_arm_get_running_vcpu() != vcpu);	\
> >+			return read_sysreg_s((s));			\
> >+		}							\
> >+		return __vcpu_sys_reg(vcpu, r);				\
> >+	}								\
> >+									\
> >+	static void __##i##_write(struct kvm_vcpu *vcpu, int r, u64 v)	\
> >+	{								\
> >+		if (vcpu->arch.sysregs_loaded_on_cpu) {			\
> >+			WARN_ON(kvm_arm_get_running_vcpu() != vcpu);	\
> >+			write_sysreg_s(v, (s));				\
> >+		} else {						\
> >+			__vcpu_sys_reg(vcpu, r) = v;			\
> >+		}							\
> >+	}								\
> >+
> >+
> >+#define SR_HANDLER_RANGE(i,e)						\
> >+	[i ... e] =  (struct sys_reg_accessor) {			\
> >+		.rdsr = __##i##_read,					\
> >+		.wrsr = __##i##_write,					\
> 
> Nit:
> Could we have __vcpu_##i##_read and __vcpu_##i##_write?
> 

They don't necessarily read from the vcpu do they?

Unrelated: I also thought about just having a single function with a
switch statement instead, which may make it easier to follow the code
as there would be no macros generating functions, but it would be
slightly less declarative.

For example:

u64 vcpu_read_sys_reg(struct kvm_vcpu *vcpu, int reg)
{
	if (!vcpu->arch.sysregs_loaded_on_cpu)
		goto immediate_read;
	
	/*
	 * All system registers listed in the switch are deferred
	 * save/restored on VHE systems.
	 */
	switch (reg) {
	case CSSELR_EL1:	return read_sysreg_s(SYS_CSSELR_EL1);
	case SCTLR_EL1:		return read_sysreg_s(sctlr_EL12);
	case ACTLR_EL1:		return read_sysreg_s(SYS_ACTLR_EL1);
	case CPACR_EL1:		return read_sysreg_s(cpacr_EL12);
	case TTBR0_EL1:		return read_sysreg_s(ttbr0_EL12);
	case TTBR1_EL1:		return read_sysreg_s(ttbr1_EL12);
	case TCR_EL1:		return read_sysreg_s(tcr_EL12);
	case ESR_EL1:		return read_sysreg_s(esr_EL12);
	case AFSR0_EL1:		return read_sysreg_s(afsr0_EL12);
	case AFSR1_EL1:		return read_sysreg_s(afsr1_EL12);
	case FAR_EL1:		return read_sysreg_s(far_EL12);
	case MAIR_EL1:		return read_sysreg_s(mair_EL12);
	case VBAR_EL1:		return read_sysreg_s(vbar_EL12);
	case CONTEXTIDR_EL1:	return read_sysreg_s(contextidr_EL12);
	case TPIDR_EL0:		return read_sysreg_s(SYS_TPIDR_EL0);
	case TPIDRRO_EL0:	return read_sysreg_s(SYS_TPIDRRO_EL0);
	case TPIDR_EL1:		return read_sysreg_s(SYS_TPIDR_EL1);
	case AMAIR_EL1:		return read_sysreg_s(amair_EL12);
	case CNTKCTL_EL1:	return read_sysreg_s(cntkctl_EL12);
	case PAR_EL1:		return read_sysreg_s(SYS_PAR_EL1);
	case DACR32_EL2:	return read_sysreg_s(SYS_DACR32_EL2);
	case IFSR32_EL2:	return read_sysreg_s(SYS_IFSR32_EL2);
	case DBGVCR32_EL2:	return read_sysreg_s(SYS_DBGVCR32_EL2);
	}

immediate_read:
	return __vcpu_sys_reg(vcpu, reg);
}
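
The write side would mirror it, something like (again just a sketch,
reusing the register spellings from above and truncated for brevity):

void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, int reg, u64 val)
{
	if (!vcpu->arch.sysregs_loaded_on_cpu)
		goto immediate_write;

	switch (reg) {
	case CSSELR_EL1:	write_sysreg_s(val, SYS_CSSELR_EL1);	return;
	case SCTLR_EL1:		write_sysreg_s(val, sctlr_EL12);	return;
	case TPIDR_EL0:		write_sysreg_s(val, SYS_TPIDR_EL0);	return;
	/* ... remaining deferred registers as in the read path ... */
	}

immediate_write:
	__vcpu_sys_reg(vcpu, reg) = val;
}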

Since you're having a look at this, what are your thoughts?

Marc, what's your preference?

Thanks,
-Christoffer

> >+	}
> >+
> >+#define SR_HANDLER(i)	SR_HANDLER_RANGE(i, i)
> >+
> >+static void bad_sys_reg(int reg)
> >+{
> >+	WARN_ONCE(1, "Bad system register access %d\n", reg);
> >+}
> >+
> >+static u64 __default_read_sys_reg(struct kvm_vcpu *vcpu, int reg)
> >+{
> >+	bad_sys_reg(reg);
> >+	return 0;
> >+}
> >+
> >+static void __default_write_sys_reg(struct kvm_vcpu *vcpu, int reg, u64 val)
> >+{
> >+	bad_sys_reg(reg);
> >+}
> >+
> >+/* Ordered as in enum vcpu_sysreg */
> >+DECLARE_IMMEDIATE_SR(MPIDR_EL1);
> >+DECLARE_IMMEDIATE_SR(CSSELR_EL1);
> >+DECLARE_IMMEDIATE_SR(SCTLR_EL1);
> >+DECLARE_IMMEDIATE_SR(ACTLR_EL1);
> >+DECLARE_IMMEDIATE_SR(CPACR_EL1);
> >+DECLARE_IMMEDIATE_SR(TTBR0_EL1);
> >+DECLARE_IMMEDIATE_SR(TTBR1_EL1);
> >+DECLARE_IMMEDIATE_SR(TCR_EL1);
> >+DECLARE_IMMEDIATE_SR(ESR_EL1);
> >+DECLARE_IMMEDIATE_SR(AFSR0_EL1);
> >+DECLARE_IMMEDIATE_SR(AFSR1_EL1);
> >+DECLARE_IMMEDIATE_SR(FAR_EL1);
> >+DECLARE_IMMEDIATE_SR(MAIR_EL1);
> >+DECLARE_IMMEDIATE_SR(VBAR_EL1);
> >+DECLARE_IMMEDIATE_SR(CONTEXTIDR_EL1);
> >+DECLARE_IMMEDIATE_SR(TPIDR_EL0);
> >+DECLARE_IMMEDIATE_SR(TPIDRRO_EL0);
> >+DECLARE_IMMEDIATE_SR(TPIDR_EL1);
> >+DECLARE_IMMEDIATE_SR(AMAIR_EL1);
> >+DECLARE_IMMEDIATE_SR(CNTKCTL_EL1);
> >+DECLARE_IMMEDIATE_SR(PAR_EL1);
> >+DECLARE_IMMEDIATE_SR(MDSCR_EL1);
> >+DECLARE_IMMEDIATE_SR(MDCCINT_EL1);
> >+DECLARE_IMMEDIATE_SR(PMCR_EL0);
> >+DECLARE_IMMEDIATE_SR(PMSELR_EL0);
> >+DECLARE_IMMEDIATE_SR(PMEVCNTR0_EL0);
> >+/* PMEVCNTR30_EL0 */
> >+DECLARE_IMMEDIATE_SR(PMCCNTR_EL0);
> >+DECLARE_IMMEDIATE_SR(PMEVTYPER0_EL0);
> >+/* PMEVTYPER30_EL0 */
> >+DECLARE_IMMEDIATE_SR(PMCCFILTR_EL0);
> >+DECLARE_IMMEDIATE_SR(PMCNTENSET_EL0);
> >+DECLARE_IMMEDIATE_SR(PMINTENSET_EL1);
> >+DECLARE_IMMEDIATE_SR(PMOVSSET_EL0);
> >+DECLARE_IMMEDIATE_SR(PMSWINC_EL0);
> >+DECLARE_IMMEDIATE_SR(PMUSERENR_EL0);
> >+DECLARE_IMMEDIATE_SR(DACR32_EL2);
> >+DECLARE_IMMEDIATE_SR(IFSR32_EL2);
> >+DECLARE_IMMEDIATE_SR(FPEXC32_EL2);
> >+DECLARE_IMMEDIATE_SR(DBGVCR32_EL2);
> >+
> >+static const struct sys_reg_accessor sys_reg_accessors[NR_SYS_REGS] = {
> >+	[0 ... NR_SYS_REGS - 1] = {
> >+		.rdsr = __default_read_sys_reg,
> >+		.wrsr = __default_write_sys_reg,
> >+	},
> >+
> >+	SR_HANDLER(MPIDR_EL1),
> >+	SR_HANDLER(CSSELR_EL1),
> >+	SR_HANDLER(SCTLR_EL1),
> >+	SR_HANDLER(ACTLR_EL1),
> >+	SR_HANDLER(CPACR_EL1),
> >+	SR_HANDLER(TTBR0_EL1),
> >+	SR_HANDLER(TTBR1_EL1),
> >+	SR_HANDLER(TCR_EL1),
> >+	SR_HANDLER(ESR_EL1),
> >+	SR_HANDLER(AFSR0_EL1),
> >+	SR_HANDLER(AFSR1_EL1),
> >+	SR_HANDLER(FAR_EL1),
> >+	SR_HANDLER(MAIR_EL1),
> >+	SR_HANDLER(VBAR_EL1),
> >+	SR_HANDLER(CONTEXTIDR_EL1),
> >+	SR_HANDLER(TPIDR_EL0),
> >+	SR_HANDLER(TPIDRRO_EL0),
> >+	SR_HANDLER(TPIDR_EL1),
> >+	SR_HANDLER(AMAIR_EL1),
> >+	SR_HANDLER(CNTKCTL_EL1),
> >+	SR_HANDLER(PAR_EL1),
> >+	SR_HANDLER(MDSCR_EL1),
> >+	SR_HANDLER(MDCCINT_EL1),
> >+	SR_HANDLER(PMCR_EL0),
> >+	SR_HANDLER(PMSELR_EL0),
> >+	SR_HANDLER_RANGE(PMEVCNTR0_EL0, PMEVCNTR30_EL0),
> >+	SR_HANDLER(PMCCNTR_EL0),
> >+	SR_HANDLER_RANGE(PMEVTYPER0_EL0, PMEVTYPER30_EL0),
> >+	SR_HANDLER(PMCCFILTR_EL0),
> >+	SR_HANDLER(PMCNTENSET_EL0),
> >+	SR_HANDLER(PMINTENSET_EL1),
> >+	SR_HANDLER(PMOVSSET_EL0),
> >+	SR_HANDLER(PMSWINC_EL0),
> >+	SR_HANDLER(PMUSERENR_EL0),
> >+	SR_HANDLER(DACR32_EL2),
> >+	SR_HANDLER(IFSR32_EL2),
> >+	SR_HANDLER(FPEXC32_EL2),
> >+	SR_HANDLER(DBGVCR32_EL2),
> >+};
> >+
> >+u64 vcpu_read_sys_reg(struct kvm_vcpu *vcpu, int reg)
> >+{
> >+	return sys_reg_accessors[reg].rdsr(vcpu, reg);
> >+}
> >+
> >+void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, int reg, u64 val)
> >+{
> >+	sys_reg_accessors[reg].wrsr(vcpu, reg, val);
> >+}
> >+
> >  /* 3 bits per cache level, as per CLIDR, but non-existent caches always 0 */
> >  static u32 cache_levels;
> >
> 
> -- 
> Julien Thierry

^ permalink raw reply	[flat|nested] 223+ messages in thread

* [PATCH v3 26/41] KVM: arm64: Introduce framework for accessing deferred sysregs
@ 2018-01-18 13:08       ` Christoffer Dall
  0 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-01-18 13:08 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jan 17, 2018 at 05:52:21PM +0000, Julien Thierry wrote:
> 
> 
> On 12/01/18 12:07, Christoffer Dall wrote:
> >We are about to defer saving and restoring some groups of system
> >registers to vcpu_put and vcpu_load on supported systems.  This means
> >that we need some infrastructure to access system registers which
> >supports either accessing the memory backing of the register or directly
> >accessing the system registers, depending on the state of the system
> >when we access the register.
> >
> >We do this by defining a set of read/write accessors for each system
> >register, and letting each system register be defined as "immediate" or
> >"deferrable".  Immediate registers are always saved/restored in the
> >world-switch path, but deferrable registers are only saved/restored in
> >vcpu_put/vcpu_load when supported and sysregs_loaded_on_cpu will be set
> >in that case.
> >
> 
> The patch is fine, however I'd suggest adding a comment in the code pointing
> out that the IMMEDIATE/DEFERRABLE apply to save/restore to the vcpu struct.
> Instinctively I would expect the deferrable/immediate to apply to the actual
> hardware register access, so a comment would prevent people like me from
> getting on the wrong track.
> 

I tried to explain that a bit in the first sentence of the commit
message, but I can try to make it more clear that we introduce
terminology.

> >Not that we don't use the deferred mechanism yet in this patch, but only
> >introduce infrastructure.  This is to improve convenience of review in
> >the subsequent patches where it is clear which registers become
> >deferred.
> >
> >  [ Most of this logic was contributed by Marc Zyngier ]
> >
> >Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> >Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> 
> Reviewed-by: Julien Thierry <julien.thierry@arm.com>
> 
> >---
> >  arch/arm64/include/asm/kvm_host.h |   8 +-
> >  arch/arm64/kvm/sys_regs.c         | 160 ++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 166 insertions(+), 2 deletions(-)
> >
> >diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> >index 91272c35cc36..4b5ef82f6bdb 100644
> >--- a/arch/arm64/include/asm/kvm_host.h
> >+++ b/arch/arm64/include/asm/kvm_host.h
> >@@ -281,6 +281,10 @@ struct kvm_vcpu_arch {
> >  	/* Detect first run of a vcpu */
> >  	bool has_run_once;
> >+
> >+	/* True when deferrable sysregs are loaded on the physical CPU,
> >+	 * see kvm_vcpu_load_sysregs and kvm_vcpu_put_sysregs. */
> >+	bool sysregs_loaded_on_cpu;
> >  };
> >  #define vcpu_gp_regs(v)		(&(v)->arch.ctxt.gp_regs)
> >@@ -293,8 +297,8 @@ struct kvm_vcpu_arch {
> >   */
> >  #define __vcpu_sys_reg(v,r)	((v)->arch.ctxt.sys_regs[(r)])
> >-#define vcpu_read_sys_reg(v,r)	__vcpu_sys_reg(v,r)
> >-#define vcpu_write_sys_reg(v,r,n)	do { __vcpu_sys_reg(v,r) = n; } while (0)
> >+u64 vcpu_read_sys_reg(struct kvm_vcpu *vcpu, int reg);
> >+void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, int reg, u64 val);
> >  /*
> >   * CP14 and CP15 live in the same array, as they are backed by the
> >diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> >index 96398d53b462..9d353a6a55c9 100644
> >--- a/arch/arm64/kvm/sys_regs.c
> >+++ b/arch/arm64/kvm/sys_regs.c
> >@@ -35,6 +35,7 @@
> >  #include <asm/kvm_coproc.h>
> >  #include <asm/kvm_emulate.h>
> >  #include <asm/kvm_host.h>
> >+#include <asm/kvm_hyp.h>
> >  #include <asm/kvm_mmu.h>
> >  #include <asm/perf_event.h>
> >  #include <asm/sysreg.h>
> >@@ -76,6 +77,165 @@ static bool write_to_read_only(struct kvm_vcpu *vcpu,
> >  	return false;
> >  }
> >+struct sys_reg_accessor {
> >+	u64	(*rdsr)(struct kvm_vcpu *, int);
> >+	void	(*wrsr)(struct kvm_vcpu *, int, u64);
> 
> Nit:
> 
> Why use a signed integer for the register index argument?
> 

The type name is short? ;)  No particular reason, could be an unsigned
int, but I don't think it matters here does it?

> >+};
> >+
> >+#define DECLARE_IMMEDIATE_SR(i)						\
> >+	static u64 __##i##_read(struct kvm_vcpu *vcpu, int r)		\
> >+	{								\
> >+		return __vcpu_sys_reg(vcpu, r);				\
> >+	}								\
> >+									\
> >+	static void __##i##_write(struct kvm_vcpu *vcpu, int r, u64 v)	\
> >+	{								\
> >+		__vcpu_sys_reg(vcpu, r) = v;				\
> >+	}								\
> >+
> >+#define DECLARE_DEFERRABLE_SR(i, s)					\
> >+	static u64 __##i##_read(struct kvm_vcpu *vcpu, int r)		\
> >+	{								\
> >+		if (vcpu->arch.sysregs_loaded_on_cpu) {			\
> >+			WARN_ON(kvm_arm_get_running_vcpu() != vcpu);	\
> >+			return read_sysreg_s((s));			\
> >+		}							\
> >+		return __vcpu_sys_reg(vcpu, r);				\
> >+	}								\
> >+									\
> >+	static void __##i##_write(struct kvm_vcpu *vcpu, int r, u64 v)	\
> >+	{								\
> >+		if (vcpu->arch.sysregs_loaded_on_cpu) {			\
> >+			WARN_ON(kvm_arm_get_running_vcpu() != vcpu);	\
> >+			write_sysreg_s(v, (s));				\
> >+		} else {						\
> >+			__vcpu_sys_reg(vcpu, r) = v;			\
> >+		}							\
> >+	}								\
> >+
> >+
> >+#define SR_HANDLER_RANGE(i,e)						\
> >+	[i ... e] =  (struct sys_reg_accessor) {			\
> >+		.rdsr = __##i##_read,					\
> >+		.wrsr = __##i##_write,					\
> 
> Nit:
> Could we have __vcpu_##i##_read and __vcpu_##i##_write?
> 

They don't necessarily read from the vcpu do they?

Unrelated: I also thought about just having a single function a switch
statement instead, which may make it easier to follow the code as there
would be no macros generating functions, but it would be slightly less
declarative.

For example:

u64 vcpu_read_sys_reg(struct kvm_vcpu *vcpu, int reg)
{
	if (!vcpu->arch.sysregs_loaded_on_cpu)
		goto immediate_read;
	
	/*
	 * All system registers listed in the switch are deferred
	 * save/restored on VHE systems.
	 */
	switch (reg) {
	case CSSELR_EL1:	return read_sysreg_s(SYS_CSSELR_EL1));
	case SCTLR_EL1:		return read_sysreg_s(sctlr_EL12));
	case ACTLR_EL1:		return read_sysreg_s(SYS_ACTLR_EL1));
	case CPACR_EL1:		return read_sysreg_s(cpacr_EL12));
	case TTBR0_EL1:		return read_sysreg_s(ttbr0_EL12));
	case TTBR1_EL1:		return read_sysreg_s(ttbr1_EL12));
	case TCR_EL1:		return read_sysreg_s(tcr_EL12));
	case ESR_EL1:		return read_sysreg_s(esr_EL12));
	case AFSR0_EL1:		return read_sysreg_s(afsr0_EL12));
	case AFSR1_EL1:		return read_sysreg_s(afsr1_EL12));
	case FAR_EL1:		return read_sysreg_s(far_EL12));
	case MAIR_EL1:		return read_sysreg_s(mair_EL12));
	case VBAR_EL1:		return read_sysreg_s(vbar_EL12));
	case CONTEXTIDR_EL1:	return read_sysreg_s(contextidr_EL12));
	case TPIDR_EL0:		return read_sysreg_s(SYS_TPIDR_EL0));
	case TPIDRRO_EL0:	return read_sysreg_s(SYS_TPIDRRO_EL0));
	case TPIDR_EL1:		return read_sysreg_s(SYS_TPIDR_EL1));
	case AMAIR_EL1:		return read_sysreg_s(amair_EL12));
	case CNTKCTL_EL1:	return read_sysreg_s(cntkctl_EL12));
	case PAR_EL1:		return read_sysreg_s(SYS_PAR_EL1));
	case DACR32_EL2:	return read_sysreg_s(SYS_DACR32_EL2));
	case IFSR32_EL2:	return read_sysreg_s(SYS_IFSR32_EL2));
	case DBGVCR32_EL2:	return read_sysreg_s(SYS_DBGVCR32_EL2));
	}

immediate_read:
	return __vcpu_sys_reg(vcpu, reg);
}

Since you're having a look at this, what are your thoughts?

Marc, what's your preference?

Thanks,
-Christoffer

> >+	}
> >+
> >+#define SR_HANDLER(i)	SR_HANDLER_RANGE(i, i)
> >+
> >+static void bad_sys_reg(int reg)
> >+{
> >+	WARN_ONCE(1, "Bad system register access %d\n", reg);
> >+}
> >+
> >+static u64 __default_read_sys_reg(struct kvm_vcpu *vcpu, int reg)
> >+{
> >+	bad_sys_reg(reg);
> >+	return 0;
> >+}
> >+
> >+static void __default_write_sys_reg(struct kvm_vcpu *vcpu, int reg, u64 val)
> >+{
> >+	bad_sys_reg(reg);
> >+}
> >+
> >+/* Ordered as in enum vcpu_sysreg */
> >+DECLARE_IMMEDIATE_SR(MPIDR_EL1);
> >+DECLARE_IMMEDIATE_SR(CSSELR_EL1);
> >+DECLARE_IMMEDIATE_SR(SCTLR_EL1);
> >+DECLARE_IMMEDIATE_SR(ACTLR_EL1);
> >+DECLARE_IMMEDIATE_SR(CPACR_EL1);
> >+DECLARE_IMMEDIATE_SR(TTBR0_EL1);
> >+DECLARE_IMMEDIATE_SR(TTBR1_EL1);
> >+DECLARE_IMMEDIATE_SR(TCR_EL1);
> >+DECLARE_IMMEDIATE_SR(ESR_EL1);
> >+DECLARE_IMMEDIATE_SR(AFSR0_EL1);
> >+DECLARE_IMMEDIATE_SR(AFSR1_EL1);
> >+DECLARE_IMMEDIATE_SR(FAR_EL1);
> >+DECLARE_IMMEDIATE_SR(MAIR_EL1);
> >+DECLARE_IMMEDIATE_SR(VBAR_EL1);
> >+DECLARE_IMMEDIATE_SR(CONTEXTIDR_EL1);
> >+DECLARE_IMMEDIATE_SR(TPIDR_EL0);
> >+DECLARE_IMMEDIATE_SR(TPIDRRO_EL0);
> >+DECLARE_IMMEDIATE_SR(TPIDR_EL1);
> >+DECLARE_IMMEDIATE_SR(AMAIR_EL1);
> >+DECLARE_IMMEDIATE_SR(CNTKCTL_EL1);
> >+DECLARE_IMMEDIATE_SR(PAR_EL1);
> >+DECLARE_IMMEDIATE_SR(MDSCR_EL1);
> >+DECLARE_IMMEDIATE_SR(MDCCINT_EL1);
> >+DECLARE_IMMEDIATE_SR(PMCR_EL0);
> >+DECLARE_IMMEDIATE_SR(PMSELR_EL0);
> >+DECLARE_IMMEDIATE_SR(PMEVCNTR0_EL0);
> >+/* PMEVCNTR30_EL0 */
> >+DECLARE_IMMEDIATE_SR(PMCCNTR_EL0);
> >+DECLARE_IMMEDIATE_SR(PMEVTYPER0_EL0);
> >+/* PMEVTYPER30_EL0 */
> >+DECLARE_IMMEDIATE_SR(PMCCFILTR_EL0);
> >+DECLARE_IMMEDIATE_SR(PMCNTENSET_EL0);
> >+DECLARE_IMMEDIATE_SR(PMINTENSET_EL1);
> >+DECLARE_IMMEDIATE_SR(PMOVSSET_EL0);
> >+DECLARE_IMMEDIATE_SR(PMSWINC_EL0);
> >+DECLARE_IMMEDIATE_SR(PMUSERENR_EL0);
> >+DECLARE_IMMEDIATE_SR(DACR32_EL2);
> >+DECLARE_IMMEDIATE_SR(IFSR32_EL2);
> >+DECLARE_IMMEDIATE_SR(FPEXC32_EL2);
> >+DECLARE_IMMEDIATE_SR(DBGVCR32_EL2);
> >+
> >+static const struct sys_reg_accessor sys_reg_accessors[NR_SYS_REGS] = {
> >+	[0 ... NR_SYS_REGS - 1] = {
> >+		.rdsr = __default_read_sys_reg,
> >+		.wrsr = __default_write_sys_reg,
> >+	},
> >+
> >+	SR_HANDLER(MPIDR_EL1),
> >+	SR_HANDLER(CSSELR_EL1),
> >+	SR_HANDLER(SCTLR_EL1),
> >+	SR_HANDLER(ACTLR_EL1),
> >+	SR_HANDLER(CPACR_EL1),
> >+	SR_HANDLER(TTBR0_EL1),
> >+	SR_HANDLER(TTBR1_EL1),
> >+	SR_HANDLER(TCR_EL1),
> >+	SR_HANDLER(ESR_EL1),
> >+	SR_HANDLER(AFSR0_EL1),
> >+	SR_HANDLER(AFSR1_EL1),
> >+	SR_HANDLER(FAR_EL1),
> >+	SR_HANDLER(MAIR_EL1),
> >+	SR_HANDLER(VBAR_EL1),
> >+	SR_HANDLER(CONTEXTIDR_EL1),
> >+	SR_HANDLER(TPIDR_EL0),
> >+	SR_HANDLER(TPIDRRO_EL0),
> >+	SR_HANDLER(TPIDR_EL1),
> >+	SR_HANDLER(AMAIR_EL1),
> >+	SR_HANDLER(CNTKCTL_EL1),
> >+	SR_HANDLER(PAR_EL1),
> >+	SR_HANDLER(MDSCR_EL1),
> >+	SR_HANDLER(MDCCINT_EL1),
> >+	SR_HANDLER(PMCR_EL0),
> >+	SR_HANDLER(PMSELR_EL0),
> >+	SR_HANDLER_RANGE(PMEVCNTR0_EL0, PMEVCNTR30_EL0),
> >+	SR_HANDLER(PMCCNTR_EL0),
> >+	SR_HANDLER_RANGE(PMEVTYPER0_EL0, PMEVTYPER30_EL0),
> >+	SR_HANDLER(PMCCFILTR_EL0),
> >+	SR_HANDLER(PMCNTENSET_EL0),
> >+	SR_HANDLER(PMINTENSET_EL1),
> >+	SR_HANDLER(PMOVSSET_EL0),
> >+	SR_HANDLER(PMSWINC_EL0),
> >+	SR_HANDLER(PMUSERENR_EL0),
> >+	SR_HANDLER(DACR32_EL2),
> >+	SR_HANDLER(IFSR32_EL2),
> >+	SR_HANDLER(FPEXC32_EL2),
> >+	SR_HANDLER(DBGVCR32_EL2),
> >+};
> >+
> >+u64 vcpu_read_sys_reg(struct kvm_vcpu *vcpu, int reg)
> >+{
> >+	return sys_reg_accessors[reg].rdsr(vcpu, reg);
> >+}
> >+
> >+void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, int reg, u64 val)
> >+{
> >+	sys_reg_accessors[reg].wrsr(vcpu, reg, val);
> >+}
> >+
> >  /* 3 bits per cache level, as per CLIDR, but non-existent caches always 0 */
> >  static u32 cache_levels;
> >
> 
> -- 
> Julien Thierry

^ permalink raw reply	[flat|nested] 223+ messages in thread

* Re: [PATCH v3 30/41] KVM: arm64: Prepare to handle deferred save/restore of 32-bit registers
  2018-01-17 18:22     ` Julien Thierry
@ 2018-01-18 13:12       ` Christoffer Dall
  -1 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-01-18 13:12 UTC (permalink / raw)
  To: Julien Thierry; +Cc: kvm, Marc Zyngier, linux-arm-kernel, kvmarm, Shih-Wei Li

On Wed, Jan 17, 2018 at 06:22:29PM +0000, Julien Thierry wrote:
> Hi,
> 
> On 12/01/18 12:07, Christoffer Dall wrote:
> >32-bit registers are not used by a 64-bit host kernel and can be
> >deferred, but we need to rework the accesses to these registers to access
> >the latest value depending on whether or not guest system registers are
> >loaded on the CPU or only reside in memory.
> >
> >Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> 
> Reviewed-by: Julien Thierry <julien.thierry@arm.com>
> 
> >---
> >  arch/arm64/include/asm/kvm_emulate.h | 32 +++++-------------
> >  arch/arm64/kvm/regmap.c              | 65 ++++++++++++++++++++++++++----------
> >  arch/arm64/kvm/sys_regs.c            |  6 ++--
> >  3 files changed, 60 insertions(+), 43 deletions(-)
> >
> 
> [...]
> 
> >diff --git a/arch/arm64/kvm/regmap.c b/arch/arm64/kvm/regmap.c
> >index bbc6ae32e4af..3f65098aff8d 100644
> >--- a/arch/arm64/kvm/regmap.c
> >+++ b/arch/arm64/kvm/regmap.c
> >@@ -141,28 +141,59 @@ unsigned long *vcpu_reg32(const struct kvm_vcpu *vcpu, u8 reg_num)
> >  /*
> >   * Return the SPSR for the current mode of the virtual CPU.
> >   */
> >-unsigned long *vcpu_spsr32(const struct kvm_vcpu *vcpu)
> >+static int vcpu_spsr32_mode(const struct kvm_vcpu *vcpu)
> >  {
> >  	unsigned long mode = *vcpu_cpsr(vcpu) & COMPAT_PSR_MODE_MASK;
> >  	switch (mode) {
> >-	case COMPAT_PSR_MODE_SVC:
> >-		mode = KVM_SPSR_SVC;
> >-		break;
> >-	case COMPAT_PSR_MODE_ABT:
> >-		mode = KVM_SPSR_ABT;
> >-		break;
> >-	case COMPAT_PSR_MODE_UND:
> >-		mode = KVM_SPSR_UND;
> >-		break;
> >-	case COMPAT_PSR_MODE_IRQ:
> >-		mode = KVM_SPSR_IRQ;
> >-		break;
> >-	case COMPAT_PSR_MODE_FIQ:
> >-		mode = KVM_SPSR_FIQ;
> >-		break;
> >+	case COMPAT_PSR_MODE_SVC: return KVM_SPSR_SVC;
> >+	case COMPAT_PSR_MODE_ABT: return KVM_SPSR_ABT;
> >+	case COMPAT_PSR_MODE_UND: return KVM_SPSR_UND;
> >+	case COMPAT_PSR_MODE_IRQ: return KVM_SPSR_IRQ;
> >+	case COMPAT_PSR_MODE_FIQ: return KVM_SPSR_FIQ;
> >+	default: BUG();
> >+	}
> >+}
> >+
> >+unsigned long vcpu_read_spsr32(const struct kvm_vcpu *vcpu)
> >+{
> >+	int spsr_idx = vcpu_spsr32_mode(vcpu);
> >+
> >+	if (!vcpu->arch.sysregs_loaded_on_cpu)
> >+		return vcpu_gp_regs(vcpu)->spsr[spsr_idx];
> >+
> >+	switch (spsr_idx) {
> >+	case KVM_SPSR_SVC:
> >+		return read_sysreg_el1(spsr);
> >+	case KVM_SPSR_ABT:
> >+		return read_sysreg(spsr_abt);
> >+	case KVM_SPSR_UND:
> >+		return read_sysreg(spsr_und);
> >+	case KVM_SPSR_IRQ:
> >+		return read_sysreg(spsr_irq);
> >+	case KVM_SPSR_FIQ:
> >+		return read_sysreg(spsr_fiq);
> >  	default:
> >  		BUG();
> 
> Nit:
> 
> Since the BUG() is in vcpu_spsr32_mode now, you can probably remove it here
> (or add it to vcpu_write_spsr32 for consistency).
> 
> >  	}

Yes, I'll remove it.
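
Just for illustration (not the actual respin), I'd expect the write side
to end up along the lines of the sketch below: an early return for the
memory-backed case, and explicit breaks so that each mode only writes its
own banked SPSR:

void vcpu_write_spsr32(struct kvm_vcpu *vcpu, unsigned long v)
{
	int spsr_idx = vcpu_spsr32_mode(vcpu);

	if (!vcpu->arch.sysregs_loaded_on_cpu) {
		vcpu_gp_regs(vcpu)->spsr[spsr_idx] = v;
		return;
	}

	switch (spsr_idx) {
	case KVM_SPSR_SVC:
		write_sysreg_el1(v, spsr);
		break;
	case KVM_SPSR_ABT:
		write_sysreg(v, spsr_abt);
		break;
	case KVM_SPSR_UND:
		write_sysreg(v, spsr_und);
		break;
	case KVM_SPSR_IRQ:
		write_sysreg(v, spsr_irq);
		break;
	case KVM_SPSR_FIQ:
		write_sysreg(v, spsr_fiq);
		break;
	}
}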

Thanks,
-Christoffer

> >+}
> >-	return (unsigned long *)&vcpu_gp_regs(vcpu)->spsr[mode];
> >+void vcpu_write_spsr32(struct kvm_vcpu *vcpu, unsigned long v)
> >+{
> >+	int spsr_idx = vcpu_spsr32_mode(vcpu);
> >+
> >+	if (!vcpu->arch.sysregs_loaded_on_cpu)
> >+		vcpu_gp_regs(vcpu)->spsr[spsr_idx] = v;
> >+
> >+	switch (spsr_idx) {
> >+	case KVM_SPSR_SVC:
> >+		write_sysreg_el1(v, spsr);
> >+	case KVM_SPSR_ABT:
> >+		write_sysreg(v, spsr_abt);
> >+	case KVM_SPSR_UND:
> >+		write_sysreg(v, spsr_und);
> >+	case KVM_SPSR_IRQ:
> >+		write_sysreg(v, spsr_irq);
> >+	case KVM_SPSR_FIQ:
> >+		write_sysreg(v, spsr_fiq);
> >+	}
> >  }

^ permalink raw reply	[flat|nested] 223+ messages in thread

* Re: [PATCH v3 00/41] Optimize KVM/ARM for VHE systems
  2018-01-18 12:18       ` Yury Norov
@ 2018-01-18 13:32         ` Christoffer Dall
  -1 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-01-18 13:32 UTC (permalink / raw)
  To: Yury Norov
  Cc: kvmarm, linux-arm-kernel, kvm, Marc Zyngier, Shih-Wei Li,
	Andrew Jones, Sunil Goutham, Alex Bennee

On Thu, Jan 18, 2018 at 03:18:21PM +0300, Yury Norov wrote:
> On Thu, Jan 18, 2018 at 12:16:32PM +0100, Christoffer Dall wrote:
> > Hi Yury,
> > 
> > [cc'ing Alex Bennee who had some thoughts on this]
> > 
> > On Mon, Jan 15, 2018 at 05:14:23PM +0300, Yury Norov wrote:
> > > On Fri, Jan 12, 2018 at 01:07:06PM +0100, Christoffer Dall wrote:
> > > > This series redesigns parts of KVM/ARM to optimize the performance on
> > > > VHE systems.  The general approach is to try to do as little work as
> > > > possible when transitioning between the VM and the hypervisor.  This has
> > > > the benefit of lower latency when waiting for interrupts and delivering
> > > > virtual interrupts, and reduces the overhead of emulating behavior and
> > > > I/O in the host kernel.
> > > > 
> > > > Patches 01 through 06 are not VHE specific, but rework parts of KVM/ARM
> > > > that can be generally improved.  We then add infrastructure to move more
> > > > logic into vcpu_load and vcpu_put, we improve handling of VFP and debug
> > > > registers.
> > > > 
> > > > We then introduce a new world-switch function for VHE systems, which we
> > > > can tweak and optimize for VHE systems.  To do that, we rework a lot of
> > > > the system register save/restore handling and emulation code that may
> > > > need access to system registers, so that we can defer as many system
> > > > register save/restore operations to vcpu_load and vcpu_put, and move
> > > > this logic out of the VHE world switch function.
> > > > 
> > > > We then optimize the configuration of traps.  On non-VHE systems, both
> > > > the host and VM kernels run in EL1, but because the host kernel should
> > > > have full access to the underlying hardware, but the VM kernel should
> > > > not, we essentially make the host kernel more privileged than the VM
> > > > kernel despite them both running at the same privilege level by enabling
> > > > VE traps when entering the VM and disabling those traps when exiting the
> > > > VM.  On VHE systems, the host kernel runs in EL2 and has full access to
> > > > the hardware (as much as allowed by secure side software), and is
> > > > unaffected by the trap configuration.  That means we can configure the
> > > > traps for VMs running in EL1 once, and don't have to switch them on and
> > > > off for every entry/exit to/from the VM.
> > > > 
> > > > Finally, we improve our VGIC handling by moving all save/restore logic
> > > > out of the VHE world-switch, and we make it possible to truly only
> > > > evaluate if the AP list is empty and not do *any* VGIC work if that is
> > > > the case, and only do the minimal amount of work required in the course
> > > > of the VGIC processing when we have virtual interrupts in flight.
> > > > 
> > > > The patches are based on v4.15-rc3, v9 of the level-triggered mapped
> > > > interrupts support series [1], and the first five patches of James' SDEI
> > > > series [2].
> > > > 
> > > > I've given the patches a fair amount of testing on Thunder-X, Mustang,
> > > > Seattle, and TC2 (32-bit) for non-VHE testing, and tested VHE
> > > > functionality on the Foundation model, running both 64-bit VMs and
> > > > 32-bit VMs side-by-side and using both GICv3-on-GICv3 and
> > > > GICv2-on-GICv3.
> > > > 
> > > > The patches are also available in the vhe-optimize-v3 branch on my
> > > > kernel.org repository [3].  The vhe-optimize-v3-base branch contains
> > > > prerequisites of this series.
> > > > 
> > > > Changes since v2:
> > > >  - Rebased on v4.15-rc3.
> > > >  - Includes two additional patches that only does vcpu_load after
> > > >    kvm_vcpu_first_run_init and only for KVM_RUN.
> > > >  - Addressed review comments from v2 (detailed changelogs are in the
> > > >    individual patches).
> > > > 
> > > > Thanks,
> > > > -Christoffer
> > > > 
> > > > [1]: git://git.kernel.org/pub/scm/linux/kernel/git/cdall/linux.git level-mapped-v9
> > > > [2]: git://linux-arm.org/linux-jm.git sdei/v5/base
> > > > [3]: git://git.kernel.org/pub/scm/linux/kernel/git/cdall/linux.git vhe-optimize-v3
> > > 
> > > I tested this v3 series on ThunderX2 with IPI benchmark:
> > > https://lkml.org/lkml/2017/12/11/364
> > > 
> > > I tried to address your comments from the discussion of v2, like pinning
> > > the module to a specific CPU (with taskset), increasing the number of
> > > iterations, and tuning the governor for max performance. The results
> > > didn't change much, and are pretty stable.
> > > 
> > > Compared to the vanilla guest, Normal IPI delivery for v3 is 20% slower.
> > > For v2 it was 27% slower, and for v1 - 42% faster. What's interesting is
> > > that the acknowledge time is much faster for v3, so the overall time to
> > > deliver and acknowledge an IPI (2nd column) is less than on the vanilla
> > > 4.15-rc3 kernel.
> > > 
> > > Test setup is not changed since v2: ThunderX2, 112 online CPUs,
> > > guest is running under qemu-kvm, emulating gic version 3.
> > > 
> > > Below are the test results for v1-3, normalized to the host vanilla kernel
> > > dry-run time.
> > > 
> > > Yury
> > > 
> > > Host, v4.14:
> > > Dry-run:          0         1
> > > Self-IPI:         9        18
> > > Normal IPI:      81       110
> > > Broadcast IPI:    0      2106
> > > 
> > > Guest, v4.14:
> > > Dry-run:          0         1
> > > Self-IPI:        10        18
> > > Normal IPI:     305       525
> > > Broadcast IPI:    0      9729
> > > 
> > > Guest, v4.14 + VHE:
> > > Dry-run:          0         1
> > > Self-IPI:         9        18
> > > Normal IPI:     176       343
> > > Broadcast IPI:    0      9885
> > > 
> > > And for v2.
> > > 
> > > Host, v4.15:                   
> > > Dry-run:          0         1
> > > Self-IPI:         9        18
> > > Normal IPI:      79       108
> > > Broadcast IPI:    0      2102
> > >                         
> > > Guest, v4.15-rc:
> > > Dry-run:          0         1
> > > Self-IPI:         9        18
> > > Normal IPI:     291       526
> > > Broadcast IPI:    0     10439
> > > 
> > > Guest, v4.15-rc + VHE:
> > > Dry-run:          0         2
> > > Self-IPI:        14        28
> > > Normal IPI:     370       569
> > > Broadcast IPI:    0     11688
> > > 
> > > And for v3.
> > > 
> > > Host 4.15-rc3					
> > > Dry-run:	  0	    1
> > > Self-IPI:	  9	   18
> > > Normal IPI:	 80	  110
> > > Broadcast IPI:	  0	 2088
> > > 		
> > > Guest, 4.15-rc3	
> > > Dry-run:	  0	    1
> > > Self-IPI:	  9	   18
> > > Normal IPI:	289	  497
> > > Broadcast IPI:	  0	 9999
> > > 		
> > > Guest, 4.15-rc3	+ VHE
> > > Dry-run:	  0	    2
> > > Self-IPI:	 12	   24
> > > Normal IPI:	347	  490
> > > Broadcast IPI:	  0	11906
> > 
> > So, I had a look at your measurement code, and just want to make a
> > sanity check that I understand the measurements correctly.
> > 
> > Firstly, if we execute something 100,000 times and summarize the result
> > for each run, and get anything less than 100,000 (in this case ~300),
> > without scaling the value, doesn't that mean that in the vast majority
> > of cases, you are getting 0 as your measurement?
> 
> I cannot report absolute numbers, so I posted values normalized to the
> dry-run case. 300 for IPI delivery means that it is 300 times slower than
> a no-op (the dry-run case). The absolute numbers look quite reasonable, a
> few microseconds for a normal IPI.

Ah, I see, you normalized it after the output from your benchmark.  I
thought you normalized it in the benchmark code originally, but then I
didn't see it in the patch you linked to, so wasn't sure what was going
on.

> 
> Let me know if you need absolute numbers.
> https://lkml.org/lkml/2017/12/13/301
>  

I trust you, that's fine.

> > Secondly, are we sure all the required memory barriers are in place?
> > I know that the IPI send contains an smp_wmb(), but when you read back
> > the value in the caller, do you have the necessary smp_wmb() on the
> > handler side and a corresponding smp_rmb() on the sending side?  I'm not
> > sure what kind of effect missing barriers for a measurement framework
> > like this would have, but it's worth making sure we're not chasing red
> > herrings here.
> 
> I don't share memory between PMUs. 

PMUs?

You do share memory between your CPUs, it's the little piece of memory
that your time variable points to.

I was concerned if the read back from your sender CPU of the value
written by the receiving CPU was properly ordered, but looking at
handle_IPI and smp_call_function_single, there are barriers pretty much
all over, and I don't think a missing barrier would result in what we
see here (given that I understand the normalization above).
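
For reference, the ordering I had in mind is the usual publish/consume
pairing, roughly like this (purely illustrative, with made-up names, not
your actual benchmark code):

/* receiving CPU, in the IPI handler */
ipi_time = ktime_get_ns();	/* store the measurement */
smp_wmb();			/* publish it before the completion flag */
WRITE_ONCE(ipi_done, 1);

/* sending CPU, after firing the IPI */
while (!READ_ONCE(ipi_done))
	cpu_relax();
smp_rmb();			/* pairs with the smp_wmb() above */
delta = ipi_time - t_send;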


>  
> > That obviously doesn't change that the overall turnaround time is
> > improved more in the v1 case than in the v3 case, which I'd like to
> > explore/bisect in any case.
> 
> So would I. If you have any ideas, let me know and I'll check them.
> 
So another thing that would be very useful (which I would do myself if I
had access to a TX2) would be to simply bisect the series and run
the benchmark and see where the regression is introduced.

In case you have time for that, I have a bisectable series with the
recent KVM/ARM fixes in the 'vhe-optimize-v3-with-fixes' branch on:
git://git.kernel.org/pub/scm/linux/kernel/git/cdall/linux.git


Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 223+ messages in thread

* Re: [PATCH v3 26/41] KVM: arm64: Introduce framework for accessing deferred sysregs
  2018-01-18 13:08       ` Christoffer Dall
@ 2018-01-18 13:39         ` Julien Thierry
  -1 siblings, 0 replies; 223+ messages in thread
From: Julien Thierry @ 2018-01-18 13:39 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvm, Marc Zyngier, linux-arm-kernel, kvmarm, Shih-Wei Li



On 18/01/18 13:08, Christoffer Dall wrote:
> On Wed, Jan 17, 2018 at 05:52:21PM +0000, Julien Thierry wrote:
>>
>>
>> On 12/01/18 12:07, Christoffer Dall wrote:
>>> We are about to defer saving and restoring some groups of system
>>> registers to vcpu_put and vcpu_load on supported systems.  This means
>>> that we need some infrastructure to access system registers which
>>> supports either accessing the memory backing of the register or directly
>>> accessing the system registers, depending on the state of the system
>>> when we access the register.
>>>
>>> We do this by defining a set of read/write accessors for each system
>>> register, and letting each system register be defined as "immediate" or
>>> "deferrable".  Immediate registers are always saved/restored in the
>>> world-switch path, but deferrable registers are only saved/restored in
>>> vcpu_put/vcpu_load when supported and sysregs_loaded_on_cpu will be set
>>> in that case.
>>>
>>
>> The patch is fine, however I'd suggest adding a comment in the code pointing
>> out that the IMMEDIATE/DEFERRABLE apply to save/restore to the vcpu struct.
>> Instinctively I would expect the deferrable/immediate to apply to the actual
>> hardware register access, so a comment would prevent people like me from
>> getting on the wrong track.
>>
> 
> I tried to explain that a bit in the first sentence of the commit
> message, but I can try to make it more clear that we introduce
> terminology.
> 

The commit message is fine; I just think it would be nice to have it in 
the code so you don't have to look up the commit to understand it.
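
Something short above the macro definitions would do, for instance (just a
sketch of the wording):

/*
 * "Immediate" vs "deferrable" is about save/restore to/from the vcpu
 * struct: immediate registers are always synced in the world-switch
 * path, while deferrable registers are only synced in vcpu_load and
 * vcpu_put. While sysregs_loaded_on_cpu is set, accesses to a
 * deferrable register go straight to the hardware register.
 */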

>>> Note that we don't use the deferred mechanism yet in this patch, but only
>>> introduce infrastructure.  This is to improve convenience of review in
>>> the subsequent patches where it is clear which registers become
>>> deferred.
>>>
>>>   [ Most of this logic was contributed by Marc Zyngier ]
>>>
>>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>>> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
>>
>> Reviewed-by: Julien Thierry <julien.thierry@arm.com>
>>
>>> ---
>>>   arch/arm64/include/asm/kvm_host.h |   8 +-
>>>   arch/arm64/kvm/sys_regs.c         | 160 ++++++++++++++++++++++++++++++++++++++
>>>   2 files changed, 166 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>>> index 91272c35cc36..4b5ef82f6bdb 100644
>>> --- a/arch/arm64/include/asm/kvm_host.h
>>> +++ b/arch/arm64/include/asm/kvm_host.h
>>> @@ -281,6 +281,10 @@ struct kvm_vcpu_arch {
>>>   	/* Detect first run of a vcpu */
>>>   	bool has_run_once;
>>> +
>>> +	/* True when deferrable sysregs are loaded on the physical CPU,
>>> +	 * see kvm_vcpu_load_sysregs and kvm_vcpu_put_sysregs. */
>>> +	bool sysregs_loaded_on_cpu;
>>>   };
>>>   #define vcpu_gp_regs(v)		(&(v)->arch.ctxt.gp_regs)
>>> @@ -293,8 +297,8 @@ struct kvm_vcpu_arch {
>>>    */
>>>   #define __vcpu_sys_reg(v,r)	((v)->arch.ctxt.sys_regs[(r)])
>>> -#define vcpu_read_sys_reg(v,r)	__vcpu_sys_reg(v,r)
>>> -#define vcpu_write_sys_reg(v,r,n)	do { __vcpu_sys_reg(v,r) = n; } while (0)
>>> +u64 vcpu_read_sys_reg(struct kvm_vcpu *vcpu, int reg);
>>> +void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, int reg, u64 val);
>>>   /*
>>>    * CP14 and CP15 live in the same array, as they are backed by the
>>> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
>>> index 96398d53b462..9d353a6a55c9 100644
>>> --- a/arch/arm64/kvm/sys_regs.c
>>> +++ b/arch/arm64/kvm/sys_regs.c
>>> @@ -35,6 +35,7 @@
>>>   #include <asm/kvm_coproc.h>
>>>   #include <asm/kvm_emulate.h>
>>>   #include <asm/kvm_host.h>
>>> +#include <asm/kvm_hyp.h>
>>>   #include <asm/kvm_mmu.h>
>>>   #include <asm/perf_event.h>
>>>   #include <asm/sysreg.h>
>>> @@ -76,6 +77,165 @@ static bool write_to_read_only(struct kvm_vcpu *vcpu,
>>>   	return false;
>>>   }
>>> +struct sys_reg_accessor {
>>> +	u64	(*rdsr)(struct kvm_vcpu *, int);
>>> +	void	(*wrsr)(struct kvm_vcpu *, int, u64);
>>
>> Nit:
>>
>> Why use a signed integer for the register index argument?
>>
> 
> The type name is short? ;)  No particular reason, could be an unsigned
> int, but I don't think it matters here, does it?
> 

Probably not, just personal preference I guess.

>>> +};
>>> +
>>> +#define DECLARE_IMMEDIATE_SR(i)						\
>>> +	static u64 __##i##_read(struct kvm_vcpu *vcpu, int r)		\
>>> +	{								\
>>> +		return __vcpu_sys_reg(vcpu, r);				\
>>> +	}								\
>>> +									\
>>> +	static void __##i##_write(struct kvm_vcpu *vcpu, int r, u64 v)	\
>>> +	{								\
>>> +		__vcpu_sys_reg(vcpu, r) = v;				\
>>> +	}								\
>>> +
>>> +#define DECLARE_DEFERRABLE_SR(i, s)					\
>>> +	static u64 __##i##_read(struct kvm_vcpu *vcpu, int r)		\
>>> +	{								\
>>> +		if (vcpu->arch.sysregs_loaded_on_cpu) {			\
>>> +			WARN_ON(kvm_arm_get_running_vcpu() != vcpu);	\
>>> +			return read_sysreg_s((s));			\
>>> +		}							\
>>> +		return __vcpu_sys_reg(vcpu, r);				\
>>> +	}								\
>>> +									\
>>> +	static void __##i##_write(struct kvm_vcpu *vcpu, int r, u64 v)	\
>>> +	{								\
>>> +		if (vcpu->arch.sysregs_loaded_on_cpu) {			\
>>> +			WARN_ON(kvm_arm_get_running_vcpu() != vcpu);	\
>>> +			write_sysreg_s(v, (s));				\
>>> +		} else {						\
>>> +			__vcpu_sys_reg(vcpu, r) = v;			\
>>> +		}							\
>>> +	}								\
>>> +
>>> +
>>> +#define SR_HANDLER_RANGE(i,e)						\
>>> +	[i ... e] =  (struct sys_reg_accessor) {			\
>>> +		.rdsr = __##i##_read,					\
>>> +		.wrsr = __##i##_write,					\
>>
>> Nit:
>> Could we have __vcpu_##i##_read and __vcpu_##i##_write?
>>
> 
> They don't necessarily read from the vcpu, do they?
> 

Hmmm, from my understanding they do, but the action on the vcpu can be 
immediate or deferred. But from the semantics you give to 
IMMEDIATE/DEFERRABLE, I'd say the actions are related to the vcpu.

I don't know if that makes sense.

> Unrelated: I also thought about just having a single function with a switch
> statement instead, which may make it easier to follow the code as there
> would be no macros generating functions, but it would be slightly less
> declarative.
> 
> For example:
> 
> u64 vcpu_read_sys_reg(struct kvm_vcpu *vcpu, int reg)
> {
> 	if (!vcpu->arch.sysregs_loaded_on_cpu)
> 		goto immediate_read;
> 	
> 	/*
> 	 * All system registers listed in the switch are deferred
> 	 * save/restored on VHE systems.
> 	 */
> 	switch (reg) {
> 	case CSSELR_EL1:	return read_sysreg_s(SYS_CSSELR_EL1);
> 	case SCTLR_EL1:		return read_sysreg_s(sctlr_EL12);
> 	case ACTLR_EL1:		return read_sysreg_s(SYS_ACTLR_EL1);
> 	case CPACR_EL1:		return read_sysreg_s(cpacr_EL12);
> 	case TTBR0_EL1:		return read_sysreg_s(ttbr0_EL12);
> 	case TTBR1_EL1:		return read_sysreg_s(ttbr1_EL12);
> 	case TCR_EL1:		return read_sysreg_s(tcr_EL12);
> 	case ESR_EL1:		return read_sysreg_s(esr_EL12);
> 	case AFSR0_EL1:		return read_sysreg_s(afsr0_EL12);
> 	case AFSR1_EL1:		return read_sysreg_s(afsr1_EL12);
> 	case FAR_EL1:		return read_sysreg_s(far_EL12);
> 	case MAIR_EL1:		return read_sysreg_s(mair_EL12);
> 	case VBAR_EL1:		return read_sysreg_s(vbar_EL12);
> 	case CONTEXTIDR_EL1:	return read_sysreg_s(contextidr_EL12);
> 	case TPIDR_EL0:		return read_sysreg_s(SYS_TPIDR_EL0);
> 	case TPIDRRO_EL0:	return read_sysreg_s(SYS_TPIDRRO_EL0);
> 	case TPIDR_EL1:		return read_sysreg_s(SYS_TPIDR_EL1);
> 	case AMAIR_EL1:		return read_sysreg_s(amair_EL12);
> 	case CNTKCTL_EL1:	return read_sysreg_s(cntkctl_EL12);
> 	case PAR_EL1:		return read_sysreg_s(SYS_PAR_EL1);
> 	case DACR32_EL2:	return read_sysreg_s(SYS_DACR32_EL2);
> 	case IFSR32_EL2:	return read_sysreg_s(SYS_IFSR32_EL2);
> 	case DBGVCR32_EL2:	return read_sysreg_s(SYS_DBGVCR32_EL2);
> 	}
> 
> immediate_read:
> 	return __vcpu_sys_reg(vcpu, reg);
> }
> 
> Since you're having a look at this, what are your thoughts?

I like that suggestion, very easy to follow. Of course the negative side is 
that you'll need two of those switches... But yes, I still prefer this new 
suggestion.
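
Just to illustrate the second one: the write side would presumably be a
mirror image of your example, reusing the same register names (untested
sketch):

void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, int reg, u64 val)
{
	if (!vcpu->arch.sysregs_loaded_on_cpu)
		goto immediate_write;

	/*
	 * Same set of registers as the read side: while the guest
	 * context is loaded, write straight to the hardware register.
	 */
	switch (reg) {
	case CSSELR_EL1:	write_sysreg_s(val, SYS_CSSELR_EL1);	return;
	case SCTLR_EL1:		write_sysreg_s(val, sctlr_EL12);	return;
	case TPIDR_EL0:		write_sysreg_s(val, SYS_TPIDR_EL0);	return;
	/* ... and so on for the rest of the deferred registers ... */
	}

immediate_write:
	__vcpu_sys_reg(vcpu, reg) = val;
}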

So if you go with this you can ignore my other comments ;) .

Thanks,

-- 
Julien Thierry

^ permalink raw reply	[flat|nested] 223+ messages in thread

* Re: [PATCH v3 00/41] Optimize KVM/ARM for VHE systems
  2018-01-15 14:14   ` Yury Norov
@ 2018-01-22 13:40     ` Tomasz Nowicki
  -1 siblings, 0 replies; 223+ messages in thread
From: Tomasz Nowicki @ 2018-01-22 13:40 UTC (permalink / raw)
  To: Yury Norov, Christoffer Dall
  Cc: Sunil Goutham, kvm, Marc Zyngier, linux-arm-kernel, kvmarm, Shih-Wei Li

Hi Yury,

On 15.01.2018 15:14, Yury Norov wrote:
> Hi Christoffer,
> 
> [CC Sunil Goutham <Sunil.Goutham@cavium.com>]
> 
> On Fri, Jan 12, 2018 at 01:07:06PM +0100, Christoffer Dall wrote:
>> This series redesigns parts of KVM/ARM to optimize the performance on
>> VHE systems.  The general approach is to try to do as little work as
>> possible when transitioning between the VM and the hypervisor.  This has
>> the benefit of lower latency when waiting for interrupts and delivering
>> virtual interrupts, and reduces the overhead of emulating behavior and
>> I/O in the host kernel.
>>
>> Patches 01 through 06 are not VHE specific, but rework parts of KVM/ARM
>> that can be generally improved.  We then add infrastructure to move more
>> logic into vcpu_load and vcpu_put, we improve handling of VFP and debug
>> registers.
>>
>> We then introduce a new world-switch function for VHE systems, which we
>> can tweak and optimize for VHE systems.  To do that, we rework a lot of
>> the system register save/restore handling and emulation code that may
>> need access to system registers, so that we can defer as many system
>> register save/restore operations to vcpu_load and vcpu_put, and move
>> this logic out of the VHE world switch function.
>>
>> We then optimize the configuration of traps.  On non-VHE systems, both
>> the host and VM kernels run in EL1, but because the host kernel should
>> have full access to the underlying hardware, but the VM kernel should
>> not, we essentially make the host kernel more privileged than the VM
>> kernel despite them both running at the same privilege level by enabling
>> VE traps when entering the VM and disabling those traps when exiting the
>> VM.  On VHE systems, the host kernel runs in EL2 and has full access to
>> the hardware (as much as allowed by secure side software), and is
>> unaffected by the trap configuration.  That means we can configure the
>> traps for VMs running in EL1 once, and don't have to switch them on and
>> off for every entry/exit to/from the VM.
>>
>> Finally, we improve our VGIC handling by moving all save/restore logic
>> out of the VHE world-switch, and we make it possible to truly only
>> evaluate if the AP list is empty and not do *any* VGIC work if that is
>> the case, and only do the minimal amount of work required in the course
>> of the VGIC processing when we have virtual interrupts in flight.
>>
>> The patches are based on v4.15-rc3, v9 of the level-triggered mapped
>> interrupts support series [1], and the first five patches of James' SDEI
>> series [2].
>>
>> I've given the patches a fair amount of testing on Thunder-X, Mustang,
>> Seattle, and TC2 (32-bit) for non-VHE testing, and tested VHE
>> functionality on the Foundation model, running both 64-bit VMs and
>> 32-bit VMs side-by-side and using both GICv3-on-GICv3 and
>> GICv2-on-GICv3.
>>
>> The patches are also available in the vhe-optimize-v3 branch on my
>> kernel.org repository [3].  The vhe-optimize-v3-base branch contains
>> prerequisites of this series.
>>
>> Changes since v2:
>>   - Rebased on v4.15-rc3.
>>   - Includes two additional patches that only does vcpu_load after
>>     kvm_vcpu_first_run_init and only for KVM_RUN.
>>   - Addressed review comments from v2 (detailed changelogs are in the
>>     individual patches).
>>
>> Thanks,
>> -Christoffer
>>
>> [1]: git://git.kernel.org/pub/scm/linux/kernel/git/cdall/linux.git level-mapped-v9
>> [2]: git://linux-arm.org/linux-jm.git sdei/v5/base
>> [3]: git://git.kernel.org/pub/scm/linux/kernel/git/cdall/linux.git vhe-optimize-v3
> 
> I tested this v3 series on ThunderX2 with IPI benchmark:
> https://lkml.org/lkml/2017/12/11/364
> 
> I tried to address your comments in discussion to v2, like pinning
> the module to specific CPU (with taskset), increasing the number of
> iterations, tuning governor to max performance. Results didn't change
> much, and are pretty stable.
> 
> Compared to the vanilla guest, Normal IPI delivery for v3 is 20% slower.
> For v2 it was 27% slower, and for v1 it was 42% faster. What's interesting,
> the acknowledge time is much faster for v3, so the overall time to
> deliver and acknowledge an IPI (2nd column) is less than on the vanilla
> 4.15-rc3 kernel.
> 
> The test setup is unchanged since v2: ThunderX2, 112 online CPUs,
> guest running under qemu-kvm, emulating GIC version 3.
> 
> Below are the test results for v1-v3, normalized to the host vanilla
> kernel dry-run time.
> 
> Yury
> 
> Host, v4.14:
> Dry-run:          0         1
> Self-IPI:         9        18
> Normal IPI:      81       110
> Broadcast IPI:    0      2106
> 
> Guest, v4.14:
> Dry-run:          0         1
> Self-IPI:        10        18
> Normal IPI:     305       525
> Broadcast IPI:    0      9729
> 
> Guest, v4.14 + VHE:
> Dry-run:          0         1
> Self-IPI:         9        18
> Normal IPI:     176       343
> Broadcast IPI:    0      9885
> 
> And for v2.
> 
> Host, v4.15:
> Dry-run:          0         1
> Self-IPI:         9        18
> Normal IPI:      79       108
> Broadcast IPI:    0      2102
>                          
> Guest, v4.15-rc:
> Dry-run:          0         1
> Self-IPI:         9        18
> Normal IPI:     291       526
> Broadcast IPI:    0     10439
> 
> Guest, v4.15-rc + VHE:
> Dry-run:          0         2
> Self-IPI:        14        28
> Normal IPI:     370       569
> Broadcast IPI:    0     11688
> 
> And for v3.
> 
> Host, 4.15-rc3:
> Dry-run:          0         1
> Self-IPI:         9        18
> Normal IPI:      80       110
> Broadcast IPI:    0      2088
> 
> Guest, 4.15-rc3:
> Dry-run:          0         1
> Self-IPI:         9        18
> Normal IPI:     289       497
> Broadcast IPI:    0      9999
> 
> Guest, 4.15-rc3 + VHE:
> Dry-run:          0         2
> Self-IPI:        12        24
> Normal IPI:     347       490
> Broadcast IPI:    0     11906

As I reported here:
https://patchwork.kernel.org/patch/10125537/
this might be because of a WFI exit storm. Can you please check the KVM
exit stats for a completely idle VM? Also, the wait time from the
kvm_vcpu_wakeup() trace point would be useful. I got lots of these:
kvm_vcpu_wakeup: poll time 0 ns, polling valid

Thanks,
Tomasz

^ permalink raw reply	[flat|nested] 223+ messages in thread

* [PATCH v3 00/41] Optimize KVM/ARM for VHE systems
@ 2018-01-22 13:40     ` Tomasz Nowicki
  0 siblings, 0 replies; 223+ messages in thread
From: Tomasz Nowicki @ 2018-01-22 13:40 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Yury,

On 15.01.2018 15:14, Yury Norov wrote:
> Hi Christoffer,
> 
> [CC Sunil Goutham <Sunil.Goutham@cavium.com>]
> 
> On Fri, Jan 12, 2018 at 01:07:06PM +0100, Christoffer Dall wrote:
>> This series redesigns parts of KVM/ARM to optimize the performance on
>> VHE systems.  The general approach is to try to do as little work as
>> possible when transitioning between the VM and the hypervisor.  This has
>> the benefit of lower latency when waiting for interrupts and delivering
>> virtual interrupts, and reduces the overhead of emulating behavior and
>> I/O in the host kernel.
>>
>> Patches 01 through 06 are not VHE specific, but rework parts of KVM/ARM
>> that can be generally improved.  We then add infrastructure to move more
>> logic into vcpu_load and vcpu_put, we improve handling of VFP and debug
>> registers.
>>
>> We then introduce a new world-switch function for VHE systems, which we
>> can tweak and optimize for VHE systems.  To do that, we rework a lot of
>> the system register save/restore handling and emulation code that may
>> need access to system registers, so that we can defer as many system
>> register save/restore operations to vcpu_load and vcpu_put, and move
>> this logic out of the VHE world switch function.
>>
>> We then optimize the configuration of traps.  On non-VHE systems, both
>> the host and VM kernels run in EL1, but because the host kernel should
>> have full access to the underlying hardware, but the VM kernel should
>> not, we essentially make the host kernel more privileged than the VM
>> kernel despite them both running at the same privilege level by enabling
>> VE traps when entering the VM and disabling those traps when exiting the
>> VM.  On VHE systems, the host kernel runs in EL2 and has full access to
>> the hardware (as much as allowed by secure side software), and is
>> unaffected by the trap configuration.  That means we can configure the
>> traps for VMs running in EL1 once, and don't have to switch them on and
>> off for every entry/exit to/from the VM.
>>
>> Finally, we improve our VGIC handling by moving all save/restore logic
>> out of the VHE world-switch, and we make it possible to truly only
>> evaluate if the AP list is empty and not do *any* VGIC work if that is
>> the case, and only do the minimal amount of work required in the course
>> of the VGIC processing when we have virtual interrupts in flight.
>>
>> The patches are based on v4.15-rc3, v9 of the level-triggered mapped
>> interrupts support series [1], and the first five patches of James' SDEI
>> series [2].
>>
>> I've given the patches a fair amount of testing on Thunder-X, Mustang,
>> Seattle, and TC2 (32-bit) for non-VHE testing, and tested VHE
>> functionality on the Foundation model, running both 64-bit VMs and
>> 32-bit VMs side-by-side and using both GICv3-on-GICv3 and
>> GICv2-on-GICv3.
>>
>> The patches are also available in the vhe-optimize-v3 branch on my
>> kernel.org repository [3].  The vhe-optimize-v3-base branch contains
>> prerequisites of this series.
>>
>> Changes since v2:
>>   - Rebased on v4.15-rc3.
>>   - Includes two additional patches that only does vcpu_load after
>>     kvm_vcpu_first_run_init and only for KVM_RUN.
>>   - Addressed review comments from v2 (detailed changelogs are in the
>>     individual patches).
>>
>> Thanks,
>> -Christoffer
>>
>> [1]: git://git.kernel.org/pub/scm/linux/kernel/git/cdall/linux.git level-mapped-v9
>> [2]: git://linux-arm.org/linux-jm.git sdei/v5/base
>> [3]: git://git.kernel.org/pub/scm/linux/kernel/git/cdall/linux.git vhe-optimize-v3
> 
> I tested this v3 series on ThunderX2 with IPI benchmark:
> https://lkml.org/lkml/2017/12/11/364
> 
> I tried to address your comments in discussion to v2, like pinning
> the module to specific CPU (with taskset), increasing the number of
> iterations, tuning governor to max performance. Results didn't change
> much, and are pretty stable.
> 
> Compared to the vanilla guest, Normal IPI delivery for v3 is 20% slower.
> For v2 it was 27% slower, and for v1 it was 42% faster. What's interesting,
> the acknowledge time is much faster for v3, so the overall time to
> deliver and acknowledge an IPI (2nd column) is less than on the vanilla
> 4.15-rc3 kernel.
> 
> The test setup is unchanged since v2: ThunderX2, 112 online CPUs,
> guest running under qemu-kvm, emulating GIC version 3.
> 
> Below are the test results for v1-v3, normalized to the host vanilla
> kernel dry-run time.
> 
> Yury
> 
> Host, v4.14:
> Dry-run:          0         1
> Self-IPI:         9        18
> Normal IPI:      81       110
> Broadcast IPI:    0      2106
> 
> Guest, v4.14:
> Dry-run:          0         1
> Self-IPI:        10        18
> Normal IPI:     305       525
> Broadcast IPI:    0      9729
> 
> Guest, v4.14 + VHE:
> Dry-run:          0         1
> Self-IPI:         9        18
> Normal IPI:     176       343
> Broadcast IPI:    0      9885
> 
> And for v2.
> 
> Host, v4.15:
> Dry-run:          0         1
> Self-IPI:         9        18
> Normal IPI:      79       108
> Broadcast IPI:    0      2102
>                          
> Guest, v4.15-rc:
> Dry-run:          0         1
> Self-IPI:         9        18
> Normal IPI:     291       526
> Broadcast IPI:    0     10439
> 
> Guest, v4.15-rc + VHE:
> Dry-run:          0         2
> Self-IPI:        14        28
> Normal IPI:     370       569
> Broadcast IPI:    0     11688
> 
> And for v3.
> 
> Host, 4.15-rc3:
> Dry-run:          0         1
> Self-IPI:         9        18
> Normal IPI:      80       110
> Broadcast IPI:    0      2088
> 
> Guest, 4.15-rc3:
> Dry-run:          0         1
> Self-IPI:         9        18
> Normal IPI:     289       497
> Broadcast IPI:    0      9999
> 
> Guest, 4.15-rc3 + VHE:
> Dry-run:          0         2
> Self-IPI:        12        24
> Normal IPI:     347       490
> Broadcast IPI:    0     11906

As I reported here:
https://patchwork.kernel.org/patch/10125537/
this might be because of a WFI exit storm. Can you please check the KVM
exit stats for a completely idle VM? Also, the wait time from the
kvm_vcpu_wakeup() trace point would be useful. I got lots of these:
kvm_vcpu_wakeup: poll time 0 ns, polling valid

Thanks,
Tomasz

^ permalink raw reply	[flat|nested] 223+ messages in thread

* Re: [PATCH v3 09/41] KVM: arm64: Defer restoring host VFP state to vcpu_put
  2018-01-12 12:07   ` Christoffer Dall
@ 2018-01-22 17:33     ` Dave Martin
  -1 siblings, 0 replies; 223+ messages in thread
From: Dave Martin @ 2018-01-22 17:33 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvmarm, linux-arm-kernel, Marc Zyngier, Shih-Wei Li, kvm

On Fri, Jan 12, 2018 at 01:07:15PM +0100, Christoffer Dall wrote:
> Avoid saving the guest VFP registers and restoring the host VFP
> registers on every exit from the VM.  Only when we're about to run
> userspace or other threads in the kernel do we really have to switch the
> state back to the host state.
> 
> We still initially configure the VFP registers to trap when entering the
> VM, but the difference is that we now leave the guest state in the
> hardware registers as long as we're running this VCPU, even if we
> occasionally trap to the host, and we only restore the host state when
> we return to user space or when scheduling another thread.
> 
> Reviewed-by: Andrew Jones <drjones@redhat.com>
> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>

[...]

> diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
> index 883a6383cd36..848a46eb33bf 100644
> --- a/arch/arm64/kvm/hyp/sysreg-sr.c
> +++ b/arch/arm64/kvm/hyp/sysreg-sr.c

[...]

> @@ -213,6 +215,19 @@ void kvm_vcpu_load_sysregs(struct kvm_vcpu *vcpu)
>   */
>  void kvm_vcpu_put_sysregs(struct kvm_vcpu *vcpu)
>  {
> +	struct kvm_cpu_context *host_ctxt = vcpu->arch.host_cpu_context;
> +	struct kvm_cpu_context *guest_ctxt = &vcpu->arch.ctxt;
> +
> +	/* Restore host FP/SIMD state */
> +	if (vcpu->arch.guest_vfp_loaded) {
> +		if (vcpu_el1_is_32bit(vcpu)) {
> +			kvm_call_hyp(__fpsimd32_save_state,
> +				     kern_hyp_va(guest_ctxt));
> +		}
> +		__fpsimd_save_state(&guest_ctxt->gp_regs.fp_regs);
> +		__fpsimd_restore_state(&host_ctxt->gp_regs.fp_regs);
> +		vcpu->arch.guest_vfp_loaded = 0;

Provided we've already marked the host FPSIMD state as dirty on the way
in, we probably don't need to restore it here.

In v4.15, the kvm_fpsimd_flush_cpu_state() call in
kvm_arch_vcpu_ioctl_run() is supposed to do this marking: currently
it's only done for SVE, since KVM was previously restoring the host
FPSIMD subset of the state anyway, but it could be made unconditional.

For a returning run ioctl, this would have the effect of deferring the
host FPSIMD reload until we return to userspace, which is probably
no more costly since the kernel must check whether to do this in
ret_to_user anyway; OTOH if the vcpu thread was preempted by some
other thread we save the cost of restoring the host state entirely here
... I think.

Ultimately I'd like to go one better and actually treat a vcpu as a
first-class fpsimd context, so that taking an interrupt to the host
and then reentering the guest doesn't cause any reload at all.  But
that feels like too big a step for this series, and there are likely
side-issues I've not thought about yet.
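
Concretely, the put path could then reduce to something like the sketch
below (reusing the names from the hunk above; the assumption that the
host state was already invalidated unconditionally via
kvm_fpsimd_flush_cpu_state() is the part that isn't in the series yet):

	/*
	 * Sketch only: assumes the host FPSIMD regs were already marked
	 * stale on the way into the run loop, so ret_to_user (or the next
	 * context switch) reloads them lazily and no restore is needed here.
	 */
	void kvm_vcpu_put_sysregs(struct kvm_vcpu *vcpu)
	{
		struct kvm_cpu_context *guest_ctxt = &vcpu->arch.ctxt;

		/* Save the guest FP/SIMD state; skip the host restore */
		if (vcpu->arch.guest_vfp_loaded) {
			if (vcpu_el1_is_32bit(vcpu))
				kvm_call_hyp(__fpsimd32_save_state,
					     kern_hyp_va(guest_ctxt));
			__fpsimd_save_state(&guest_ctxt->gp_regs.fp_regs);
			vcpu->arch.guest_vfp_loaded = 0;
		}
	}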

Cheers
---Dave

^ permalink raw reply	[flat|nested] 223+ messages in thread

* [PATCH v3 09/41] KVM: arm64: Defer restoring host VFP state to vcpu_put
@ 2018-01-22 17:33     ` Dave Martin
  0 siblings, 0 replies; 223+ messages in thread
From: Dave Martin @ 2018-01-22 17:33 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Jan 12, 2018 at 01:07:15PM +0100, Christoffer Dall wrote:
> Avoid saving the guest VFP registers and restoring the host VFP
> registers on every exit from the VM.  Only when we're about to run
> userspace or other threads in the kernel do we really have to switch the
> state back to the host state.
> 
> We still initially configure the VFP registers to trap when entering the
> VM, but the difference is that we now leave the guest state in the
> hardware registers as long as we're running this VCPU, even if we
> occasionally trap to the host, and we only restore the host state when
> we return to user space or when scheduling another thread.
> 
> Reviewed-by: Andrew Jones <drjones@redhat.com>
> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>

[...]

> diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
> index 883a6383cd36..848a46eb33bf 100644
> --- a/arch/arm64/kvm/hyp/sysreg-sr.c
> +++ b/arch/arm64/kvm/hyp/sysreg-sr.c

[...]

> @@ -213,6 +215,19 @@ void kvm_vcpu_load_sysregs(struct kvm_vcpu *vcpu)
>   */
>  void kvm_vcpu_put_sysregs(struct kvm_vcpu *vcpu)
>  {
> +	struct kvm_cpu_context *host_ctxt = vcpu->arch.host_cpu_context;
> +	struct kvm_cpu_context *guest_ctxt = &vcpu->arch.ctxt;
> +
> +	/* Restore host FP/SIMD state */
> +	if (vcpu->arch.guest_vfp_loaded) {
> +		if (vcpu_el1_is_32bit(vcpu)) {
> +			kvm_call_hyp(__fpsimd32_save_state,
> +				     kern_hyp_va(guest_ctxt));
> +		}
> +		__fpsimd_save_state(&guest_ctxt->gp_regs.fp_regs);
> +		__fpsimd_restore_state(&host_ctxt->gp_regs.fp_regs);
> +		vcpu->arch.guest_vfp_loaded = 0;

Provided we've already marked the host FPSIMD state as dirty on the way
in, we probably don't need to restore it here.

In v4.15, the kvm_fpsimd_flush_cpu_state() call in
kvm_arch_vcpu_ioctl_run() is supposed to do this marking: currently
it's only done for SVE, since KVM was previously restoring the host
FPSIMD subset of the state anyway, but it could be made unconditional.

For a returning run ioctl, this would have the effect of deferring the
host FPSIMD reload until we return to userspace, which is probably
no more costly since the kernel must check whether to do this in
ret_to_user anyway; OTOH if the vcpu thread was preempted by some
other thread we save the cost of restoring the host state entirely here
... I think.

Ultimately I'd like to go one better and actually treat a vcpu as a
first-class fpsimd context, so that taking an interrupt to the host
and then reentering the guest doesn't cause any reload at all.  But
that feels like too big a step for this series, and there are likely
side-issues I've not thought about yet.

Cheers
---Dave

^ permalink raw reply	[flat|nested] 223+ messages in thread

* Re: [PATCH v3 26/41] KVM: arm64: Introduce framework for accessing deferred sysregs
  2018-01-12 12:07   ` Christoffer Dall
@ 2018-01-23 16:04     ` Dave Martin
  -1 siblings, 0 replies; 223+ messages in thread
From: Dave Martin @ 2018-01-23 16:04 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvmarm, linux-arm-kernel, Marc Zyngier, Shih-Wei Li, kvm

On Fri, Jan 12, 2018 at 01:07:32PM +0100, Christoffer Dall wrote:
> We are about to defer saving and restoring some groups of system
> registers to vcpu_put and vcpu_load on supported systems.  This means
> that we need some infrastructure to access system registers which
> supports either accessing the memory backing of the register or directly
> accessing the system registers, depending on the state of the system
> when we access the register.
> 
> We do this by defining a set of read/write accessors for each system
> register, and letting each system register be defined as "immediate" or
> "deferrable".  Immediate registers are always saved/restored in the
> world-switch path, but deferrable registers are only saved/restored in
> vcpu_put/vcpu_load when supported and sysregs_loaded_on_cpu will be set
> in that case.
> 
> Note that we don't use the deferred mechanism yet in this patch, but only
> introduce the infrastructure.  This is to improve convenience of review in
> the subsequent patches where it is clear which registers become
> deferred.

Might this table-driven approach result in a lot of branch mispredicts,
particularly across load/put boundaries?

If we were to move the whole construct to a header, then it could get
constant-folded at the call site down to the individual reg accessed,
say:

	if (sys_regs_loaded)
		read_sysreg_s(TPIDR_EL0);
	else
		__vcpu_sys_reg(v, TPIDR_EL0);

Where multiple regs are accessed close to each other, the compiler
may be able to specialise the whole sequence for the loaded and !loaded
cases so that there is only one conditional branch.


The individual accessor functions also become unnecessary in this case,
because we wouldn't need to derive function pointers from them any
more.

I don't know how performance would compare in practice though.

I'm also assuming that all calls to these accessors are const-foldable.
If not, relying on inlining would bloat the generated code a lot.
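
To make that concrete, this is the kind of header-visible accessor I
have in mind (sketch only; the switch and the SYS_TPIDR_EL0 spelling
are illustrative, and a real version needs one case per deferrable
register):

	static inline u64 vcpu_read_sys_reg(struct kvm_vcpu *vcpu, int reg)
	{
		if (vcpu->arch.sysregs_loaded_on_cpu) {
			switch (reg) {
			case TPIDR_EL0:
				/* the register is live in hardware */
				return read_sysreg_s(SYS_TPIDR_EL0);
			/* ... more deferrable registers here ... */
			}
		}
		/* otherwise fall back to the memory-backed copy */
		return __vcpu_sys_reg(vcpu, reg);
	}

With that, a constant reg argument folds down to a single conditional
at each call site, and the function pointer table goes away.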

Cheers
---Dave

> 
>  [ Most of this logic was contributed by Marc Zyngier ]
> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> ---
>  arch/arm64/include/asm/kvm_host.h |   8 +-
>  arch/arm64/kvm/sys_regs.c         | 160 ++++++++++++++++++++++++++++++++++++++
>  2 files changed, 166 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 91272c35cc36..4b5ef82f6bdb 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -281,6 +281,10 @@ struct kvm_vcpu_arch {
>  
>  	/* Detect first run of a vcpu */
>  	bool has_run_once;
> +
> +	/* True when deferrable sysregs are loaded on the physical CPU,
> +	 * see kvm_vcpu_load_sysregs and kvm_vcpu_put_sysregs. */
> +	bool sysregs_loaded_on_cpu;
>  };
>  
>  #define vcpu_gp_regs(v)		(&(v)->arch.ctxt.gp_regs)
> @@ -293,8 +297,8 @@ struct kvm_vcpu_arch {
>   */
>  #define __vcpu_sys_reg(v,r)	((v)->arch.ctxt.sys_regs[(r)])
>  
> -#define vcpu_read_sys_reg(v,r)	__vcpu_sys_reg(v,r)
> -#define vcpu_write_sys_reg(v,r,n)	do { __vcpu_sys_reg(v,r) = n; } while (0)
> +u64 vcpu_read_sys_reg(struct kvm_vcpu *vcpu, int reg);
> +void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, int reg, u64 val);
>  
>  /*
>   * CP14 and CP15 live in the same array, as they are backed by the
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 96398d53b462..9d353a6a55c9 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -35,6 +35,7 @@
>  #include <asm/kvm_coproc.h>
>  #include <asm/kvm_emulate.h>
>  #include <asm/kvm_host.h>
> +#include <asm/kvm_hyp.h>
>  #include <asm/kvm_mmu.h>
>  #include <asm/perf_event.h>
>  #include <asm/sysreg.h>
> @@ -76,6 +77,165 @@ static bool write_to_read_only(struct kvm_vcpu *vcpu,
>  	return false;
>  }
>  
> +struct sys_reg_accessor {
> +	u64	(*rdsr)(struct kvm_vcpu *, int);
> +	void	(*wrsr)(struct kvm_vcpu *, int, u64);
> +};
> +
> +#define DECLARE_IMMEDIATE_SR(i)						\
> +	static u64 __##i##_read(struct kvm_vcpu *vcpu, int r)		\
> +	{								\
> +		return __vcpu_sys_reg(vcpu, r);				\
> +	}								\
> +									\
> +	static void __##i##_write(struct kvm_vcpu *vcpu, int r, u64 v)	\
> +	{								\
> +		__vcpu_sys_reg(vcpu, r) = v;				\
> +	}								\
> +
> +#define DECLARE_DEFERRABLE_SR(i, s)					\
> +	static u64 __##i##_read(struct kvm_vcpu *vcpu, int r)		\
> +	{								\
> +		if (vcpu->arch.sysregs_loaded_on_cpu) {			\
> +			WARN_ON(kvm_arm_get_running_vcpu() != vcpu);	\
> +			return read_sysreg_s((s));			\
> +		}							\
> +		return __vcpu_sys_reg(vcpu, r);				\
> +	}								\
> +									\
> +	static void __##i##_write(struct kvm_vcpu *vcpu, int r, u64 v)	\
> +	{								\
> +		if (vcpu->arch.sysregs_loaded_on_cpu) {			\
> +			WARN_ON(kvm_arm_get_running_vcpu() != vcpu);	\
> +			write_sysreg_s(v, (s));				\
> +		} else {						\
> +			__vcpu_sys_reg(vcpu, r) = v;			\
> +		}							\
> +	}								\
> +
> +
> +#define SR_HANDLER_RANGE(i,e)						\
> +	[i ... e] =  (struct sys_reg_accessor) {			\
> +		.rdsr = __##i##_read,					\
> +		.wrsr = __##i##_write,					\
> +	}
> +
> +#define SR_HANDLER(i)	SR_HANDLER_RANGE(i, i)
> +
> +static void bad_sys_reg(int reg)
> +{
> +	WARN_ONCE(1, "Bad system register access %d\n", reg);
> +}
> +
> +static u64 __default_read_sys_reg(struct kvm_vcpu *vcpu, int reg)
> +{
> +	bad_sys_reg(reg);
> +	return 0;
> +}
> +
> +static void __default_write_sys_reg(struct kvm_vcpu *vcpu, int reg, u64 val)
> +{
> +	bad_sys_reg(reg);
> +}
> +
> +/* Ordered as in enum vcpu_sysreg */
> +DECLARE_IMMEDIATE_SR(MPIDR_EL1);
> +DECLARE_IMMEDIATE_SR(CSSELR_EL1);
> +DECLARE_IMMEDIATE_SR(SCTLR_EL1);
> +DECLARE_IMMEDIATE_SR(ACTLR_EL1);
> +DECLARE_IMMEDIATE_SR(CPACR_EL1);
> +DECLARE_IMMEDIATE_SR(TTBR0_EL1);
> +DECLARE_IMMEDIATE_SR(TTBR1_EL1);
> +DECLARE_IMMEDIATE_SR(TCR_EL1);
> +DECLARE_IMMEDIATE_SR(ESR_EL1);
> +DECLARE_IMMEDIATE_SR(AFSR0_EL1);
> +DECLARE_IMMEDIATE_SR(AFSR1_EL1);
> +DECLARE_IMMEDIATE_SR(FAR_EL1);
> +DECLARE_IMMEDIATE_SR(MAIR_EL1);
> +DECLARE_IMMEDIATE_SR(VBAR_EL1);
> +DECLARE_IMMEDIATE_SR(CONTEXTIDR_EL1);
> +DECLARE_IMMEDIATE_SR(TPIDR_EL0);
> +DECLARE_IMMEDIATE_SR(TPIDRRO_EL0);
> +DECLARE_IMMEDIATE_SR(TPIDR_EL1);
> +DECLARE_IMMEDIATE_SR(AMAIR_EL1);
> +DECLARE_IMMEDIATE_SR(CNTKCTL_EL1);
> +DECLARE_IMMEDIATE_SR(PAR_EL1);
> +DECLARE_IMMEDIATE_SR(MDSCR_EL1);
> +DECLARE_IMMEDIATE_SR(MDCCINT_EL1);
> +DECLARE_IMMEDIATE_SR(PMCR_EL0);
> +DECLARE_IMMEDIATE_SR(PMSELR_EL0);
> +DECLARE_IMMEDIATE_SR(PMEVCNTR0_EL0);
> +/* PMEVCNTR30_EL0 */
> +DECLARE_IMMEDIATE_SR(PMCCNTR_EL0);
> +DECLARE_IMMEDIATE_SR(PMEVTYPER0_EL0);
> +/* PMEVTYPER30_EL0 */
> +DECLARE_IMMEDIATE_SR(PMCCFILTR_EL0);
> +DECLARE_IMMEDIATE_SR(PMCNTENSET_EL0);
> +DECLARE_IMMEDIATE_SR(PMINTENSET_EL1);
> +DECLARE_IMMEDIATE_SR(PMOVSSET_EL0);
> +DECLARE_IMMEDIATE_SR(PMSWINC_EL0);
> +DECLARE_IMMEDIATE_SR(PMUSERENR_EL0);
> +DECLARE_IMMEDIATE_SR(DACR32_EL2);
> +DECLARE_IMMEDIATE_SR(IFSR32_EL2);
> +DECLARE_IMMEDIATE_SR(FPEXC32_EL2);
> +DECLARE_IMMEDIATE_SR(DBGVCR32_EL2);
> +
> +static const struct sys_reg_accessor sys_reg_accessors[NR_SYS_REGS] = {
> +	[0 ... NR_SYS_REGS - 1] = {
> +		.rdsr = __default_read_sys_reg,
> +		.wrsr = __default_write_sys_reg,
> +	},
> +
> +	SR_HANDLER(MPIDR_EL1),
> +	SR_HANDLER(CSSELR_EL1),
> +	SR_HANDLER(SCTLR_EL1),
> +	SR_HANDLER(ACTLR_EL1),
> +	SR_HANDLER(CPACR_EL1),
> +	SR_HANDLER(TTBR0_EL1),
> +	SR_HANDLER(TTBR1_EL1),
> +	SR_HANDLER(TCR_EL1),
> +	SR_HANDLER(ESR_EL1),
> +	SR_HANDLER(AFSR0_EL1),
> +	SR_HANDLER(AFSR1_EL1),
> +	SR_HANDLER(FAR_EL1),
> +	SR_HANDLER(MAIR_EL1),
> +	SR_HANDLER(VBAR_EL1),
> +	SR_HANDLER(CONTEXTIDR_EL1),
> +	SR_HANDLER(TPIDR_EL0),
> +	SR_HANDLER(TPIDRRO_EL0),
> +	SR_HANDLER(TPIDR_EL1),
> +	SR_HANDLER(AMAIR_EL1),
> +	SR_HANDLER(CNTKCTL_EL1),
> +	SR_HANDLER(PAR_EL1),
> +	SR_HANDLER(MDSCR_EL1),
> +	SR_HANDLER(MDCCINT_EL1),
> +	SR_HANDLER(PMCR_EL0),
> +	SR_HANDLER(PMSELR_EL0),
> +	SR_HANDLER_RANGE(PMEVCNTR0_EL0, PMEVCNTR30_EL0),
> +	SR_HANDLER(PMCCNTR_EL0),
> +	SR_HANDLER_RANGE(PMEVTYPER0_EL0, PMEVTYPER30_EL0),
> +	SR_HANDLER(PMCCFILTR_EL0),
> +	SR_HANDLER(PMCNTENSET_EL0),
> +	SR_HANDLER(PMINTENSET_EL1),
> +	SR_HANDLER(PMOVSSET_EL0),
> +	SR_HANDLER(PMSWINC_EL0),
> +	SR_HANDLER(PMUSERENR_EL0),
> +	SR_HANDLER(DACR32_EL2),
> +	SR_HANDLER(IFSR32_EL2),
> +	SR_HANDLER(FPEXC32_EL2),
> +	SR_HANDLER(DBGVCR32_EL2),
> +};
> +
> +u64 vcpu_read_sys_reg(struct kvm_vcpu *vcpu, int reg)
> +{
> +	return sys_reg_accessors[reg].rdsr(vcpu, reg);
> +}
> +
> +void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, int reg, u64 val)
> +{
> +	sys_reg_accessors[reg].wrsr(vcpu, reg, val);
> +}
> +
>  /* 3 bits per cache level, as per CLIDR, but non-existent caches always 0 */
>  static u32 cache_levels;
>  
> -- 
> 2.14.2
> 
> _______________________________________________
> kvmarm mailing list
> kvmarm@lists.cs.columbia.edu
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 223+ messages in thread

* [PATCH v3 26/41] KVM: arm64: Introduce framework for accessing deferred sysregs
@ 2018-01-23 16:04     ` Dave Martin
  0 siblings, 0 replies; 223+ messages in thread
From: Dave Martin @ 2018-01-23 16:04 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Jan 12, 2018 at 01:07:32PM +0100, Christoffer Dall wrote:
> We are about to defer saving and restoring some groups of system
> registers to vcpu_put and vcpu_load on supported systems.  This means
> that we need some infrastructure to access system registers which
> supports either accessing the memory backing of the register or directly
> accessing the system registers, depending on the state of the system
> when we access the register.
> 
> We do this by defining a set of read/write accessors for each system
> register, and letting each system register be defined as "immediate" or
> "deferrable".  Immediate registers are always saved/restored in the
> world-switch path, but deferrable registers are only saved/restored in
> vcpu_put/vcpu_load when supported and sysregs_loaded_on_cpu will be set
> in that case.
> 
> Note that we don't use the deferred mechanism yet in this patch, but only
> introduce the infrastructure.  This is to improve convenience of review in
> the subsequent patches where it is clear which registers become
> deferred.

Might this table-driven approach result in a lot of branch mispredicts,
particularly across load/put boundaries?

If we were to move the whole construct to a header, then it could get
constant-folded at the call site down to the individual reg accessed,
say:

	if (sys_regs_loaded)
		read_sysreg_s(TPIDR_EL0);
	else
		__vcpu_sys_reg(v, TPIDR_EL0);

Where multiple regs are accessed close to each other, the compiler
may be able to specialise the whole sequence for the loaded and !loaded
cases so that there is only one conditional branch.


The individual accessor functions also become unnecessary in this case,
because we wouldn't need to derive function pointers from them any
more.

I don't know how performance would compare in practice though.

I'm also assuming that all calls to these accessors are const-foldable.
If not, relying on inlining would bloat the generated code a lot.

Cheers
---Dave

> 
>  [ Most of this logic was contributed by Marc Zyngier ]
> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> ---
>  arch/arm64/include/asm/kvm_host.h |   8 +-
>  arch/arm64/kvm/sys_regs.c         | 160 ++++++++++++++++++++++++++++++++++++++
>  2 files changed, 166 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 91272c35cc36..4b5ef82f6bdb 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -281,6 +281,10 @@ struct kvm_vcpu_arch {
>  
>  	/* Detect first run of a vcpu */
>  	bool has_run_once;
> +
> +	/* True when deferrable sysregs are loaded on the physical CPU,
> +	 * see kvm_vcpu_load_sysregs and kvm_vcpu_put_sysregs. */
> +	bool sysregs_loaded_on_cpu;
>  };
>  
>  #define vcpu_gp_regs(v)		(&(v)->arch.ctxt.gp_regs)
> @@ -293,8 +297,8 @@ struct kvm_vcpu_arch {
>   */
>  #define __vcpu_sys_reg(v,r)	((v)->arch.ctxt.sys_regs[(r)])
>  
> -#define vcpu_read_sys_reg(v,r)	__vcpu_sys_reg(v,r)
> -#define vcpu_write_sys_reg(v,r,n)	do { __vcpu_sys_reg(v,r) = n; } while (0)
> +u64 vcpu_read_sys_reg(struct kvm_vcpu *vcpu, int reg);
> +void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, int reg, u64 val);
>  
>  /*
>   * CP14 and CP15 live in the same array, as they are backed by the
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 96398d53b462..9d353a6a55c9 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -35,6 +35,7 @@
>  #include <asm/kvm_coproc.h>
>  #include <asm/kvm_emulate.h>
>  #include <asm/kvm_host.h>
> +#include <asm/kvm_hyp.h>
>  #include <asm/kvm_mmu.h>
>  #include <asm/perf_event.h>
>  #include <asm/sysreg.h>
> @@ -76,6 +77,165 @@ static bool write_to_read_only(struct kvm_vcpu *vcpu,
>  	return false;
>  }
>  
> +struct sys_reg_accessor {
> +	u64	(*rdsr)(struct kvm_vcpu *, int);
> +	void	(*wrsr)(struct kvm_vcpu *, int, u64);
> +};
> +
> +#define DECLARE_IMMEDIATE_SR(i)						\
> +	static u64 __##i##_read(struct kvm_vcpu *vcpu, int r)		\
> +	{								\
> +		return __vcpu_sys_reg(vcpu, r);				\
> +	}								\
> +									\
> +	static void __##i##_write(struct kvm_vcpu *vcpu, int r, u64 v)	\
> +	{								\
> +		__vcpu_sys_reg(vcpu, r) = v;				\
> +	}								\
> +
> +#define DECLARE_DEFERRABLE_SR(i, s)					\
> +	static u64 __##i##_read(struct kvm_vcpu *vcpu, int r)		\
> +	{								\
> +		if (vcpu->arch.sysregs_loaded_on_cpu) {			\
> +			WARN_ON(kvm_arm_get_running_vcpu() != vcpu);	\
> +			return read_sysreg_s((s));			\
> +		}							\
> +		return __vcpu_sys_reg(vcpu, r);				\
> +	}								\
> +									\
> +	static void __##i##_write(struct kvm_vcpu *vcpu, int r, u64 v)	\
> +	{								\
> +		if (vcpu->arch.sysregs_loaded_on_cpu) {			\
> +			WARN_ON(kvm_arm_get_running_vcpu() != vcpu);	\
> +			write_sysreg_s(v, (s));				\
> +		} else {						\
> +			__vcpu_sys_reg(vcpu, r) = v;			\
> +		}							\
> +	}								\
> +
> +
> +#define SR_HANDLER_RANGE(i,e)						\
> +	[i ... e] =  (struct sys_reg_accessor) {			\
> +		.rdsr = __##i##_read,					\
> +		.wrsr = __##i##_write,					\
> +	}
> +
> +#define SR_HANDLER(i)	SR_HANDLER_RANGE(i, i)
> +
> +static void bad_sys_reg(int reg)
> +{
> +	WARN_ONCE(1, "Bad system register access %d\n", reg);
> +}
> +
> +static u64 __default_read_sys_reg(struct kvm_vcpu *vcpu, int reg)
> +{
> +	bad_sys_reg(reg);
> +	return 0;
> +}
> +
> +static void __default_write_sys_reg(struct kvm_vcpu *vcpu, int reg, u64 val)
> +{
> +	bad_sys_reg(reg);
> +}
> +
> +/* Ordered as in enum vcpu_sysreg */
> +DECLARE_IMMEDIATE_SR(MPIDR_EL1);
> +DECLARE_IMMEDIATE_SR(CSSELR_EL1);
> +DECLARE_IMMEDIATE_SR(SCTLR_EL1);
> +DECLARE_IMMEDIATE_SR(ACTLR_EL1);
> +DECLARE_IMMEDIATE_SR(CPACR_EL1);
> +DECLARE_IMMEDIATE_SR(TTBR0_EL1);
> +DECLARE_IMMEDIATE_SR(TTBR1_EL1);
> +DECLARE_IMMEDIATE_SR(TCR_EL1);
> +DECLARE_IMMEDIATE_SR(ESR_EL1);
> +DECLARE_IMMEDIATE_SR(AFSR0_EL1);
> +DECLARE_IMMEDIATE_SR(AFSR1_EL1);
> +DECLARE_IMMEDIATE_SR(FAR_EL1);
> +DECLARE_IMMEDIATE_SR(MAIR_EL1);
> +DECLARE_IMMEDIATE_SR(VBAR_EL1);
> +DECLARE_IMMEDIATE_SR(CONTEXTIDR_EL1);
> +DECLARE_IMMEDIATE_SR(TPIDR_EL0);
> +DECLARE_IMMEDIATE_SR(TPIDRRO_EL0);
> +DECLARE_IMMEDIATE_SR(TPIDR_EL1);
> +DECLARE_IMMEDIATE_SR(AMAIR_EL1);
> +DECLARE_IMMEDIATE_SR(CNTKCTL_EL1);
> +DECLARE_IMMEDIATE_SR(PAR_EL1);
> +DECLARE_IMMEDIATE_SR(MDSCR_EL1);
> +DECLARE_IMMEDIATE_SR(MDCCINT_EL1);
> +DECLARE_IMMEDIATE_SR(PMCR_EL0);
> +DECLARE_IMMEDIATE_SR(PMSELR_EL0);
> +DECLARE_IMMEDIATE_SR(PMEVCNTR0_EL0);
> +/* PMEVCNTR30_EL0 */
> +DECLARE_IMMEDIATE_SR(PMCCNTR_EL0);
> +DECLARE_IMMEDIATE_SR(PMEVTYPER0_EL0);
> +/* PMEVTYPER30_EL0 */
> +DECLARE_IMMEDIATE_SR(PMCCFILTR_EL0);
> +DECLARE_IMMEDIATE_SR(PMCNTENSET_EL0);
> +DECLARE_IMMEDIATE_SR(PMINTENSET_EL1);
> +DECLARE_IMMEDIATE_SR(PMOVSSET_EL0);
> +DECLARE_IMMEDIATE_SR(PMSWINC_EL0);
> +DECLARE_IMMEDIATE_SR(PMUSERENR_EL0);
> +DECLARE_IMMEDIATE_SR(DACR32_EL2);
> +DECLARE_IMMEDIATE_SR(IFSR32_EL2);
> +DECLARE_IMMEDIATE_SR(FPEXC32_EL2);
> +DECLARE_IMMEDIATE_SR(DBGVCR32_EL2);
> +
> +static const struct sys_reg_accessor sys_reg_accessors[NR_SYS_REGS] = {
> +	[0 ... NR_SYS_REGS - 1] = {
> +		.rdsr = __default_read_sys_reg,
> +		.wrsr = __default_write_sys_reg,
> +	},
> +
> +	SR_HANDLER(MPIDR_EL1),
> +	SR_HANDLER(CSSELR_EL1),
> +	SR_HANDLER(SCTLR_EL1),
> +	SR_HANDLER(ACTLR_EL1),
> +	SR_HANDLER(CPACR_EL1),
> +	SR_HANDLER(TTBR0_EL1),
> +	SR_HANDLER(TTBR1_EL1),
> +	SR_HANDLER(TCR_EL1),
> +	SR_HANDLER(ESR_EL1),
> +	SR_HANDLER(AFSR0_EL1),
> +	SR_HANDLER(AFSR1_EL1),
> +	SR_HANDLER(FAR_EL1),
> +	SR_HANDLER(MAIR_EL1),
> +	SR_HANDLER(VBAR_EL1),
> +	SR_HANDLER(CONTEXTIDR_EL1),
> +	SR_HANDLER(TPIDR_EL0),
> +	SR_HANDLER(TPIDRRO_EL0),
> +	SR_HANDLER(TPIDR_EL1),
> +	SR_HANDLER(AMAIR_EL1),
> +	SR_HANDLER(CNTKCTL_EL1),
> +	SR_HANDLER(PAR_EL1),
> +	SR_HANDLER(MDSCR_EL1),
> +	SR_HANDLER(MDCCINT_EL1),
> +	SR_HANDLER(PMCR_EL0),
> +	SR_HANDLER(PMSELR_EL0),
> +	SR_HANDLER_RANGE(PMEVCNTR0_EL0, PMEVCNTR30_EL0),
> +	SR_HANDLER(PMCCNTR_EL0),
> +	SR_HANDLER_RANGE(PMEVTYPER0_EL0, PMEVTYPER30_EL0),
> +	SR_HANDLER(PMCCFILTR_EL0),
> +	SR_HANDLER(PMCNTENSET_EL0),
> +	SR_HANDLER(PMINTENSET_EL1),
> +	SR_HANDLER(PMOVSSET_EL0),
> +	SR_HANDLER(PMSWINC_EL0),
> +	SR_HANDLER(PMUSERENR_EL0),
> +	SR_HANDLER(DACR32_EL2),
> +	SR_HANDLER(IFSR32_EL2),
> +	SR_HANDLER(FPEXC32_EL2),
> +	SR_HANDLER(DBGVCR32_EL2),
> +};
> +
> +u64 vcpu_read_sys_reg(struct kvm_vcpu *vcpu, int reg)
> +{
> +	return sys_reg_accessors[reg].rdsr(vcpu, reg);
> +}
> +
> +void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, int reg, u64 val)
> +{
> +	sys_reg_accessors[reg].wrsr(vcpu, reg, val);
> +}
> +
>  /* 3 bits per cache level, as per CLIDR, but non-existent caches always 0 */
>  static u32 cache_levels;
>  
> -- 
> 2.14.2
> 
> _______________________________________________
> kvmarm mailing list
> kvmarm at lists.cs.columbia.edu
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 223+ messages in thread

* Re: [PATCH v3 14/41] KVM: arm64: Introduce VHE-specific kvm_vcpu_run
  2018-01-12 12:07   ` Christoffer Dall
@ 2018-01-24 16:13     ` Dave Martin
  -1 siblings, 0 replies; 223+ messages in thread
From: Dave Martin @ 2018-01-24 16:13 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvmarm, linux-arm-kernel, Marc Zyngier, Shih-Wei Li, kvm

On Fri, Jan 12, 2018 at 01:07:20PM +0100, Christoffer Dall wrote:
> So far this is just a copy of the legacy non-VHE switch function, but we
> will start reworking these functions in separate directions to work on
> VHE and non-VHE in the most optimal way in later patches.

I'd be concerned that now that these are separate, they will accumulate
pointless forkage over time and become hard to maintain.  Yet they are
supposed to do the same thing to within abstractable details.

To a first approximation, the _vhe() case is the same as _nvhe() except
for the omission of those things that are deferred to load()/put().
Doubtless I'm glossing over some tricky details though.

Do you expect more fundamental divergence in the future?

Cheers
---Dave

> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> ---
>  arch/arm/include/asm/kvm_asm.h   |  5 +++-
>  arch/arm/kvm/hyp/switch.c        |  2 +-
>  arch/arm64/include/asm/kvm_asm.h |  4 ++-
>  arch/arm64/kvm/hyp/switch.c      | 58 +++++++++++++++++++++++++++++++++++++++-
>  virt/kvm/arm/arm.c               |  5 +++-
>  5 files changed, 69 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
> index 36dd2962a42d..4ac717276543 100644
> --- a/arch/arm/include/asm/kvm_asm.h
> +++ b/arch/arm/include/asm/kvm_asm.h
> @@ -70,7 +70,10 @@ extern void __kvm_tlb_flush_local_vmid(struct kvm_vcpu *vcpu);
>  
>  extern void __kvm_timer_set_cntvoff(u32 cntvoff_low, u32 cntvoff_high);
>  
> -extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
> +/* no VHE on 32-bit :( */
> +static inline int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu) { return 0; }
> +
> +extern int __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu);
>  
>  extern void __init_stage2_translation(void);
>  
> diff --git a/arch/arm/kvm/hyp/switch.c b/arch/arm/kvm/hyp/switch.c
> index c3b9799e2e13..7b2bd25e3b10 100644
> --- a/arch/arm/kvm/hyp/switch.c
> +++ b/arch/arm/kvm/hyp/switch.c
> @@ -153,7 +153,7 @@ static bool __hyp_text __populate_fault_info(struct kvm_vcpu *vcpu)
>  	return true;
>  }
>  
> -int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
> +int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
>  {
>  	struct kvm_cpu_context *host_ctxt;
>  	struct kvm_cpu_context *guest_ctxt;
> diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> index 6c7599b5cb40..fb91e728207b 100644
> --- a/arch/arm64/include/asm/kvm_asm.h
> +++ b/arch/arm64/include/asm/kvm_asm.h
> @@ -58,7 +58,9 @@ extern void __kvm_tlb_flush_local_vmid(struct kvm_vcpu *vcpu);
>  
>  extern void __kvm_timer_set_cntvoff(u32 cntvoff_low, u32 cntvoff_high);
>  
> -extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
> +extern int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu);
> +
> +extern int __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu);
>  
>  extern u64 __vgic_v3_get_ich_vtr_el2(void);
>  extern u64 __vgic_v3_read_vmcr(void);
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index 55ca2e3d42eb..accfe9a016f9 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -338,7 +338,63 @@ static bool __hyp_text fixup_guest_exit(struct kvm_vcpu *vcpu, u64 *exit_code)
>  	return false;
>  }
>  
> -int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
> +/* Switch to the guest for VHE systems running in EL2 */
> +int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm_cpu_context *host_ctxt;
> +	struct kvm_cpu_context *guest_ctxt;
> +	u64 exit_code;
> +
> +	vcpu = kern_hyp_va(vcpu);
> +
> +	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> +	host_ctxt->__hyp_running_vcpu = vcpu;
> +	guest_ctxt = &vcpu->arch.ctxt;
> +
> +	__sysreg_save_host_state(host_ctxt);
> +
> +	__activate_traps(vcpu);
> +	__activate_vm(vcpu);
> +
> +	__vgic_restore_state(vcpu);
> +	__timer_enable_traps(vcpu);
> +
> +	/*
> +	 * We must restore the 32-bit state before the sysregs, thanks
> +	 * to erratum #852523 (Cortex-A57) or #853709 (Cortex-A72).
> +	 */
> +	__sysreg32_restore_state(vcpu);
> +	__sysreg_restore_guest_state(guest_ctxt);
> +	__debug_switch_to_guest(vcpu);
> +
> +	do {
> +		/* Jump in the fire! */
> +		exit_code = __guest_enter(vcpu, host_ctxt);
> +
> +		/* And we're baaack! */
> +	} while (fixup_guest_exit(vcpu, &exit_code));
> +
> +	__sysreg_save_guest_state(guest_ctxt);
> +	__sysreg32_save_state(vcpu);
> +	__timer_disable_traps(vcpu);
> +	__vgic_save_state(vcpu);
> +
> +	__deactivate_traps(vcpu);
> +	__deactivate_vm(vcpu);
> +
> +	__sysreg_restore_host_state(host_ctxt);
> +
> +	/*
> +	 * This must come after restoring the host sysregs, since a non-VHE
> +	 * system may enable SPE here and make use of the TTBRs.
> +	 */
> +	__debug_switch_to_host(vcpu);
> +
> +	return exit_code;
> +}
> +
> +/* Switch to the guest for legacy non-VHE systems */
> +int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
>  {
>  	struct kvm_cpu_context *host_ctxt;
>  	struct kvm_cpu_context *guest_ctxt;
> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> index 5b1487bd91e8..6bce8f9c55db 100644
> --- a/virt/kvm/arm/arm.c
> +++ b/virt/kvm/arm/arm.c
> @@ -733,7 +733,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  		trace_kvm_entry(*vcpu_pc(vcpu));
>  		guest_enter_irqoff();
>  
> -		ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
> +		if (has_vhe())
> +			ret = kvm_vcpu_run_vhe(vcpu);
> +		else
> +			ret = kvm_call_hyp(__kvm_vcpu_run_nvhe, vcpu);
>  
>  		vcpu->mode = OUTSIDE_GUEST_MODE;
>  		vcpu->stat.exits++;
> -- 
> 2.14.2
> 
> _______________________________________________
> kvmarm mailing list
> kvmarm@lists.cs.columbia.edu
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 223+ messages in thread

* [PATCH v3 14/41] KVM: arm64: Introduce VHE-specific kvm_vcpu_run
@ 2018-01-24 16:13     ` Dave Martin
  0 siblings, 0 replies; 223+ messages in thread
From: Dave Martin @ 2018-01-24 16:13 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Jan 12, 2018 at 01:07:20PM +0100, Christoffer Dall wrote:
> So far this is just a copy of the legacy non-VHE switch function, but we
> will start reworking these functions in separate directions to work on
> VHE and non-VHE in the most optimal way in later patches.

I'd be concerned that now that these are separate, they will accumulate
pointless forkage over time and become hard to maintain.  Yet they are
supposed to do the same thing to within abstractable details.

To a first approximation, the _vhe() case is the same as _nvhe() except
for the omission of those things that are deferred to load()/put().
Doubtless I'm glossing over some tricky details though.

Do you expect more fundamental divergence in the future?

Cheers
---Dave

> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> ---
>  arch/arm/include/asm/kvm_asm.h   |  5 +++-
>  arch/arm/kvm/hyp/switch.c        |  2 +-
>  arch/arm64/include/asm/kvm_asm.h |  4 ++-
>  arch/arm64/kvm/hyp/switch.c      | 58 +++++++++++++++++++++++++++++++++++++++-
>  virt/kvm/arm/arm.c               |  5 +++-
>  5 files changed, 69 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
> index 36dd2962a42d..4ac717276543 100644
> --- a/arch/arm/include/asm/kvm_asm.h
> +++ b/arch/arm/include/asm/kvm_asm.h
> @@ -70,7 +70,10 @@ extern void __kvm_tlb_flush_local_vmid(struct kvm_vcpu *vcpu);
>  
>  extern void __kvm_timer_set_cntvoff(u32 cntvoff_low, u32 cntvoff_high);
>  
> -extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
> +/* no VHE on 32-bit :( */
> +static inline int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu) { return 0; }
> +
> +extern int __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu);
>  
>  extern void __init_stage2_translation(void);
>  
> diff --git a/arch/arm/kvm/hyp/switch.c b/arch/arm/kvm/hyp/switch.c
> index c3b9799e2e13..7b2bd25e3b10 100644
> --- a/arch/arm/kvm/hyp/switch.c
> +++ b/arch/arm/kvm/hyp/switch.c
> @@ -153,7 +153,7 @@ static bool __hyp_text __populate_fault_info(struct kvm_vcpu *vcpu)
>  	return true;
>  }
>  
> -int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
> +int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
>  {
>  	struct kvm_cpu_context *host_ctxt;
>  	struct kvm_cpu_context *guest_ctxt;
> diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> index 6c7599b5cb40..fb91e728207b 100644
> --- a/arch/arm64/include/asm/kvm_asm.h
> +++ b/arch/arm64/include/asm/kvm_asm.h
> @@ -58,7 +58,9 @@ extern void __kvm_tlb_flush_local_vmid(struct kvm_vcpu *vcpu);
>  
>  extern void __kvm_timer_set_cntvoff(u32 cntvoff_low, u32 cntvoff_high);
>  
> -extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
> +extern int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu);
> +
> +extern int __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu);
>  
>  extern u64 __vgic_v3_get_ich_vtr_el2(void);
>  extern u64 __vgic_v3_read_vmcr(void);
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index 55ca2e3d42eb..accfe9a016f9 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -338,7 +338,63 @@ static bool __hyp_text fixup_guest_exit(struct kvm_vcpu *vcpu, u64 *exit_code)
>  	return false;
>  }
>  
> -int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
> +/* Switch to the guest for VHE systems running in EL2 */
> +int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm_cpu_context *host_ctxt;
> +	struct kvm_cpu_context *guest_ctxt;
> +	u64 exit_code;
> +
> +	vcpu = kern_hyp_va(vcpu);
> +
> +	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> +	host_ctxt->__hyp_running_vcpu = vcpu;
> +	guest_ctxt = &vcpu->arch.ctxt;
> +
> +	__sysreg_save_host_state(host_ctxt);
> +
> +	__activate_traps(vcpu);
> +	__activate_vm(vcpu);
> +
> +	__vgic_restore_state(vcpu);
> +	__timer_enable_traps(vcpu);
> +
> +	/*
> +	 * We must restore the 32-bit state before the sysregs, thanks
> +	 * to erratum #852523 (Cortex-A57) or #853709 (Cortex-A72).
> +	 */
> +	__sysreg32_restore_state(vcpu);
> +	__sysreg_restore_guest_state(guest_ctxt);
> +	__debug_switch_to_guest(vcpu);
> +
> +	do {
> +		/* Jump in the fire! */
> +		exit_code = __guest_enter(vcpu, host_ctxt);
> +
> +		/* And we're baaack! */
> +	} while (fixup_guest_exit(vcpu, &exit_code));
> +
> +	__sysreg_save_guest_state(guest_ctxt);
> +	__sysreg32_save_state(vcpu);
> +	__timer_disable_traps(vcpu);
> +	__vgic_save_state(vcpu);
> +
> +	__deactivate_traps(vcpu);
> +	__deactivate_vm(vcpu);
> +
> +	__sysreg_restore_host_state(host_ctxt);
> +
> +	/*
> +	 * This must come after restoring the host sysregs, since a non-VHE
> +	 * system may enable SPE here and make use of the TTBRs.
> +	 */
> +	__debug_switch_to_host(vcpu);
> +
> +	return exit_code;
> +}
> +
> +/* Switch to the guest for legacy non-VHE systems */
> +int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
>  {
>  	struct kvm_cpu_context *host_ctxt;
>  	struct kvm_cpu_context *guest_ctxt;
> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> index 5b1487bd91e8..6bce8f9c55db 100644
> --- a/virt/kvm/arm/arm.c
> +++ b/virt/kvm/arm/arm.c
> @@ -733,7 +733,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  		trace_kvm_entry(*vcpu_pc(vcpu));
>  		guest_enter_irqoff();
>  
> -		ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
> +		if (has_vhe())
> +			ret = kvm_vcpu_run_vhe(vcpu);
> +		else
> +			ret = kvm_call_hyp(__kvm_vcpu_run_nvhe, vcpu);
>  
>  		vcpu->mode = OUTSIDE_GUEST_MODE;
>  		vcpu->stat.exits++;
> -- 
> 2.14.2
> 
> _______________________________________________
> kvmarm mailing list
> kvmarm at lists.cs.columbia.edu
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 223+ messages in thread

* Re: [PATCH v3 15/41] KVM: arm64: Remove kern_hyp_va() use in VHE switch function
  2018-01-12 12:07   ` Christoffer Dall
@ 2018-01-24 16:24     ` Dave Martin
  -1 siblings, 0 replies; 223+ messages in thread
From: Dave Martin @ 2018-01-24 16:24 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvmarm, linux-arm-kernel, Marc Zyngier, Shih-Wei Li, kvm

On Fri, Jan 12, 2018 at 01:07:21PM +0100, Christoffer Dall wrote:
> VHE kernels run completely in EL2 and therefore don't have a notion of
> kernel and hyp addresses, they are all just kernel addresses.  Therefore
> don't call kern_hyp_va() in the VHE switch function.

Isn't this an example of avoidable forkage?

This looks like it's probably just saving a couple of nops, though I may
have misunderstood how this interacts with alternatives.
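
For reference, the conversion is roughly of this shape (simplified
sketch, not the exact arm64 definition; the mask name is illustrative),
so on VHE the call should indeed degrade to a couple of nops patched in
at boot:

	static inline unsigned long __kern_hyp_va(unsigned long v)
	{
		/* non-VHE: mask the kernel VA into the hyp VA range;
		 * VHE: the instruction is patched to a nop at boot. */
		asm volatile(ALTERNATIVE("and %0, %0, %1",
					 "nop",
					 ARM64_HAS_VIRT_HOST_EXTN)
			     : "+r" (v)
			     : "i" (HYP_PAGE_OFFSET_MASK));
		return v;
	}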

Cheers
---Dave

> 
> Reviewed-by: Andrew Jones <drjones@redhat.com>
> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> ---
>  arch/arm64/kvm/hyp/switch.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index accfe9a016f9..05fba76ec918 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -345,9 +345,7 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
>  	struct kvm_cpu_context *guest_ctxt;
>  	u64 exit_code;
>  
> -	vcpu = kern_hyp_va(vcpu);
> -
> -	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> +	host_ctxt = vcpu->arch.host_cpu_context;
>  	host_ctxt->__hyp_running_vcpu = vcpu;
>  	guest_ctxt = &vcpu->arch.ctxt;
>  
> -- 
> 2.14.2
> 
> _______________________________________________
> kvmarm mailing list
> kvmarm@lists.cs.columbia.edu
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 223+ messages in thread

* [PATCH v3 15/41] KVM: arm64: Remove kern_hyp_va() use in VHE switch function
@ 2018-01-24 16:24     ` Dave Martin
  0 siblings, 0 replies; 223+ messages in thread
From: Dave Martin @ 2018-01-24 16:24 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Jan 12, 2018 at 01:07:21PM +0100, Christoffer Dall wrote:
> VHE kernels run completely in EL2 and therefore don't have a notion of
> kernel and hyp addresses, they are all just kernel addresses.  Therefore
> don't call kern_hyp_va() in the VHE switch function.

Isn't this an example of avoidable forkage?

This looks like it's probably just saving a couple of nops, though I may
have misunderstood how this interacts with alternatives.

Cheers
---Dave

> 
> Reviewed-by: Andrew Jones <drjones@redhat.com>
> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> ---
>  arch/arm64/kvm/hyp/switch.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index accfe9a016f9..05fba76ec918 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -345,9 +345,7 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
>  	struct kvm_cpu_context *guest_ctxt;
>  	u64 exit_code;
>  
> -	vcpu = kern_hyp_va(vcpu);
> -
> -	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> +	host_ctxt = vcpu->arch.host_cpu_context;
>  	host_ctxt->__hyp_running_vcpu = vcpu;
>  	guest_ctxt = &vcpu->arch.ctxt;
>  
> -- 
> 2.14.2
> 
> _______________________________________________
> kvmarm mailing list
> kvmarm at lists.cs.columbia.edu
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 223+ messages in thread

* Re: [PATCH v3 14/41] KVM: arm64: Introduce VHE-specific kvm_vcpu_run
  2018-01-24 16:13     ` Dave Martin
@ 2018-01-25  8:45       ` Christoffer Dall
  -1 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-01-25  8:45 UTC (permalink / raw)
  To: Dave Martin; +Cc: Marc Zyngier, Shih-Wei Li, kvmarm, linux-arm-kernel, kvm

On Wed, Jan 24, 2018 at 04:13:26PM +0000, Dave Martin wrote:
> On Fri, Jan 12, 2018 at 01:07:20PM +0100, Christoffer Dall wrote:
> > So far this is just a copy of the legacy non-VHE switch function, but we
> > will start reworking these functions in separate directions to work on
> > VHE and non-VHE in the most optimal way in later patches.
> 
> I'd be concerned that now that these are separate, they will accumulate
> pointless forkage over time and become hard to maintain.  Yet they are
> supposed to do the same thing to within abstractable details.

I actually think they are conceptually quite different on a VHE vs.
non-VHE system.  On a non-VHE system, you really are talking about a
world-switch, because you're switching the entire world as seen from
kernel mode (EL1) and up.  On a VHE system, you're just running
something in a less privileged CPU mode, like a special user space, and
not really changing the world; part of the world as you know it (EL2,
your own mode) remains unaffected.

> 
> To a first approximation, the _vhe() case is the same as _nvhe() except
> for the omission of those things that are deferred to load()/put().
> Doubtless I'm glossing over some tricky details though.
> 
> Do you expect more fundamental divergence in the future?
> 

So this is just the way I tried to lay out the patches to make them easy
to follow and easy to bisect.  I may have failed somewhat for the
former, or at least I should have explained this more clearly in the
commit message.

If you look at switch.c after patch 35 you should observe that the VHE
version does significantly less work (very little work, in fact) than
the non-VHE function and is essentially a wrapper around the low-level
exception return code, with as much logic shared between the two
functions as possible.
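
Roughly, the VHE path converges on this kind of shape (sketch only; the
sysreg_*_vhe helpers are placeholder names, not necessarily what the
series calls them):

	int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
	{
		struct kvm_cpu_context *host_ctxt = vcpu->arch.host_cpu_context;
		struct kvm_cpu_context *guest_ctxt = &vcpu->arch.ctxt;
		u64 exit_code;

		host_ctxt->__hyp_running_vcpu = vcpu;

		/* only state that cannot be deferred to vcpu_load/vcpu_put
		 * is switched around the actual guest entry */
		sysreg_save_host_state_vhe(host_ctxt);
		__activate_traps(vcpu);
		sysreg_restore_guest_state_vhe(guest_ctxt);

		do {
			exit_code = __guest_enter(vcpu, host_ctxt);
		} while (fixup_guest_exit(vcpu, &exit_code));

		sysreg_save_guest_state_vhe(guest_ctxt);
		__deactivate_traps(vcpu);
		sysreg_restore_host_state_vhe(host_ctxt);

		return exit_code;
	}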

Note that in my initial optimization attempts I did not pursue separate
kvm_vcpu_run functions, but I wasn't able to achieve the same level of
performance improvement as I do with separate functions.  My attempts to
pin-point and measure why, by instrumenting the code with cycle counts
along the paths, only indicated that it is the cumulative effect of
having multiple conditionals and unnecessary function calls that causes
the significant difference between the two approaches.

Let me know what you think.

Thanks,
-Christoffer

> 
> > Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
> > Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> > ---
> >  arch/arm/include/asm/kvm_asm.h   |  5 +++-
> >  arch/arm/kvm/hyp/switch.c        |  2 +-
> >  arch/arm64/include/asm/kvm_asm.h |  4 ++-
> >  arch/arm64/kvm/hyp/switch.c      | 58 +++++++++++++++++++++++++++++++++++++++-
> >  virt/kvm/arm/arm.c               |  5 +++-
> >  5 files changed, 69 insertions(+), 5 deletions(-)
> > 
> > diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
> > index 36dd2962a42d..4ac717276543 100644
> > --- a/arch/arm/include/asm/kvm_asm.h
> > +++ b/arch/arm/include/asm/kvm_asm.h
> > @@ -70,7 +70,10 @@ extern void __kvm_tlb_flush_local_vmid(struct kvm_vcpu *vcpu);
> >  
> >  extern void __kvm_timer_set_cntvoff(u32 cntvoff_low, u32 cntvoff_high);
> >  
> > -extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
> > +/* no VHE on 32-bit :( */
> > +static inline int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu) { return 0; }
> > +
> > +extern int __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu);
> >  
> >  extern void __init_stage2_translation(void);
> >  
> > diff --git a/arch/arm/kvm/hyp/switch.c b/arch/arm/kvm/hyp/switch.c
> > index c3b9799e2e13..7b2bd25e3b10 100644
> > --- a/arch/arm/kvm/hyp/switch.c
> > +++ b/arch/arm/kvm/hyp/switch.c
> > @@ -153,7 +153,7 @@ static bool __hyp_text __populate_fault_info(struct kvm_vcpu *vcpu)
> >  	return true;
> >  }
> >  
> > -int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
> > +int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
> >  {
> >  	struct kvm_cpu_context *host_ctxt;
> >  	struct kvm_cpu_context *guest_ctxt;
> > diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> > index 6c7599b5cb40..fb91e728207b 100644
> > --- a/arch/arm64/include/asm/kvm_asm.h
> > +++ b/arch/arm64/include/asm/kvm_asm.h
> > @@ -58,7 +58,9 @@ extern void __kvm_tlb_flush_local_vmid(struct kvm_vcpu *vcpu);
> >  
> >  extern void __kvm_timer_set_cntvoff(u32 cntvoff_low, u32 cntvoff_high);
> >  
> > -extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
> > +extern int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu);
> > +
> > +extern int __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu);
> >  
> >  extern u64 __vgic_v3_get_ich_vtr_el2(void);
> >  extern u64 __vgic_v3_read_vmcr(void);
> > diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> > index 55ca2e3d42eb..accfe9a016f9 100644
> > --- a/arch/arm64/kvm/hyp/switch.c
> > +++ b/arch/arm64/kvm/hyp/switch.c
> > @@ -338,7 +338,63 @@ static bool __hyp_text fixup_guest_exit(struct kvm_vcpu *vcpu, u64 *exit_code)
> >  	return false;
> >  }
> >  
> > -int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
> > +/* Switch to the guest for VHE systems running in EL2 */
> > +int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
> > +{
> > +	struct kvm_cpu_context *host_ctxt;
> > +	struct kvm_cpu_context *guest_ctxt;
> > +	u64 exit_code;
> > +
> > +	vcpu = kern_hyp_va(vcpu);
> > +
> > +	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> > +	host_ctxt->__hyp_running_vcpu = vcpu;
> > +	guest_ctxt = &vcpu->arch.ctxt;
> > +
> > +	__sysreg_save_host_state(host_ctxt);
> > +
> > +	__activate_traps(vcpu);
> > +	__activate_vm(vcpu);
> > +
> > +	__vgic_restore_state(vcpu);
> > +	__timer_enable_traps(vcpu);
> > +
> > +	/*
> > +	 * We must restore the 32-bit state before the sysregs, thanks
> > +	 * to erratum #852523 (Cortex-A57) or #853709 (Cortex-A72).
> > +	 */
> > +	__sysreg32_restore_state(vcpu);
> > +	__sysreg_restore_guest_state(guest_ctxt);
> > +	__debug_switch_to_guest(vcpu);
> > +
> > +	do {
> > +		/* Jump in the fire! */
> > +		exit_code = __guest_enter(vcpu, host_ctxt);
> > +
> > +		/* And we're baaack! */
> > +	} while (fixup_guest_exit(vcpu, &exit_code));
> > +
> > +	__sysreg_save_guest_state(guest_ctxt);
> > +	__sysreg32_save_state(vcpu);
> > +	__timer_disable_traps(vcpu);
> > +	__vgic_save_state(vcpu);
> > +
> > +	__deactivate_traps(vcpu);
> > +	__deactivate_vm(vcpu);
> > +
> > +	__sysreg_restore_host_state(host_ctxt);
> > +
> > +	/*
> > +	 * This must come after restoring the host sysregs, since a non-VHE
> > +	 * system may enable SPE here and make use of the TTBRs.
> > +	 */
> > +	__debug_switch_to_host(vcpu);
> > +
> > +	return exit_code;
> > +}
> > +
> > +/* Switch to the guest for legacy non-VHE systems */
> > +int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
> >  {
> >  	struct kvm_cpu_context *host_ctxt;
> >  	struct kvm_cpu_context *guest_ctxt;
> > diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> > index 5b1487bd91e8..6bce8f9c55db 100644
> > --- a/virt/kvm/arm/arm.c
> > +++ b/virt/kvm/arm/arm.c
> > @@ -733,7 +733,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> >  		trace_kvm_entry(*vcpu_pc(vcpu));
> >  		guest_enter_irqoff();
> >  
> > -		ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
> > +		if (has_vhe())
> > +			ret = kvm_vcpu_run_vhe(vcpu);
> > +		else
> > +			ret = kvm_call_hyp(__kvm_vcpu_run_nvhe, vcpu);
> >  
> >  		vcpu->mode = OUTSIDE_GUEST_MODE;
> >  		vcpu->stat.exits++;
> > -- 
> > 2.14.2
> > 
> > _______________________________________________
> > kvmarm mailing list
> > kvmarm@lists.cs.columbia.edu
> > https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 223+ messages in thread

* Re: [PATCH v3 09/41] KVM: arm64: Defer restoring host VFP state to vcpu_put
  2018-01-22 17:33     ` Dave Martin
@ 2018-01-25 19:46       ` Christoffer Dall
  -1 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-01-25 19:46 UTC (permalink / raw)
  To: Dave Martin; +Cc: Marc Zyngier, Shih-Wei Li, kvmarm, linux-arm-kernel, kvm

On Mon, Jan 22, 2018 at 05:33:28PM +0000, Dave Martin wrote:
> On Fri, Jan 12, 2018 at 01:07:15PM +0100, Christoffer Dall wrote:
> > Avoid saving the guest VFP registers and restoring the host VFP
> > registers on every exit from the VM.  Only when we're about to run
> > userspace or other threads in the kernel do we really have to switch the
> > state back to the host state.
> > 
> > We still initially configure the VFP registers to trap when entering the
> > VM, but the difference is that we now leave the guest state in the
> > hardware registers as long as we're running this VCPU, even if we
> > occasionally trap to the host, and we only restore the host state when
> > we return to user space or when scheduling another thread.
> > 
> > Reviewed-by: Andrew Jones <drjones@redhat.com>
> > Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
> > Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> 
> [...]
> 
> > diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
> > index 883a6383cd36..848a46eb33bf 100644
> > --- a/arch/arm64/kvm/hyp/sysreg-sr.c
> > +++ b/arch/arm64/kvm/hyp/sysreg-sr.c
> 
> [...]
> 
> > @@ -213,6 +215,19 @@ void kvm_vcpu_load_sysregs(struct kvm_vcpu *vcpu)
> >   */
> >  void kvm_vcpu_put_sysregs(struct kvm_vcpu *vcpu)
> >  {
> > +	struct kvm_cpu_context *host_ctxt = vcpu->arch.host_cpu_context;
> > +	struct kvm_cpu_context *guest_ctxt = &vcpu->arch.ctxt;
> > +
> > +	/* Restore host FP/SIMD state */
> > +	if (vcpu->arch.guest_vfp_loaded) {
> > +		if (vcpu_el1_is_32bit(vcpu)) {
> > +			kvm_call_hyp(__fpsimd32_save_state,
> > +				     kern_hyp_va(guest_ctxt));
> > +		}
> > +		__fpsimd_save_state(&guest_ctxt->gp_regs.fp_regs);
> > +		__fpsimd_restore_state(&host_ctxt->gp_regs.fp_regs);
> > +		vcpu->arch.guest_vfp_loaded = 0;
> 
> Provided we've already marked the host FPSIMD state as dirty on the way
> in, we probably don't need to restore it here.
> 
> In v4.15, the kvm_fpsimd_flush_cpu_state() call in
> kvm_arch_vcpu_ioctl_run() is supposed to do this marking: currently
> it's only done for SVE, since KVM was previously restoring the host
> FPSIMD subset of the state anyway, but it could be made unconditional.
> 
> For a returning run ioctl, this would have the effect of deferring the
> host FPSIMD reload until we return to userspace, which is probably
> no more costly since the kernel must check whether to do this in
> ret_to_user anyway; OTOH if the vcpu thread was preempted by some
> other thread we save the cost of restoring the host state entirely here
> ... I think.

Yes, I agree.  However, the low-level logic in
arch/arm64/kvm/hyp/entry.S:__fpsimd_guest_restore currently saves the
host state into vcpu->arch.host_cpu_context->gp_regs.fp_regs (where
host_cpu_context is a KVM-specific per-cpu variable).  I think this
means that simply marking the state as invalid would cause the kernel
to restore some potentially stale values when returning to userspace.
Am I missing something?

It might very well be possible to change the logic so that we store the
host state in the same place where task_fpsimd_save() would have stored
it, and I think that would make what you suggest possible.

I'd like to make that a separate change from this patch though, as we're
already changing quite a bit with this series, so I'm trying to keep
each logical change as contained as possible within its patch, so that
problems can be spotted by bisecting.

> 
> Ultimately I'd like to go one better and actually treat a vcpu as a
> first-class fpsimd context, so that taking an interrupt to the host
> and then reentering the guest doesn't cause any reload at all.  

That should be the case already; kvm_vcpu_put_sysregs() is only called
when you run another thread (preemptively or voluntarily), or when you
return to user space.  But making the vcpu fpsimd context a first-class
fpsimd context would mean that you can run another thread (and maybe
run userspace if it doesn't use fpsimd?) without having to
save/restore anything.  Am I getting this right?
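
As a rough sketch of the call paths I have in mind (hypothetical trace;
kvm_vcpu_load_sysregs/kvm_vcpu_put_sysregs are the hooks added by this
series, the rest is the generic KVM vcpu_load/vcpu_put machinery):

	ioctl(KVM_RUN)
	  vcpu_load(vcpu)
	    kvm_arch_vcpu_load(vcpu, cpu)
	      kvm_vcpu_load_sysregs(vcpu)    /* guest sysregs go on the CPU */
	  /* enter/exit the guest any number of times */
	  vcpu_put(vcpu)
	    kvm_arch_vcpu_put(vcpu)
	      kvm_vcpu_put_sysregs(vcpu)     /* host state restored here */

Preemption inside KVM_RUN goes through the preempt notifiers
(kvm_sched_out/kvm_sched_in), which call kvm_arch_vcpu_put() and
kvm_arch_vcpu_load() around the context switch.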

> But
> that feels like too big a step for this series, and there are likely
> side-issues I've not thought about yet.
> 

It should definitely be in separate patches, but I would be open to
tacking something on to the end of this series if we can stabilize this
series early after -rc1 is out.

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 223+ messages in thread

* Re: [PATCH v3 15/41] KVM: arm64: Remove kern_hyp_va() use in VHE switch function
  2018-01-24 16:24     ` Dave Martin
@ 2018-01-25 19:48       ` Christoffer Dall
  -1 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-01-25 19:48 UTC (permalink / raw)
  To: Dave Martin; +Cc: kvmarm, linux-arm-kernel, Marc Zyngier, Shih-Wei Li, kvm

On Wed, Jan 24, 2018 at 04:24:15PM +0000, Dave Martin wrote:
> On Fri, Jan 12, 2018 at 01:07:21PM +0100, Christoffer Dall wrote:
> > VHE kernels run completely in EL2 and therefore don't have a notion of
> > kernel and hyp addresses, they are all just kernel addresses.  Therefore
> > don't call kern_hyp_va() in the VHE switch function.
> 
> Isn't this an example of avoidable forkage?
> 
> This looks like it's probably just saving a couple of nops, though I may
> have misunderstood how this interacts with alternatives.

In isolation, and if we stopped here, you're absolutely right, it
doesn't make sense.  But this is just a step on the way to
significantly reducing the _vhe version.  Have a look at the following
patches and the end result and let me know if you still have concerns.

Thanks,
-Christoffer

> 
> > 
> > Reviewed-by: Andrew Jones <drjones@redhat.com>
> > Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
> > Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> > ---
> >  arch/arm64/kvm/hyp/switch.c | 4 +---
> >  1 file changed, 1 insertion(+), 3 deletions(-)
> > 
> > diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> > index accfe9a016f9..05fba76ec918 100644
> > --- a/arch/arm64/kvm/hyp/switch.c
> > +++ b/arch/arm64/kvm/hyp/switch.c
> > @@ -345,9 +345,7 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
> >  	struct kvm_cpu_context *guest_ctxt;
> >  	u64 exit_code;
> >  
> > -	vcpu = kern_hyp_va(vcpu);
> > -
> > -	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> > +	host_ctxt = vcpu->arch.host_cpu_context;
> >  	host_ctxt->__hyp_running_vcpu = vcpu;
> >  	guest_ctxt = &vcpu->arch.ctxt;
> >  
> > -- 
> > 2.14.2
> > 
> > _______________________________________________
> > kvmarm mailing list
> > kvmarm@lists.cs.columbia.edu
> > https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 223+ messages in thread

* Re: [PATCH v3 26/41] KVM: arm64: Introduce framework for accessing deferred sysregs
  2018-01-23 16:04     ` Dave Martin
@ 2018-01-25 19:54       ` Christoffer Dall
  -1 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-01-25 19:54 UTC (permalink / raw)
  To: Dave Martin; +Cc: kvmarm, linux-arm-kernel, Marc Zyngier, Shih-Wei Li, kvm

On Tue, Jan 23, 2018 at 04:04:40PM +0000, Dave Martin wrote:
> On Fri, Jan 12, 2018 at 01:07:32PM +0100, Christoffer Dall wrote:
> > We are about to defer saving and restoring some groups of system
> > registers to vcpu_put and vcpu_load on supported systems.  This means
> > that we need some infrastructure to access system registers which
> > supports either accessing the memory backing of the register or directly
> > accessing the system registers, depending on the state of the system
> > when we access the register.
> > 
> > We do this by defining a set of read/write accessors for each system
> > register, and letting each system register be defined as "immediate" or
> > "deferrable".  Immediate registers are always saved/restored in the
> > world-switch path, but deferrable registers are only saved/restored in
> > vcpu_put/vcpu_load when supported and sysregs_loaded_on_cpu will be set
> > in that case.
> > 
> > Note that we don't use the deferred mechanism yet in this patch, but only
> > introduce the infrastructure.  This is to improve convenience of review in
> > the subsequent patches where it is clear which registers become
> > deferred.
> 
> Might this table-driven approach result in a lot of branch mispredicts,
> particularly across load/put boundaries?
> 
> If we were to move the whole construct to a header, then it could get
> constant-folded at the call site down to the individual reg accessed,
> say:
> 
> 	if (sys_regs_loaded)
> 		read_sysreg_s(TPIDR_EL0);
> 	else
> 		__vcpu_sys_reg(v, TPIDR_EL0);
> 
> Where multiple regs are accessed close to each other, the compiler
> may be able to specialise the whole sequence for the loaded and !loaded
> cases so that there is only one conditional branch.
> 

That's an interesting thing to consider indeed.  I wasn't really sure
how to put this in a header file without it looking overly bloated for
inclusion elsewhere, so we ended up with this.

I don't think the alternative suggestion that I discussed with Julien on
this patch changes this much, but since you've had a look at this, I'm
curious which one of the two (lookup table vs. giant switch) you prefer?
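
For what it's worth, a minimal sketch of the header-inlined switch
variant under discussion could look like this (hypothetical code, not
taken from the series; only a couple of example registers shown):

	/* Sketch only: inline so each call site can be const-folded. */
	static inline u64 vcpu_read_sys_reg(struct kvm_vcpu *vcpu, int reg)
	{
		if (!vcpu->arch.sysregs_loaded_on_cpu)
			return __vcpu_sys_reg(vcpu, reg); /* memory-backed copy */

		switch (reg) {
		case TPIDR_EL0:		return read_sysreg(tpidr_el0);
		case TPIDRRO_EL0:	return read_sysreg(tpidrro_el0);
		/* ... one case per deferrable register ... */
		default:		return __vcpu_sys_reg(vcpu, reg);
		}
	}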

> 
> The individual accessor functions also become unnecessary in this case,
> because we wouldn't need to derive function pointers from them any
> more.
> 
> I don't know how performance would compare in practice though.

I don't know either.  But I will say that the whole idea behind
put/load is that you do this rarely, and going to userspace from KVM is
notoriously expensive, on x86 as well.

> 
> I'm also assuming that all calls to these accessors are const-foldable.
> If not, relying on inlining would bloat the generated code a lot.

We have places where this is not the case, access_vm_reg() for example.
But if we really, really wanted to, we could rewrite that to have a
function for each register, though that's pretty horrid on its own.

Thanks,
-Christoffer

> 
> > 
> >  [ Most of this logic was contributed by Marc Zyngier ]
> > 
> > Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> > Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> > ---
> >  arch/arm64/include/asm/kvm_host.h |   8 +-
> >  arch/arm64/kvm/sys_regs.c         | 160 ++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 166 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > index 91272c35cc36..4b5ef82f6bdb 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -281,6 +281,10 @@ struct kvm_vcpu_arch {
> >  
> >  	/* Detect first run of a vcpu */
> >  	bool has_run_once;
> > +
> > +	/* True when deferrable sysregs are loaded on the physical CPU,
> > +	 * see kvm_vcpu_load_sysregs and kvm_vcpu_put_sysregs. */
> > +	bool sysregs_loaded_on_cpu;
> >  };
> >  
> >  #define vcpu_gp_regs(v)		(&(v)->arch.ctxt.gp_regs)
> > @@ -293,8 +297,8 @@ struct kvm_vcpu_arch {
> >   */
> >  #define __vcpu_sys_reg(v,r)	((v)->arch.ctxt.sys_regs[(r)])
> >  
> > -#define vcpu_read_sys_reg(v,r)	__vcpu_sys_reg(v,r)
> > -#define vcpu_write_sys_reg(v,r,n)	do { __vcpu_sys_reg(v,r) = n; } while (0)
> > +u64 vcpu_read_sys_reg(struct kvm_vcpu *vcpu, int reg);
> > +void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, int reg, u64 val);
> >  
> >  /*
> >   * CP14 and CP15 live in the same array, as they are backed by the
> > diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> > index 96398d53b462..9d353a6a55c9 100644
> > --- a/arch/arm64/kvm/sys_regs.c
> > +++ b/arch/arm64/kvm/sys_regs.c
> > @@ -35,6 +35,7 @@
> >  #include <asm/kvm_coproc.h>
> >  #include <asm/kvm_emulate.h>
> >  #include <asm/kvm_host.h>
> > +#include <asm/kvm_hyp.h>
> >  #include <asm/kvm_mmu.h>
> >  #include <asm/perf_event.h>
> >  #include <asm/sysreg.h>
> > @@ -76,6 +77,165 @@ static bool write_to_read_only(struct kvm_vcpu *vcpu,
> >  	return false;
> >  }
> >  
> > +struct sys_reg_accessor {
> > +	u64	(*rdsr)(struct kvm_vcpu *, int);
> > +	void	(*wrsr)(struct kvm_vcpu *, int, u64);
> > +};
> > +
> > +#define DECLARE_IMMEDIATE_SR(i)						\
> > +	static u64 __##i##_read(struct kvm_vcpu *vcpu, int r)		\
> > +	{								\
> > +		return __vcpu_sys_reg(vcpu, r);				\
> > +	}								\
> > +									\
> > +	static void __##i##_write(struct kvm_vcpu *vcpu, int r, u64 v)	\
> > +	{								\
> > +		__vcpu_sys_reg(vcpu, r) = v;				\
> > +	}								\
> > +
> > +#define DECLARE_DEFERRABLE_SR(i, s)					\
> > +	static u64 __##i##_read(struct kvm_vcpu *vcpu, int r)		\
> > +	{								\
> > +		if (vcpu->arch.sysregs_loaded_on_cpu) {			\
> > +			WARN_ON(kvm_arm_get_running_vcpu() != vcpu);	\
> > +			return read_sysreg_s((s));			\
> > +		}							\
> > +		return __vcpu_sys_reg(vcpu, r);				\
> > +	}								\
> > +									\
> > +	static void __##i##_write(struct kvm_vcpu *vcpu, int r, u64 v)	\
> > +	{								\
> > +		if (vcpu->arch.sysregs_loaded_on_cpu) {			\
> > +			WARN_ON(kvm_arm_get_running_vcpu() != vcpu);	\
> > +			write_sysreg_s(v, (s));				\
> > +		} else {						\
> > +			__vcpu_sys_reg(vcpu, r) = v;			\
> > +		}							\
> > +	}								\
> > +
> > +
> > +#define SR_HANDLER_RANGE(i,e)						\
> > +	[i ... e] =  (struct sys_reg_accessor) {			\
> > +		.rdsr = __##i##_read,					\
> > +		.wrsr = __##i##_write,					\
> > +	}
> > +
> > +#define SR_HANDLER(i)	SR_HANDLER_RANGE(i, i)
> > +
> > +static void bad_sys_reg(int reg)
> > +{
> > +	WARN_ONCE(1, "Bad system register access %d\n", reg);
> > +}
> > +
> > +static u64 __default_read_sys_reg(struct kvm_vcpu *vcpu, int reg)
> > +{
> > +	bad_sys_reg(reg);
> > +	return 0;
> > +}
> > +
> > +static void __default_write_sys_reg(struct kvm_vcpu *vcpu, int reg, u64 val)
> > +{
> > +	bad_sys_reg(reg);
> > +}
> > +
> > +/* Ordered as in enum vcpu_sysreg */
> > +DECLARE_IMMEDIATE_SR(MPIDR_EL1);
> > +DECLARE_IMMEDIATE_SR(CSSELR_EL1);
> > +DECLARE_IMMEDIATE_SR(SCTLR_EL1);
> > +DECLARE_IMMEDIATE_SR(ACTLR_EL1);
> > +DECLARE_IMMEDIATE_SR(CPACR_EL1);
> > +DECLARE_IMMEDIATE_SR(TTBR0_EL1);
> > +DECLARE_IMMEDIATE_SR(TTBR1_EL1);
> > +DECLARE_IMMEDIATE_SR(TCR_EL1);
> > +DECLARE_IMMEDIATE_SR(ESR_EL1);
> > +DECLARE_IMMEDIATE_SR(AFSR0_EL1);
> > +DECLARE_IMMEDIATE_SR(AFSR1_EL1);
> > +DECLARE_IMMEDIATE_SR(FAR_EL1);
> > +DECLARE_IMMEDIATE_SR(MAIR_EL1);
> > +DECLARE_IMMEDIATE_SR(VBAR_EL1);
> > +DECLARE_IMMEDIATE_SR(CONTEXTIDR_EL1);
> > +DECLARE_IMMEDIATE_SR(TPIDR_EL0);
> > +DECLARE_IMMEDIATE_SR(TPIDRRO_EL0);
> > +DECLARE_IMMEDIATE_SR(TPIDR_EL1);
> > +DECLARE_IMMEDIATE_SR(AMAIR_EL1);
> > +DECLARE_IMMEDIATE_SR(CNTKCTL_EL1);
> > +DECLARE_IMMEDIATE_SR(PAR_EL1);
> > +DECLARE_IMMEDIATE_SR(MDSCR_EL1);
> > +DECLARE_IMMEDIATE_SR(MDCCINT_EL1);
> > +DECLARE_IMMEDIATE_SR(PMCR_EL0);
> > +DECLARE_IMMEDIATE_SR(PMSELR_EL0);
> > +DECLARE_IMMEDIATE_SR(PMEVCNTR0_EL0);
> > +/* PMEVCNTR30_EL0 */
> > +DECLARE_IMMEDIATE_SR(PMCCNTR_EL0);
> > +DECLARE_IMMEDIATE_SR(PMEVTYPER0_EL0);
> > +/* PMEVTYPER30_EL0 */
> > +DECLARE_IMMEDIATE_SR(PMCCFILTR_EL0);
> > +DECLARE_IMMEDIATE_SR(PMCNTENSET_EL0);
> > +DECLARE_IMMEDIATE_SR(PMINTENSET_EL1);
> > +DECLARE_IMMEDIATE_SR(PMOVSSET_EL0);
> > +DECLARE_IMMEDIATE_SR(PMSWINC_EL0);
> > +DECLARE_IMMEDIATE_SR(PMUSERENR_EL0);
> > +DECLARE_IMMEDIATE_SR(DACR32_EL2);
> > +DECLARE_IMMEDIATE_SR(IFSR32_EL2);
> > +DECLARE_IMMEDIATE_SR(FPEXC32_EL2);
> > +DECLARE_IMMEDIATE_SR(DBGVCR32_EL2);
> > +
> > +static const struct sys_reg_accessor sys_reg_accessors[NR_SYS_REGS] = {
> > +	[0 ... NR_SYS_REGS - 1] = {
> > +		.rdsr = __default_read_sys_reg,
> > +		.wrsr = __default_write_sys_reg,
> > +	},
> > +
> > +	SR_HANDLER(MPIDR_EL1),
> > +	SR_HANDLER(CSSELR_EL1),
> > +	SR_HANDLER(SCTLR_EL1),
> > +	SR_HANDLER(ACTLR_EL1),
> > +	SR_HANDLER(CPACR_EL1),
> > +	SR_HANDLER(TTBR0_EL1),
> > +	SR_HANDLER(TTBR1_EL1),
> > +	SR_HANDLER(TCR_EL1),
> > +	SR_HANDLER(ESR_EL1),
> > +	SR_HANDLER(AFSR0_EL1),
> > +	SR_HANDLER(AFSR1_EL1),
> > +	SR_HANDLER(FAR_EL1),
> > +	SR_HANDLER(MAIR_EL1),
> > +	SR_HANDLER(VBAR_EL1),
> > +	SR_HANDLER(CONTEXTIDR_EL1),
> > +	SR_HANDLER(TPIDR_EL0),
> > +	SR_HANDLER(TPIDRRO_EL0),
> > +	SR_HANDLER(TPIDR_EL1),
> > +	SR_HANDLER(AMAIR_EL1),
> > +	SR_HANDLER(CNTKCTL_EL1),
> > +	SR_HANDLER(PAR_EL1),
> > +	SR_HANDLER(MDSCR_EL1),
> > +	SR_HANDLER(MDCCINT_EL1),
> > +	SR_HANDLER(PMCR_EL0),
> > +	SR_HANDLER(PMSELR_EL0),
> > +	SR_HANDLER_RANGE(PMEVCNTR0_EL0, PMEVCNTR30_EL0),
> > +	SR_HANDLER(PMCCNTR_EL0),
> > +	SR_HANDLER_RANGE(PMEVTYPER0_EL0, PMEVTYPER30_EL0),
> > +	SR_HANDLER(PMCCFILTR_EL0),
> > +	SR_HANDLER(PMCNTENSET_EL0),
> > +	SR_HANDLER(PMINTENSET_EL1),
> > +	SR_HANDLER(PMOVSSET_EL0),
> > +	SR_HANDLER(PMSWINC_EL0),
> > +	SR_HANDLER(PMUSERENR_EL0),
> > +	SR_HANDLER(DACR32_EL2),
> > +	SR_HANDLER(IFSR32_EL2),
> > +	SR_HANDLER(FPEXC32_EL2),
> > +	SR_HANDLER(DBGVCR32_EL2),
> > +};
> > +
> > +u64 vcpu_read_sys_reg(struct kvm_vcpu *vcpu, int reg)
> > +{
> > +	return sys_reg_accessors[reg].rdsr(vcpu, reg);
> > +}
> > +
> > +void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, int reg, u64 val)
> > +{
> > +	sys_reg_accessors[reg].wrsr(vcpu, reg, val);
> > +}
> > +
> >  /* 3 bits per cache level, as per CLIDR, but non-existent caches always 0 */
> >  static u32 cache_levels;
> >  
> > -- 
> > 2.14.2
> > 
> > _______________________________________________
> > kvmarm mailing list
> > kvmarm@lists.cs.columbia.edu
> > https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 223+ messages in thread

* Re: [PATCH v3 33/41] KVM: arm64: Configure FPSIMD traps on vcpu load/put
  2018-01-12 12:07   ` Christoffer Dall
@ 2018-01-31 12:17     ` Tomasz Nowicki
  -1 siblings, 0 replies; 223+ messages in thread
From: Tomasz Nowicki @ 2018-01-31 12:17 UTC (permalink / raw)
  To: Christoffer Dall, kvmarm, linux-arm-kernel
  Cc: kvm, Marc Zyngier, Shih-Wei Li, Andrew Jones

Hi Christoffer,

On 12.01.2018 13:07, Christoffer Dall wrote:
> There is no need to enable/disable traps to FP registers on every switch
> to/from the VM, because the host kernel does not use this resource
> without calling vcpu_put.  We can therefore move things around enough
> that we still always write FPEXC32_EL2 before programming CPTR_EL2 but
> only program these during vcpu load/put.
> 
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> ---
>   arch/arm64/include/asm/kvm_hyp.h |  6 +++++
>   arch/arm64/kvm/hyp/switch.c      | 51 +++++++++++++++++++++++++++++-----------
>   arch/arm64/kvm/hyp/sysreg-sr.c   | 12 ++++++++--
>   3 files changed, 53 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
> index 3f54c55f77a1..ffd62e31f134 100644
> --- a/arch/arm64/include/asm/kvm_hyp.h
> +++ b/arch/arm64/include/asm/kvm_hyp.h
> @@ -148,6 +148,12 @@ void __fpsimd_save_state(struct user_fpsimd_state *fp_regs);
>   void __fpsimd_restore_state(struct user_fpsimd_state *fp_regs);
>   bool __fpsimd_enabled(void);
>   
> +void __activate_traps_nvhe_load(struct kvm_vcpu *vcpu);
> +void __deactivate_traps_nvhe_put(void);
> +
> +void activate_traps_vhe_load(struct kvm_vcpu *vcpu);
> +void deactivate_traps_vhe_put(void);
> +
>   u64 __guest_enter(struct kvm_vcpu *vcpu, struct kvm_cpu_context *host_ctxt);
>   void __noreturn __hyp_do_panic(unsigned long, ...);
>   
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index c01bcfc3fb52..d14ab9650f81 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -24,22 +24,25 @@
>   #include <asm/fpsimd.h>
>   #include <asm/debug-monitors.h>
>   
> -static void __hyp_text __activate_traps_common(struct kvm_vcpu *vcpu)
> +static void __hyp_text __activate_traps_fpsimd32(struct kvm_vcpu *vcpu)
>   {
>   	/*
> -	 * We are about to set CPTR_EL2.TFP to trap all floating point
> -	 * register accesses to EL2, however, the ARM ARM clearly states that
> -	 * traps are only taken to EL2 if the operation would not otherwise
> -	 * trap to EL1.  Therefore, always make sure that for 32-bit guests,
> -	 * we set FPEXC.EN to prevent traps to EL1, when setting the TFP bit.
> -	 * If FP/ASIMD is not implemented, FPEXC is UNDEFINED and any access to
> -	 * it will cause an exception.
> +	 * We are about to trap all floating point register accesses to EL2,
> +	 * however, traps are only taken to EL2 if the operation would not
> +	 * otherwise trap to EL1.  Therefore, always make sure that for 32-bit
> +	 * guests, we set FPEXC.EN to prevent traps to EL1, when setting the
> +	 * TFP bit.  If FP/ASIMD is not implemented, FPEXC is UNDEFINED and
> +	 * any access to it will cause an exception.
>   	 */
>   	if (vcpu_el1_is_32bit(vcpu) && system_supports_fpsimd() &&
>   	    !vcpu->arch.guest_vfp_loaded) {
>   		write_sysreg(1 << 30, fpexc32_el2);
>   		isb();
>   	}
> +}
> +
> +static void __hyp_text __activate_traps_common(struct kvm_vcpu *vcpu)
> +{
>   	write_sysreg(vcpu->arch.hcr_el2, hcr_el2);
>   
>   	/* Trap on AArch32 cp15 c15 (impdef sysregs) accesses (EL1 or EL0) */
> @@ -61,10 +64,12 @@ static void __hyp_text __deactivate_traps_common(void)
>   	write_sysreg(0, pmuserenr_el0);
>   }
>   
> -static void __hyp_text __activate_traps_vhe(struct kvm_vcpu *vcpu)
> +void activate_traps_vhe_load(struct kvm_vcpu *vcpu)
>   {
>   	u64 val;
>   
> +	__activate_traps_fpsimd32(vcpu);
> +
>   	val = read_sysreg(cpacr_el1);
>   	val |= CPACR_EL1_TTA;
>   	val &= ~CPACR_EL1_ZEN;
> @@ -73,14 +78,26 @@ static void __hyp_text __activate_traps_vhe(struct kvm_vcpu *vcpu)
>   	else
>   		val &= ~CPACR_EL1_FPEN;
>   	write_sysreg(val, cpacr_el1);

Given that you move this code to kvm_vcpu_load_sysregs(), I am wondering 
whether we have to deactivate the FPEN trap here.  IIUC, we call 
kvm_vcpu_load_sysregs()->activate_traps_vhe_load() and then 
kvm_vcpu_put_sysregs() by design, so vcpu->arch.guest_vfp_loaded should 
always be 0 here since it is zeroed in kvm_vcpu_put_sysregs().  The same 
applies to the nVHE case below.
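
To make the ordering concrete, the sequence I am assuming is roughly
(sketch only, based on my reading of the series, not actual code):

	vcpu_load()
	  kvm_vcpu_load_sysregs()
	    activate_traps_vhe_load()   /* guest_vfp_loaded is still 0 here */
	/* guest runs; its first FP/SIMD access traps and sets guest_vfp_loaded = 1 */
	vcpu_put()
	  kvm_vcpu_put_sysregs()        /* restores host FP state, zeroes guest_vfp_loaded */
	    deactivate_traps_vhe_put()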

I might be missing some scenario or future changes you are planning. 
Let me know your thoughts.

Thanks,
Tomasz

> +}
>   
> +void deactivate_traps_vhe_put(void)
> +{
> +	write_sysreg(CPACR_EL1_DEFAULT, cpacr_el1);
> +}
> +
> +static void __hyp_text __activate_traps_vhe(struct kvm_vcpu *vcpu)
> +{
>   	write_sysreg(__kvm_hyp_vector, vbar_el1);
>   }
>   
> -static void __hyp_text __activate_traps_nvhe(struct kvm_vcpu *vcpu)
> +void __hyp_text __activate_traps_nvhe_load(struct kvm_vcpu *vcpu)
>   {
>   	u64 val;
>   
> +	vcpu = kern_hyp_va(vcpu);
> +
> +	__activate_traps_fpsimd32(vcpu);
> +
>   	val = CPTR_EL2_DEFAULT;
>   	val |= CPTR_EL2_TTA | CPTR_EL2_TZ;
>   	if (vcpu->arch.guest_vfp_loaded)
> @@ -90,6 +107,15 @@ static void __hyp_text __activate_traps_nvhe(struct kvm_vcpu *vcpu)
>   	write_sysreg(val, cptr_el2);
>   }
>   
> +void __hyp_text __deactivate_traps_nvhe_put(void)
> +{
> +	write_sysreg(CPTR_EL2_DEFAULT, cptr_el2);
> +}
> +
> +static void __hyp_text __activate_traps_nvhe(struct kvm_vcpu *vcpu)
> +{
> +}
> +
>   static hyp_alternate_select(__activate_traps_arch,
>   			    __activate_traps_nvhe, __activate_traps_vhe,
>   			    ARM64_HAS_VIRT_HOST_EXTN);
> @@ -111,12 +137,10 @@ static void __hyp_text __deactivate_traps_vhe(void)
>   
>   	write_sysreg(mdcr_el2, mdcr_el2);
>   	write_sysreg(HCR_HOST_VHE_FLAGS, hcr_el2);
> -	write_sysreg(CPACR_EL1_DEFAULT, cpacr_el1);
>   	write_sysreg(vectors, vbar_el1);
>   }
>   
> -static void __hyp_text __deactivate_traps_nvhe(void)
> -{
> +static void __hyp_text __deactivate_traps_nvhe(void) {
>   	u64 mdcr_el2 = read_sysreg(mdcr_el2);
>   
>   	mdcr_el2 &= MDCR_EL2_HPMN_MASK;
> @@ -124,7 +148,6 @@ static void __hyp_text __deactivate_traps_nvhe(void)
>   
>   	write_sysreg(mdcr_el2, mdcr_el2);
>   	write_sysreg(HCR_RW, hcr_el2);
> -	write_sysreg(CPTR_EL2_DEFAULT, cptr_el2);
>   }
>   
>   static hyp_alternate_select(__deactivate_traps_arch,
> diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
> index d225f5797651..7943d5b4dbcb 100644
> --- a/arch/arm64/kvm/hyp/sysreg-sr.c
> +++ b/arch/arm64/kvm/hyp/sysreg-sr.c
> @@ -237,8 +237,10 @@ void kvm_vcpu_load_sysregs(struct kvm_vcpu *vcpu)
>   	struct kvm_cpu_context *host_ctxt = vcpu->arch.host_cpu_context;
>   	struct kvm_cpu_context *guest_ctxt = &vcpu->arch.ctxt;
>   
> -	if (!has_vhe())
> +	if (!has_vhe()) {
> +		kvm_call_hyp(__activate_traps_nvhe_load, vcpu);
>   		return;
> +	}
>   
>   	__sysreg_save_user_state(host_ctxt);
>   
> @@ -253,6 +255,8 @@ void kvm_vcpu_load_sysregs(struct kvm_vcpu *vcpu)
>   	__sysreg_restore_el1_state(guest_ctxt);
>   
>   	vcpu->arch.sysregs_loaded_on_cpu = true;
> +
> +	activate_traps_vhe_load(vcpu);
>   }
>   
>   /**
> @@ -282,8 +286,12 @@ void kvm_vcpu_put_sysregs(struct kvm_vcpu *vcpu)
>   		vcpu->arch.guest_vfp_loaded = 0;
>   	}
>   
> -	if (!has_vhe())
> +	if (!has_vhe()) {
> +		kvm_call_hyp(__deactivate_traps_nvhe_put);
>   		return;
> +	}
> +
> +	deactivate_traps_vhe_put();
>   
>   	__sysreg_save_el1_state(guest_ctxt);
>   	__sysreg_save_user_state(guest_ctxt);
> 

^ permalink raw reply	[flat|nested] 223+ messages in thread

* Re: [PATCH v3 33/41] KVM: arm64: Configure FPSIMD traps on vcpu load/put
  2018-01-12 12:07   ` Christoffer Dall
@ 2018-01-31 12:24     ` Tomasz Nowicki
  -1 siblings, 0 replies; 223+ messages in thread
From: Tomasz Nowicki @ 2018-01-31 12:24 UTC (permalink / raw)
  To: Christoffer Dall, kvmarm, linux-arm-kernel
  Cc: kvm, Marc Zyngier, Shih-Wei Li, Andrew Jones

On 12.01.2018 13:07, Christoffer Dall wrote:
> There is no need to enable/disable traps to FP registers on every switch
> to/from the VM, because the host kernel does not use this resource
> without calling vcpu_put.  We can therefore move things around enough
> that we still always write FPEXC32_EL2 before programming CPTR_EL2 but
> only program these during vcpu load/put.
> 
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> ---
>   arch/arm64/include/asm/kvm_hyp.h |  6 +++++
>   arch/arm64/kvm/hyp/switch.c      | 51 +++++++++++++++++++++++++++++-----------
>   arch/arm64/kvm/hyp/sysreg-sr.c   | 12 ++++++++--
>   3 files changed, 53 insertions(+), 16 deletions(-)
> 

[...]

>   
> -static void __hyp_text __deactivate_traps_nvhe(void)
> -{
> +static void __hyp_text __deactivate_traps_nvhe(void) {

Nit: unrelated change.

Thanks,
Tomasz

^ permalink raw reply	[flat|nested] 223+ messages in thread

* Re: [PATCH v3 00/41] Optimize KVM/ARM for VHE systems
  2018-01-12 12:07 ` Christoffer Dall
@ 2018-02-01 13:57   ` Tomasz Nowicki
  -1 siblings, 0 replies; 223+ messages in thread
From: Tomasz Nowicki @ 2018-02-01 13:57 UTC (permalink / raw)
  To: Christoffer Dall, kvmarm, linux-arm-kernel; +Cc: Marc Zyngier, Shih-Wei Li, kvm

Hi Christoffer,

I created a simple module for the VM kernel. It spins on the PSCI 
version hypercall to measure the base exit cost, as you suggested. I 
also measured CPU cycles for each loop; here are my results:

My setup:
1-socket ThunderX2 running VM - 1VCPU

Tested baselines:
a) host kernel v4.15-rc3 and VM kernel v4.15-rc3
b) host kernel v4.15-rc3 + vhe-optimize-v3-with-fixes and VM kernel 
v4.15-rc3

The module was loaded from the VM and the results are presented in [%] 
relative to the average CPU cycles spent on the PSCI version hypercall 
for the vanilla VHE host kernel v4.15-rc3:

              VHE  |  nVHE
=========================
baseline a)  100% |  130%
=========================
baseline a)  36%  |  123%

So I confirm a significant performance improvement, especially for the 
VHE case. Additionally, I ran network throughput tests with vhost-net, 
but for that case I saw no differences.

Thanks,
Tomasz

On 12.01.2018 13:07, Christoffer Dall wrote:
> This series redesigns parts of KVM/ARM to optimize the performance on
> VHE systems.  The general approach is to try to do as little work as
> possible when transitioning between the VM and the hypervisor.  This has
> the benefit of lower latency when waiting for interrupts and delivering
> virtual interrupts, and reduces the overhead of emulating behavior and
> I/O in the host kernel.
> 
> Patches 01 through 06 are not VHE specific, but rework parts of KVM/ARM
> that can be generally improved.  We then add infrastructure to move more
> logic into vcpu_load and vcpu_put, we improve handling of VFP and debug
> registers.
> 
> We then introduce a new world-switch function for VHE systems, which we
> can tweak and optimize for VHE systems.  To do that, we rework a lot of
> the system register save/restore handling and emulation code that may
> need access to system registers, so that we can defer as many system
> register save/restore operations to vcpu_load and vcpu_put, and move
> this logic out of the VHE world switch function.
> 
> We then optimize the configuration of traps.  On non-VHE systems, both
> the host and VM kernels run in EL1, but because the host kernel should
> have full access to the underlying hardware, but the VM kernel should
> not, we essentially make the host kernel more privileged than the VM
> kernel despite them both running at the same privilege level by enabling
> VE traps when entering the VM and disabling those traps when exiting the
> VM.  On VHE systems, the host kernel runs in EL2 and has full access to
> the hardware (as much as allowed by secure side software), and is
> unaffected by the trap configuration.  That means we can configure the
> traps for VMs running in EL1 once, and don't have to switch them on and
> off for every entry/exit to/from the VM.
> 
> Finally, we improve our VGIC handling by moving all save/restore logic
> out of the VHE world-switch, and we make it possible to truly only
> evaluate if the AP list is empty and not do *any* VGIC work if that is
> the case, and only do the minimal amount of work required in the course
> of the VGIC processing when we have virtual interrupts in flight.
> 
> The patches are based on v4.15-rc3, v9 of the level-triggered mapped
> interrupts support series [1], and the first five patches of James' SDEI
> series [2].
> 
> I've given the patches a fair amount of testing on Thunder-X, Mustang,
> Seattle, and TC2 (32-bit) for non-VHE testing, and tested VHE
> functionality on the Foundation model, running both 64-bit VMs and
> 32-bit VMs side-by-side and using both GICv3-on-GICv3 and
> GICv2-on-GICv3.
> 
> The patches are also available in the vhe-optimize-v3 branch on my
> kernel.org repository [3].  The vhe-optimize-v3-base branch contains
> prerequisites of this series.
> 
> Changes since v2:
>   - Rebased on v4.15-rc3.
>   - Includes two additional patches that only does vcpu_load after
>     kvm_vcpu_first_run_init and only for KVM_RUN.
>   - Addressed review comments from v2 (detailed changelogs are in the
>     individual patches).
> 
> Thanks,
> -Christoffer
> 
> [1]: git://git.kernel.org/pub/scm/linux/kernel/git/cdall/linux.git level-mapped-v9
> [2]: git://linux-arm.org/linux-jm.git sdei/v5/base
> [3]: git://git.kernel.org/pub/scm/linux/kernel/git/cdall/linux.git vhe-optimize-v3
> 
> Christoffer Dall (40):
>    KVM: arm/arm64: Avoid vcpu_load for other vcpu ioctls than KVM_RUN
>    KVM: arm/arm64: Move vcpu_load call after kvm_vcpu_first_run_init
>    KVM: arm64: Avoid storing the vcpu pointer on the stack
>    KVM: arm64: Rework hyp_panic for VHE and non-VHE
>    KVM: arm/arm64: Get rid of vcpu->arch.irq_lines
>    KVM: arm/arm64: Add kvm_vcpu_load_sysregs and kvm_vcpu_put_sysregs
>    KVM: arm/arm64: Introduce vcpu_el1_is_32bit
>    KVM: arm64: Defer restoring host VFP state to vcpu_put
>    KVM: arm64: Move debug dirty flag calculation out of world switch
>    KVM: arm64: Slightly improve debug save/restore functions
>    KVM: arm64: Improve debug register save/restore flow
>    KVM: arm64: Factor out fault info population and gic workarounds
>    KVM: arm64: Introduce VHE-specific kvm_vcpu_run
>    KVM: arm64: Remove kern_hyp_va() use in VHE switch function
>    KVM: arm64: Don't deactivate VM on VHE systems
>    KVM: arm64: Remove noop calls to timer save/restore from VHE switch
>    KVM: arm64: Move userspace system registers into separate function
>    KVM: arm64: Rewrite sysreg alternatives to static keys
>    KVM: arm64: Introduce separate VHE/non-VHE sysreg save/restore
>      functions
>    KVM: arm/arm64: Remove leftover comment from kvm_vcpu_run_vhe
>    KVM: arm64: Unify non-VHE host/guest sysreg save and restore functions
>    KVM: arm64: Don't save the host ELR_EL2 and SPSR_EL2 on VHE systems
>    KVM: arm64: Change 32-bit handling of VM system registers
>    KVM: arm64: Rewrite system register accessors to read/write functions
>    KVM: arm64: Introduce framework for accessing deferred sysregs
>    KVM: arm/arm64: Prepare to handle deferred save/restore of SPSR_EL1
>    KVM: arm64: Prepare to handle deferred save/restore of ELR_EL1
>    KVM: arm64: Defer saving/restoring 64-bit sysregs to vcpu load/put on
>      VHE
>    KVM: arm64: Prepare to handle deferred save/restore of 32-bit
>      registers
>    KVM: arm64: Defer saving/restoring 32-bit sysregs to vcpu load/put
>    KVM: arm64: Move common VHE/non-VHE trap config in separate functions
>    KVM: arm64: Configure FPSIMD traps on vcpu load/put
>    KVM: arm64: Configure c15, PMU, and debug register traps on cpu
>      load/put for VHE
>    KVM: arm64: Separate activate_traps and deactive_traps for VHE and
>      non-VHE
>    KVM: arm/arm64: Get rid of vgic_elrsr
>    KVM: arm/arm64: Handle VGICv2 save/restore from the main VGIC code
>    KVM: arm/arm64: Move arm64-only vgic-v2-sr.c file to arm64
>    KVM: arm/arm64: Handle VGICv3 save/restore from the main VGIC code on
>      VHE
>    KVM: arm/arm64: Move VGIC APR save/restore to vgic put/load
>    KVM: arm/arm64: Avoid VGICv3 save/restore on VHE with no IRQs
> 
> Shih-Wei Li (1):
>    KVM: arm64: Move HCR_INT_OVERRIDE to default HCR_EL2 guest flag
> 
>   arch/arm/include/asm/kvm_asm.h                    |   5 +-
>   arch/arm/include/asm/kvm_emulate.h                |  21 +-
>   arch/arm/include/asm/kvm_host.h                   |   6 +-
>   arch/arm/include/asm/kvm_hyp.h                    |   4 +
>   arch/arm/kvm/emulate.c                            |   4 +-
>   arch/arm/kvm/hyp/Makefile                         |   1 -
>   arch/arm/kvm/hyp/switch.c                         |  16 +-
>   arch/arm64/include/asm/kvm_arm.h                  |   4 +-
>   arch/arm64/include/asm/kvm_asm.h                  |  18 +-
>   arch/arm64/include/asm/kvm_emulate.h              |  74 +++-
>   arch/arm64/include/asm/kvm_host.h                 |  49 ++-
>   arch/arm64/include/asm/kvm_hyp.h                  |  32 +-
>   arch/arm64/include/asm/kvm_mmu.h                  |   2 +-
>   arch/arm64/kernel/asm-offsets.c                   |   2 +
>   arch/arm64/kvm/debug.c                            |  28 +-
>   arch/arm64/kvm/guest.c                            |   3 -
>   arch/arm64/kvm/hyp/Makefile                       |   2 +-
>   arch/arm64/kvm/hyp/debug-sr.c                     |  88 +++--
>   arch/arm64/kvm/hyp/entry.S                        |   9 +-
>   arch/arm64/kvm/hyp/hyp-entry.S                    |  41 +--
>   arch/arm64/kvm/hyp/switch.c                       | 404 +++++++++++++---------
>   arch/arm64/kvm/hyp/sysreg-sr.c                    | 192 ++++++++--
>   {virt/kvm/arm => arch/arm64/kvm}/hyp/vgic-v2-sr.c |  81 -----
>   arch/arm64/kvm/inject_fault.c                     |  24 +-
>   arch/arm64/kvm/regmap.c                           |  65 +++-
>   arch/arm64/kvm/sys_regs.c                         | 247 +++++++++++--
>   arch/arm64/kvm/sys_regs.h                         |   4 +-
>   arch/arm64/kvm/sys_regs_generic_v8.c              |   4 +-
>   include/kvm/arm_vgic.h                            |   2 -
>   virt/kvm/arm/aarch32.c                            |   2 +-
>   virt/kvm/arm/arch_timer.c                         |   7 -
>   virt/kvm/arm/arm.c                                |  50 ++-
>   virt/kvm/arm/hyp/timer-sr.c                       |  44 +--
>   virt/kvm/arm/hyp/vgic-v3-sr.c                     | 244 +++++++------
>   virt/kvm/arm/mmu.c                                |   6 +-
>   virt/kvm/arm/pmu.c                                |  37 +-
>   virt/kvm/arm/vgic/vgic-init.c                     |  11 -
>   virt/kvm/arm/vgic/vgic-v2.c                       |  61 +++-
>   virt/kvm/arm/vgic/vgic-v3.c                       |  12 +-
>   virt/kvm/arm/vgic/vgic.c                          |  21 ++
>   virt/kvm/arm/vgic/vgic.h                          |   3 +
>   41 files changed, 1229 insertions(+), 701 deletions(-)
>   rename {virt/kvm/arm => arch/arm64/kvm}/hyp/vgic-v2-sr.c (50%)
> 

^ permalink raw reply	[flat|nested] 223+ messages in thread

* Re: [PATCH v3 00/41] Optimize KVM/ARM for VHE systems
  2018-02-01 13:57   ` Tomasz Nowicki
@ 2018-02-01 16:15     ` Yury Norov
  -1 siblings, 0 replies; 223+ messages in thread
From: Yury Norov @ 2018-02-01 16:15 UTC (permalink / raw)
  To: Tomasz Nowicki
  Cc: kvm, Marc Zyngier, Christoffer Dall, Shih-Wei Li, kvmarm,
	linux-arm-kernel

On Thu, Feb 01, 2018 at 02:57:59PM +0100, Tomasz Nowicki wrote:
> Hi Christoffer,
> 
> I created simple module for VM kernel. It is spinning on PSCI version
> hypercall to measure the base exit cost as you suggested. Also, I measured
> CPU cycles for each loop and here are my results:
> 
> My setup:
> 1-socket ThunderX2 running VM - 1VCPU
> 
> Tested baselines:
> a) host kernel v4.15-rc3 and VM kernel v4.15-rc3
> b) host kernel v4.15-rc3 + vhe-optimize-v3-with-fixes and VM kernel
> v4.15-rc3
> 
> Module was loaded from VM and the results are presented in [%] relative to
> average CPU cycles spending on PSCI version hypercall for vanilla VHE host
> kernel v4.15-rc3:
> 
>              VHE  |  nVHE
> =========================
> baseline a)  100% |  130%
> =========================
> baseline a)  36%  |  123%
> 
> So I confirm significant performance improvement, especially for VHE case.
> Additionally, I run network throughput tests with vhost-net but for that
> case no differences.

Hi Tomasz,

Can you share your test?

Yury
 
> Thanks,
> Tomasz
> 
> On 12.01.2018 13:07, Christoffer Dall wrote:
> > This series redesigns parts of KVM/ARM to optimize the performance on
> > VHE systems.  The general approach is to try to do as little work as
> > possible when transitioning between the VM and the hypervisor.  This has
> > the benefit of lower latency when waiting for interrupts and delivering
> > virtual interrupts, and reduces the overhead of emulating behavior and
> > I/O in the host kernel.
> > 
> > Patches 01 through 06 are not VHE specific, but rework parts of KVM/ARM
> > that can be generally improved.  We then add infrastructure to move more
> > logic into vcpu_load and vcpu_put, we improve handling of VFP and debug
> > registers.
> > 
> > We then introduce a new world-switch function for VHE systems, which we
> > can tweak and optimize for VHE systems.  To do that, we rework a lot of
> > the system register save/restore handling and emulation code that may
> > need access to system registers, so that we can defer as many system
> > register save/restore operations to vcpu_load and vcpu_put, and move
> > this logic out of the VHE world switch function.
> > 
> > We then optimize the configuration of traps.  On non-VHE systems, both
> > the host and VM kernels run in EL1, but because the host kernel should
> > have full access to the underlying hardware, but the VM kernel should
> > not, we essentially make the host kernel more privileged than the VM
> > kernel despite them both running at the same privilege level by enabling
> > VE traps when entering the VM and disabling those traps when exiting the
> > VM.  On VHE systems, the host kernel runs in EL2 and has full access to
> > the hardware (as much as allowed by secure side software), and is
> > unaffected by the trap configuration.  That means we can configure the
> > traps for VMs running in EL1 once, and don't have to switch them on and
> > off for every entry/exit to/from the VM.
> > 
> > Finally, we improve our VGIC handling by moving all save/restore logic
> > out of the VHE world-switch, and we make it possible to truly only
> > evaluate if the AP list is empty and not do *any* VGIC work if that is
> > the case, and only do the minimal amount of work required in the course
> > of the VGIC processing when we have virtual interrupts in flight.
> > 
> > The patches are based on v4.15-rc3, v9 of the level-triggered mapped
> > interrupts support series [1], and the first five patches of James' SDEI
> > series [2].
> > 
> > I've given the patches a fair amount of testing on Thunder-X, Mustang,
> > Seattle, and TC2 (32-bit) for non-VHE testing, and tested VHE
> > functionality on the Foundation model, running both 64-bit VMs and
> > 32-bit VMs side-by-side and using both GICv3-on-GICv3 and
> > GICv2-on-GICv3.
> > 
> > The patches are also available in the vhe-optimize-v3 branch on my
> > kernel.org repository [3].  The vhe-optimize-v3-base branch contains
> > prerequisites of this series.
> > 
> > Changes since v2:
> >   - Rebased on v4.15-rc3.
> >   - Includes two additional patches that only does vcpu_load after
> >     kvm_vcpu_first_run_init and only for KVM_RUN.
> >   - Addressed review comments from v2 (detailed changelogs are in the
> >     individual patches).
> > 
> > Thanks,
> > -Christoffer
> > 
> > [1]: git://git.kernel.org/pub/scm/linux/kernel/git/cdall/linux.git level-mapped-v9
> > [2]: git://linux-arm.org/linux-jm.git sdei/v5/base
> > [3]: git://git.kernel.org/pub/scm/linux/kernel/git/cdall/linux.git vhe-optimize-v3
> > 
> > Christoffer Dall (40):
> >    KVM: arm/arm64: Avoid vcpu_load for other vcpu ioctls than KVM_RUN
> >    KVM: arm/arm64: Move vcpu_load call after kvm_vcpu_first_run_init
> >    KVM: arm64: Avoid storing the vcpu pointer on the stack
> >    KVM: arm64: Rework hyp_panic for VHE and non-VHE
> >    KVM: arm/arm64: Get rid of vcpu->arch.irq_lines
> >    KVM: arm/arm64: Add kvm_vcpu_load_sysregs and kvm_vcpu_put_sysregs
> >    KVM: arm/arm64: Introduce vcpu_el1_is_32bit
> >    KVM: arm64: Defer restoring host VFP state to vcpu_put
> >    KVM: arm64: Move debug dirty flag calculation out of world switch
> >    KVM: arm64: Slightly improve debug save/restore functions
> >    KVM: arm64: Improve debug register save/restore flow
> >    KVM: arm64: Factor out fault info population and gic workarounds
> >    KVM: arm64: Introduce VHE-specific kvm_vcpu_run
> >    KVM: arm64: Remove kern_hyp_va() use in VHE switch function
> >    KVM: arm64: Don't deactivate VM on VHE systems
> >    KVM: arm64: Remove noop calls to timer save/restore from VHE switch
> >    KVM: arm64: Move userspace system registers into separate function
> >    KVM: arm64: Rewrite sysreg alternatives to static keys
> >    KVM: arm64: Introduce separate VHE/non-VHE sysreg save/restore
> >      functions
> >    KVM: arm/arm64: Remove leftover comment from kvm_vcpu_run_vhe
> >    KVM: arm64: Unify non-VHE host/guest sysreg save and restore functions
> >    KVM: arm64: Don't save the host ELR_EL2 and SPSR_EL2 on VHE systems
> >    KVM: arm64: Change 32-bit handling of VM system registers
> >    KVM: arm64: Rewrite system register accessors to read/write functions
> >    KVM: arm64: Introduce framework for accessing deferred sysregs
> >    KVM: arm/arm64: Prepare to handle deferred save/restore of SPSR_EL1
> >    KVM: arm64: Prepare to handle deferred save/restore of ELR_EL1
> >    KVM: arm64: Defer saving/restoring 64-bit sysregs to vcpu load/put on
> >      VHE
> >    KVM: arm64: Prepare to handle deferred save/restore of 32-bit
> >      registers
> >    KVM: arm64: Defer saving/restoring 32-bit sysregs to vcpu load/put
> >    KVM: arm64: Move common VHE/non-VHE trap config in separate functions
> >    KVM: arm64: Configure FPSIMD traps on vcpu load/put
> >    KVM: arm64: Configure c15, PMU, and debug register traps on cpu
> >      load/put for VHE
> >    KVM: arm64: Separate activate_traps and deactive_traps for VHE and
> >      non-VHE
> >    KVM: arm/arm64: Get rid of vgic_elrsr
> >    KVM: arm/arm64: Handle VGICv2 save/restore from the main VGIC code
> >    KVM: arm/arm64: Move arm64-only vgic-v2-sr.c file to arm64
> >    KVM: arm/arm64: Handle VGICv3 save/restore from the main VGIC code on
> >      VHE
> >    KVM: arm/arm64: Move VGIC APR save/restore to vgic put/load
> >    KVM: arm/arm64: Avoid VGICv3 save/restore on VHE with no IRQs
> > 
> > Shih-Wei Li (1):
> >    KVM: arm64: Move HCR_INT_OVERRIDE to default HCR_EL2 guest flag
> > 
> >   arch/arm/include/asm/kvm_asm.h                    |   5 +-
> >   arch/arm/include/asm/kvm_emulate.h                |  21 +-
> >   arch/arm/include/asm/kvm_host.h                   |   6 +-
> >   arch/arm/include/asm/kvm_hyp.h                    |   4 +
> >   arch/arm/kvm/emulate.c                            |   4 +-
> >   arch/arm/kvm/hyp/Makefile                         |   1 -
> >   arch/arm/kvm/hyp/switch.c                         |  16 +-
> >   arch/arm64/include/asm/kvm_arm.h                  |   4 +-
> >   arch/arm64/include/asm/kvm_asm.h                  |  18 +-
> >   arch/arm64/include/asm/kvm_emulate.h              |  74 +++-
> >   arch/arm64/include/asm/kvm_host.h                 |  49 ++-
> >   arch/arm64/include/asm/kvm_hyp.h                  |  32 +-
> >   arch/arm64/include/asm/kvm_mmu.h                  |   2 +-
> >   arch/arm64/kernel/asm-offsets.c                   |   2 +
> >   arch/arm64/kvm/debug.c                            |  28 +-
> >   arch/arm64/kvm/guest.c                            |   3 -
> >   arch/arm64/kvm/hyp/Makefile                       |   2 +-
> >   arch/arm64/kvm/hyp/debug-sr.c                     |  88 +++--
> >   arch/arm64/kvm/hyp/entry.S                        |   9 +-
> >   arch/arm64/kvm/hyp/hyp-entry.S                    |  41 +--
> >   arch/arm64/kvm/hyp/switch.c                       | 404 +++++++++++++---------
> >   arch/arm64/kvm/hyp/sysreg-sr.c                    | 192 ++++++++--
> >   {virt/kvm/arm => arch/arm64/kvm}/hyp/vgic-v2-sr.c |  81 -----
> >   arch/arm64/kvm/inject_fault.c                     |  24 +-
> >   arch/arm64/kvm/regmap.c                           |  65 +++-
> >   arch/arm64/kvm/sys_regs.c                         | 247 +++++++++++--
> >   arch/arm64/kvm/sys_regs.h                         |   4 +-
> >   arch/arm64/kvm/sys_regs_generic_v8.c              |   4 +-
> >   include/kvm/arm_vgic.h                            |   2 -
> >   virt/kvm/arm/aarch32.c                            |   2 +-
> >   virt/kvm/arm/arch_timer.c                         |   7 -
> >   virt/kvm/arm/arm.c                                |  50 ++-
> >   virt/kvm/arm/hyp/timer-sr.c                       |  44 +--
> >   virt/kvm/arm/hyp/vgic-v3-sr.c                     | 244 +++++++------
> >   virt/kvm/arm/mmu.c                                |   6 +-
> >   virt/kvm/arm/pmu.c                                |  37 +-
> >   virt/kvm/arm/vgic/vgic-init.c                     |  11 -
> >   virt/kvm/arm/vgic/vgic-v2.c                       |  61 +++-
> >   virt/kvm/arm/vgic/vgic-v3.c                       |  12 +-
> >   virt/kvm/arm/vgic/vgic.c                          |  21 ++
> >   virt/kvm/arm/vgic/vgic.h                          |   3 +
> >   41 files changed, 1229 insertions(+), 701 deletions(-)
> >   rename {virt/kvm/arm => arch/arm64/kvm}/hyp/vgic-v2-sr.c (50%)
> > 

^ permalink raw reply	[flat|nested] 223+ messages in thread

* Re: [PATCH v3 00/41] Optimize KVM/ARM for VHE systems
  2018-02-01 16:15     ` Yury Norov
@ 2018-02-02 10:05       ` Tomasz Nowicki
  -1 siblings, 0 replies; 223+ messages in thread
From: Tomasz Nowicki @ 2018-02-02 10:05 UTC (permalink / raw)
  To: Yury Norov
  Cc: Christoffer Dall, kvmarm, linux-arm-kernel, Marc Zyngier,
	Shih-Wei Li, kvm

On 01.02.2018 17:15, Yury Norov wrote:
> On Thu, Feb 01, 2018 at 02:57:59PM +0100, Tomasz Nowicki wrote:
>> Hi Christoffer,
>>
>> I created simple module for VM kernel. It is spinning on PSCI version
>> hypercall to measure the base exit cost as you suggested. Also, I measured
>> CPU cycles for each loop and here are my results:
>>
>> My setup:
>> 1-socket ThunderX2 running VM - 1VCPU
>>
>> Tested baselines:
>> a) host kernel v4.15-rc3 and VM kernel v4.15-rc3
>> b) host kernel v4.15-rc3 + vhe-optimize-v3-with-fixes and VM kernel
>> v4.15-rc3
>>
>> Module was loaded from VM and the results are presented in [%] relative to
>> average CPU cycles spending on PSCI version hypercall for vanilla VHE host
>> kernel v4.15-rc3:
>>
>>               VHE  |  nVHE
>> =========================
>> baseline a)  100% |  130%
>> =========================
>> baseline a)  36%  |  123%
>>
>> So I confirm significant performance improvement, especially for VHE case.
>> Additionally, I run network throughput tests with vhost-net but for that
>> case no differences.
> 
> Hi Tomasz,
> 
> Can you share your test?
> 

Yes:

#include <linux/arm-smccc.h>
#include <linux/err.h>
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/kthread.h>
#include <linux/mm.h>
#include <linux/module.h>
#include <linux/psci.h>
#include <linux/sched.h>
#include <linux/slab.h>
#include <linux/vmalloc.h>

#include <uapi/linux/psci.h>

#define SAMPLE_N	(10000UL)
#define SAMPLES		(500)
#define CPU_PINNED	(10)

static struct task_struct *kvm_bench_task;

static unsigned long __invoke_psci_fn_hvc(unsigned long function_id,
			unsigned long arg0, unsigned long arg1,
			unsigned long arg2)
{
	struct arm_smccc_res res;

	arm_smccc_hvc(function_id, arg0, arg1, arg2, 0, 0, 0, 0, &res);
	return res.a0;
}

static u32 psci_get_version(void)
{
	return __invoke_psci_fn_hvc(PSCI_0_2_FN_PSCI_VERSION, 0, 0, 0);
}

static inline u64 get_cycles_custom(void)
{
	register u64 c;
	__asm__ volatile("mrs %0, cntvct_el0" : "=r"(c));
	return c;
}

static int kvm_bench_kthread(void *none)
{
	int test_iter, out = SAMPLES;
	u64 time_before, time;
	u32 ver = psci_get_version();

	printk(KERN_INFO "Starting kvm exit cost test, using PSCI get version 
hypercall");
	printk(KERN_INFO "Obtained PSCIv%d.%d\n", PSCI_VERSION_MAJOR(ver),
	       PSCI_VERSION_MINOR(ver));

	for (test_iter = 0;; test_iter++) {
		if (!(test_iter % SAMPLE_N)) {
			time_before = get_cycles_custom();
		}

		psci_get_version();

		if (!(test_iter % SAMPLE_N)) {
			while (!out--) {
				kvm_bench_task = NULL;
				do_exit(0);
			}
			time = get_cycles_custom() - time_before;
			printk(KERN_INFO "iter takes %llu cycles. \n", time);
			if (kthread_should_stop())
				break;
			schedule();
		}
	}

	return 0;
}

static int __init kvm_bench_init(void)
{
	int err;

	printk(KERN_INFO "KVM exit cost benchmark\n");

	kvm_bench_task = kthread_create(kvm_bench_kthread, NULL, "kvm_test");
	if (IS_ERR(kvm_bench_task)) {
		printk(KERN_INFO "Unable to start thread.\n");
		err = PTR_ERR(kvm_bench_task);
		return err;
	}
	kthread_bind(kvm_bench_task, CPU_PINNED);
	wake_up_process(kvm_bench_task);
	return 0;
}

static void __exit kvm_bench_cleanup(void)
{
	printk(KERN_INFO "KVM benchmark cleaning up\n");
	if (kvm_bench_task)
		kthread_stop(kvm_bench_task);
}

module_init(kvm_bench_init);
module_exit(kvm_bench_cleanup);
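
For what it is worth, the reasoning behind this particular probe (this 
is my understanding of the code paths involved, not something I have 
re-verified for this test): the PSCI version call should be handled 
entirely inside KVM without exiting to user space, and cntvct_el0 is 
the virtual counter, which the guest can read without trapping, so each 
sampled iteration should measure little more than the bare hypercall 
entry/exit path.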


Thanks,
Tomasz

^ permalink raw reply	[flat|nested] 223+ messages in thread

* Re: [PATCH v3 00/41] Optimize KVM/ARM for VHE systems
  2018-02-01 13:57   ` Tomasz Nowicki
@ 2018-02-02 10:07     ` Tomasz Nowicki
  -1 siblings, 0 replies; 223+ messages in thread
From: Tomasz Nowicki @ 2018-02-02 10:07 UTC (permalink / raw)
  To: Christoffer Dall, kvmarm, linux-arm-kernel; +Cc: Marc Zyngier, Shih-Wei Li, kvm

On 01.02.2018 14:57, Tomasz Nowicki wrote:
> Hi Christoffer,
> 
> I created simple module for VM kernel. It is spinning on PSCI version 
> hypercall to measure the base exit cost as you suggested. Also, I 
> measured CPU cycles for each loop and here are my results:
> 
> My setup:
> 1-socket ThunderX2 running VM - 1VCPU
> 
> Tested baselines:
> a) host kernel v4.15-rc3 and VM kernel v4.15-rc3
> b) host kernel v4.15-rc3 + vhe-optimize-v3-with-fixes and VM kernel 
> v4.15-rc3
> 
> Module was loaded from VM and the results are presented in [%] relative 
> to average CPU cycles spending on PSCI version hypercall for vanilla VHE 
> host kernel v4.15-rc3:
> 
>               VHE  |  nVHE
> =========================
> baseline a)  100% |  130%
> =========================
> baseline a)  36%  |  123%

My apologies, the second row is obviously for baseline b).

Tomasz

> 
> So I confirm significant performance improvement, especially for VHE 
> case. Additionally, I run network throughput tests with vhost-net but 
> for that case no differences.
> 
> Thanks,
> Tomasz
> 
> On 12.01.2018 13:07, Christoffer Dall wrote:
>> This series redesigns parts of KVM/ARM to optimize the performance on
>> VHE systems.  The general approach is to try to do as little work as
>> possible when transitioning between the VM and the hypervisor.  This has
>> the benefit of lower latency when waiting for interrupts and delivering
>> virtual interrupts, and reduces the overhead of emulating behavior and
>> I/O in the host kernel.
>>
>> Patches 01 through 06 are not VHE specific, but rework parts of KVM/ARM
>> that can be generally improved.  We then add infrastructure to move more
>> logic into vcpu_load and vcpu_put, we improve handling of VFP and debug
>> registers.
>>
>> We then introduce a new world-switch function for VHE systems, which we
>> can tweak and optimize for VHE systems.  To do that, we rework a lot of
>> the system register save/restore handling and emulation code that may
>> need access to system registers, so that we can defer as many system
>> register save/restore operations to vcpu_load and vcpu_put, and move
>> this logic out of the VHE world switch function.
>>
>> We then optimize the configuration of traps.  On non-VHE systems, both
>> the host and VM kernels run in EL1, but because the host kernel should
>> have full access to the underlying hardware, but the VM kernel should
>> not, we essentially make the host kernel more privileged than the VM
>> kernel despite them both running at the same privilege level by enabling
>> VE traps when entering the VM and disabling those traps when exiting the
>> VM.  On VHE systems, the host kernel runs in EL2 and has full access to
>> the hardware (as much as allowed by secure side software), and is
>> unaffected by the trap configuration.  That means we can configure the
>> traps for VMs running in EL1 once, and don't have to switch them on and
>> off for every entry/exit to/from the VM.
>>
>> Finally, we improve our VGIC handling by moving all save/restore logic
>> out of the VHE world-switch, and we make it possible to truly only
>> evaluate if the AP list is empty and not do *any* VGIC work if that is
>> the case, and only do the minimal amount of work required in the course
>> of the VGIC processing when we have virtual interrupts in flight.
>>
>> The patches are based on v4.15-rc3, v9 of the level-triggered mapped
>> interrupts support series [1], and the first five patches of James' SDEI
>> series [2].
>>
>> I've given the patches a fair amount of testing on Thunder-X, Mustang,
>> Seattle, and TC2 (32-bit) for non-VHE testing, and tested VHE
>> functionality on the Foundation model, running both 64-bit VMs and
>> 32-bit VMs side-by-side and using both GICv3-on-GICv3 and
>> GICv2-on-GICv3.
>>
>> The patches are also available in the vhe-optimize-v3 branch on my
>> kernel.org repository [3].  The vhe-optimize-v3-base branch contains
>> prerequisites of this series.
>>
>> Changes since v2:
>>   - Rebased on v4.15-rc3.
>>   - Includes two additional patches that only does vcpu_load after
>>     kvm_vcpu_first_run_init and only for KVM_RUN.
>>   - Addressed review comments from v2 (detailed changelogs are in the
>>     individual patches).
>>
>> Thanks,
>> -Christoffer
>>
>> [1]: git://git.kernel.org/pub/scm/linux/kernel/git/cdall/linux.git 
>> level-mapped-v9
>> [2]: git://linux-arm.org/linux-jm.git sdei/v5/base
>> [3]: git://git.kernel.org/pub/scm/linux/kernel/git/cdall/linux.git 
>> vhe-optimize-v3
>>
>> Christoffer Dall (40):
>>    KVM: arm/arm64: Avoid vcpu_load for other vcpu ioctls than KVM_RUN
>>    KVM: arm/arm64: Move vcpu_load call after kvm_vcpu_first_run_init
>>    KVM: arm64: Avoid storing the vcpu pointer on the stack
>>    KVM: arm64: Rework hyp_panic for VHE and non-VHE
>>    KVM: arm/arm64: Get rid of vcpu->arch.irq_lines
>>    KVM: arm/arm64: Add kvm_vcpu_load_sysregs and kvm_vcpu_put_sysregs
>>    KVM: arm/arm64: Introduce vcpu_el1_is_32bit
>>    KVM: arm64: Defer restoring host VFP state to vcpu_put
>>    KVM: arm64: Move debug dirty flag calculation out of world switch
>>    KVM: arm64: Slightly improve debug save/restore functions
>>    KVM: arm64: Improve debug register save/restore flow
>>    KVM: arm64: Factor out fault info population and gic workarounds
>>    KVM: arm64: Introduce VHE-specific kvm_vcpu_run
>>    KVM: arm64: Remove kern_hyp_va() use in VHE switch function
>>    KVM: arm64: Don't deactivate VM on VHE systems
>>    KVM: arm64: Remove noop calls to timer save/restore from VHE switch
>>    KVM: arm64: Move userspace system registers into separate function
>>    KVM: arm64: Rewrite sysreg alternatives to static keys
>>    KVM: arm64: Introduce separate VHE/non-VHE sysreg save/restore
>>      functions
>>    KVM: arm/arm64: Remove leftover comment from kvm_vcpu_run_vhe
>>    KVM: arm64: Unify non-VHE host/guest sysreg save and restore functions
>>    KVM: arm64: Don't save the host ELR_EL2 and SPSR_EL2 on VHE systems
>>    KVM: arm64: Change 32-bit handling of VM system registers
>>    KVM: arm64: Rewrite system register accessors to read/write functions
>>    KVM: arm64: Introduce framework for accessing deferred sysregs
>>    KVM: arm/arm64: Prepare to handle deferred save/restore of SPSR_EL1
>>    KVM: arm64: Prepare to handle deferred save/restore of ELR_EL1
>>    KVM: arm64: Defer saving/restoring 64-bit sysregs to vcpu load/put on
>>      VHE
>>    KVM: arm64: Prepare to handle deferred save/restore of 32-bit
>>      registers
>>    KVM: arm64: Defer saving/restoring 32-bit sysregs to vcpu load/put
>>    KVM: arm64: Move common VHE/non-VHE trap config in separate functions
>>    KVM: arm64: Configure FPSIMD traps on vcpu load/put
>>    KVM: arm64: Configure c15, PMU, and debug register traps on cpu
>>      load/put for VHE
>>    KVM: arm64: Separate activate_traps and deactive_traps for VHE and
>>      non-VHE
>>    KVM: arm/arm64: Get rid of vgic_elrsr
>>    KVM: arm/arm64: Handle VGICv2 save/restore from the main VGIC code
>>    KVM: arm/arm64: Move arm64-only vgic-v2-sr.c file to arm64
>>    KVM: arm/arm64: Handle VGICv3 save/restore from the main VGIC code on
>>      VHE
>>    KVM: arm/arm64: Move VGIC APR save/restore to vgic put/load
>>    KVM: arm/arm64: Avoid VGICv3 save/restore on VHE with no IRQs
>>
>> Shih-Wei Li (1):
>>    KVM: arm64: Move HCR_INT_OVERRIDE to default HCR_EL2 guest flag
>>
>>   arch/arm/include/asm/kvm_asm.h                    |   5 +-
>>   arch/arm/include/asm/kvm_emulate.h                |  21 +-
>>   arch/arm/include/asm/kvm_host.h                   |   6 +-
>>   arch/arm/include/asm/kvm_hyp.h                    |   4 +
>>   arch/arm/kvm/emulate.c                            |   4 +-
>>   arch/arm/kvm/hyp/Makefile                         |   1 -
>>   arch/arm/kvm/hyp/switch.c                         |  16 +-
>>   arch/arm64/include/asm/kvm_arm.h                  |   4 +-
>>   arch/arm64/include/asm/kvm_asm.h                  |  18 +-
>>   arch/arm64/include/asm/kvm_emulate.h              |  74 +++-
>>   arch/arm64/include/asm/kvm_host.h                 |  49 ++-
>>   arch/arm64/include/asm/kvm_hyp.h                  |  32 +-
>>   arch/arm64/include/asm/kvm_mmu.h                  |   2 +-
>>   arch/arm64/kernel/asm-offsets.c                   |   2 +
>>   arch/arm64/kvm/debug.c                            |  28 +-
>>   arch/arm64/kvm/guest.c                            |   3 -
>>   arch/arm64/kvm/hyp/Makefile                       |   2 +-
>>   arch/arm64/kvm/hyp/debug-sr.c                     |  88 +++--
>>   arch/arm64/kvm/hyp/entry.S                        |   9 +-
>>   arch/arm64/kvm/hyp/hyp-entry.S                    |  41 +--
>>   arch/arm64/kvm/hyp/switch.c                       | 404 
>> +++++++++++++---------
>>   arch/arm64/kvm/hyp/sysreg-sr.c                    | 192 ++++++++--
>>   {virt/kvm/arm => arch/arm64/kvm}/hyp/vgic-v2-sr.c |  81 -----
>>   arch/arm64/kvm/inject_fault.c                     |  24 +-
>>   arch/arm64/kvm/regmap.c                           |  65 +++-
>>   arch/arm64/kvm/sys_regs.c                         | 247 +++++++++++--
>>   arch/arm64/kvm/sys_regs.h                         |   4 +-
>>   arch/arm64/kvm/sys_regs_generic_v8.c              |   4 +-
>>   include/kvm/arm_vgic.h                            |   2 -
>>   virt/kvm/arm/aarch32.c                            |   2 +-
>>   virt/kvm/arm/arch_timer.c                         |   7 -
>>   virt/kvm/arm/arm.c                                |  50 ++-
>>   virt/kvm/arm/hyp/timer-sr.c                       |  44 +--
>>   virt/kvm/arm/hyp/vgic-v3-sr.c                     | 244 +++++++------
>>   virt/kvm/arm/mmu.c                                |   6 +-
>>   virt/kvm/arm/pmu.c                                |  37 +-
>>   virt/kvm/arm/vgic/vgic-init.c                     |  11 -
>>   virt/kvm/arm/vgic/vgic-v2.c                       |  61 +++-
>>   virt/kvm/arm/vgic/vgic-v3.c                       |  12 +-
>>   virt/kvm/arm/vgic/vgic.c                          |  21 ++
>>   virt/kvm/arm/vgic/vgic.h                          |   3 +
>>   41 files changed, 1229 insertions(+), 701 deletions(-)
>>   rename {virt/kvm/arm => arch/arm64/kvm}/hyp/vgic-v2-sr.c (50%)
>>

^ permalink raw reply	[flat|nested] 223+ messages in thread

* Re: [PATCH v3 33/41] KVM: arm64: Configure FPSIMD traps on vcpu load/put
  2018-01-31 12:17     ` Tomasz Nowicki
@ 2018-02-05 10:06       ` Christoffer Dall
  -1 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-02-05 10:06 UTC (permalink / raw)
  To: Tomasz Nowicki
  Cc: kvmarm, linux-arm-kernel, kvm, Marc Zyngier, Shih-Wei Li, Andrew Jones

Hi Tomasz,

On Wed, Jan 31, 2018 at 01:17:36PM +0100, Tomasz Nowicki wrote:
> On 12.01.2018 13:07, Christoffer Dall wrote:
> >There is no need to enable/disable traps to FP registers on every switch
> >to/from the VM, because the host kernel does not use this resource
> >without calling vcpu_put.  We can therefore move things around enough
> >that we still always write FPEXC32_EL2 before programming CPTR_EL2 but
> >only program these during vcpu load/put.
> >
> >Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> >---
> >  arch/arm64/include/asm/kvm_hyp.h |  6 +++++
> >  arch/arm64/kvm/hyp/switch.c      | 51 +++++++++++++++++++++++++++++-----------
> >  arch/arm64/kvm/hyp/sysreg-sr.c   | 12 ++++++++--
> >  3 files changed, 53 insertions(+), 16 deletions(-)
> >
> >diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
> >index 3f54c55f77a1..ffd62e31f134 100644
> >--- a/arch/arm64/include/asm/kvm_hyp.h
> >+++ b/arch/arm64/include/asm/kvm_hyp.h
> >@@ -148,6 +148,12 @@ void __fpsimd_save_state(struct user_fpsimd_state *fp_regs);
> >  void __fpsimd_restore_state(struct user_fpsimd_state *fp_regs);
> >  bool __fpsimd_enabled(void);
> >+void __activate_traps_nvhe_load(struct kvm_vcpu *vcpu);
> >+void __deactivate_traps_nvhe_put(void);
> >+
> >+void activate_traps_vhe_load(struct kvm_vcpu *vcpu);
> >+void deactivate_traps_vhe_put(void);
> >+
> >  u64 __guest_enter(struct kvm_vcpu *vcpu, struct kvm_cpu_context *host_ctxt);
> >  void __noreturn __hyp_do_panic(unsigned long, ...);
> >diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> >index c01bcfc3fb52..d14ab9650f81 100644
> >--- a/arch/arm64/kvm/hyp/switch.c
> >+++ b/arch/arm64/kvm/hyp/switch.c
> >@@ -24,22 +24,25 @@
> >  #include <asm/fpsimd.h>
> >  #include <asm/debug-monitors.h>
> >-static void __hyp_text __activate_traps_common(struct kvm_vcpu *vcpu)
> >+static void __hyp_text __activate_traps_fpsimd32(struct kvm_vcpu *vcpu)
> >  {
> >  	/*
> >-	 * We are about to set CPTR_EL2.TFP to trap all floating point
> >-	 * register accesses to EL2, however, the ARM ARM clearly states that
> >-	 * traps are only taken to EL2 if the operation would not otherwise
> >-	 * trap to EL1.  Therefore, always make sure that for 32-bit guests,
> >-	 * we set FPEXC.EN to prevent traps to EL1, when setting the TFP bit.
> >-	 * If FP/ASIMD is not implemented, FPEXC is UNDEFINED and any access to
> >-	 * it will cause an exception.
> >+	 * We are about to trap all floating point register accesses to EL2,
> >+	 * however, traps are only taken to EL2 if the operation would not
> >+	 * otherwise trap to EL1.  Therefore, always make sure that for 32-bit
> >+	 * guests, we set FPEXC.EN to prevent traps to EL1, when setting the
> >+	 * TFP bit.  If FP/ASIMD is not implemented, FPEXC is UNDEFINED and
> >+	 * any access to it will cause an exception.
> >  	 */
> >  	if (vcpu_el1_is_32bit(vcpu) && system_supports_fpsimd() &&
> >  	    !vcpu->arch.guest_vfp_loaded) {
> >  		write_sysreg(1 << 30, fpexc32_el2);
> >  		isb();
> >  	}
> >+}
> >+
> >+static void __hyp_text __activate_traps_common(struct kvm_vcpu *vcpu)
> >+{
> >  	write_sysreg(vcpu->arch.hcr_el2, hcr_el2);
> >  	/* Trap on AArch32 cp15 c15 (impdef sysregs) accesses (EL1 or EL0) */
> >@@ -61,10 +64,12 @@ static void __hyp_text __deactivate_traps_common(void)
> >  	write_sysreg(0, pmuserenr_el0);
> >  }
> >-static void __hyp_text __activate_traps_vhe(struct kvm_vcpu *vcpu)
> >+void activate_traps_vhe_load(struct kvm_vcpu *vcpu)
> >  {
> >  	u64 val;
> >+	__activate_traps_fpsimd32(vcpu);
> >+
> >  	val = read_sysreg(cpacr_el1);
> >  	val |= CPACR_EL1_TTA;
> >  	val &= ~CPACR_EL1_ZEN;
> >@@ -73,14 +78,26 @@ static void __hyp_text __activate_traps_vhe(struct kvm_vcpu *vcpu)
> >  	else
> >  		val &= ~CPACR_EL1_FPEN;
> >  	write_sysreg(val, cpacr_el1);
> 
> Given that you move this code to kvm_vcpu_load_sysregs(), I am wondering
> whether we have to deactivate the FPEN trap here. IIUC, we call
> kvm_vcpu_load_sysregs()->activate_traps_vhe_load() and then
> kvm_vcpu_put_sysregs() by design. So vcpu->arch.guest_vfp_loaded should
> always be 0 here, since it is zeroed in kvm_vcpu_put_sysregs(). The same
> applies to the nVHE case below.
> 

You're absolutely right, we can enable the trapping unconditionally on
this path.
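
As a sketch (based only on the hunk quoted above, not on an actual
follow-up patch), the VHE load path would then boil down to:

void activate_traps_vhe_load(struct kvm_vcpu *vcpu)
{
	u64 val;

	__activate_traps_fpsimd32(vcpu);

	val = read_sysreg(cpacr_el1);
	val |= CPACR_EL1_TTA;
	/*
	 * guest_vfp_loaded is always zero at vcpu_load time (it is cleared
	 * in kvm_vcpu_put_sysregs()), so trap FP/SIMD unconditionally.
	 */
	val &= ~(CPACR_EL1_ZEN | CPACR_EL1_FPEN);
	write_sysreg(val, cpacr_el1);
}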

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 223+ messages in thread

* Re: [PATCH v3 01/41] KVM: arm/arm64: Avoid vcpu_load for other vcpu ioctls than KVM_RUN
  2018-01-12 12:07   ` Christoffer Dall
@ 2018-02-05 12:32     ` Julien Grall
  -1 siblings, 0 replies; 223+ messages in thread
From: Julien Grall @ 2018-02-05 12:32 UTC (permalink / raw)
  To: Christoffer Dall, kvmarm, linux-arm-kernel; +Cc: Marc Zyngier, Shih-Wei Li, kvm

Hi Christoffer,

On 12/01/18 12:07, Christoffer Dall wrote:
> Calling vcpu_load() registers preempt notifiers for this vcpu and calls
> kvm_arch_vcpu_load().  The latter will soon be doing a lot of heavy
> lifting on arm/arm64 and will try to do things such as enabling the
> virtual timer and setting us up to handle interrupts from the timer
> hardware.
> 
> Loading state onto hardware registers and enabling hardware to signal
> interrupts can be problematic when we're not actually about to run the
> VCPU, because it makes it difficult to establish the right context when
> handling interrupts from the timer, and it makes the register access
> code difficult to reason about.
> 
> Luckily, now when we call vcpu_load in each ioctl implementation, we can
> simply remove the call from the non-KVM_RUN vcpu ioctls, and our
> kvm_arch_vcpu_load() is only used for loading vcpu content to the
> physical CPU when we're actually going to run the vcpu.
> 
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>

Reviewed-by: Julien Grall <julien.grall@arm.com>

Cheers,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 223+ messages in thread

* Re: [PATCH v3 41/41] KVM: arm/arm64: Avoid VGICv3 save/restore on VHE with no IRQs
  2018-01-12 12:07   ` Christoffer Dall
@ 2018-02-05 13:29     ` Tomasz Nowicki
  -1 siblings, 0 replies; 223+ messages in thread
From: Tomasz Nowicki @ 2018-02-05 13:29 UTC (permalink / raw)
  To: Christoffer Dall, kvmarm, linux-arm-kernel
  Cc: kvm, Marc Zyngier, Shih-Wei Li, Andrew Jones

Hi Christoffer,

On 12.01.2018 13:07, Christoffer Dall wrote:
> We can finally get completely rid of any calls to the VGICv3
> save/restore functions when the AP lists are empty on VHE systems.  This
> requires carefully factoring out trap configuration from saving and
> restoring state, and carefully choosing what to do on the VHE and
> non-VHE path.
> 
> One of the challenges is that we cannot save/restore the VMCR lazily
> because we can only write the VMCR when ICC_SRE_EL1.SRE is cleared when
> emulating a GICv2-on-GICv3, since otherwise all Group-0 interrupts end
> up being delivered as FIQ.
> 
> To solve this problem, and still provide fast performance in the fast
> path of exiting a VM when no interrupts are pending (which also
> optimized the latency for actually delivering virtual interrupts coming
> from physical interrupts), we orchestrate a dance of only doing the
> activate/deactivate traps in vgic load/put for VHE systems (which can
> have ICC_SRE_EL1.SRE cleared when running in the host), and doing the
> configuration on every round-trip on non-VHE systems.
> 
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> ---
>   arch/arm/include/asm/kvm_hyp.h   |   2 +
>   arch/arm/kvm/hyp/switch.c        |   8 ++-
>   arch/arm64/include/asm/kvm_hyp.h |   2 +
>   arch/arm64/kvm/hyp/switch.c      |   8 ++-
>   virt/kvm/arm/hyp/vgic-v3-sr.c    | 121 +++++++++++++++++++++++++--------------
>   virt/kvm/arm/vgic/vgic-v3.c      |   6 ++
>   virt/kvm/arm/vgic/vgic.c         |   7 +--
>   7 files changed, 103 insertions(+), 51 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_hyp.h b/arch/arm/include/asm/kvm_hyp.h
> index b3dd4f4304f5..d01676e5b816 100644
> --- a/arch/arm/include/asm/kvm_hyp.h
> +++ b/arch/arm/include/asm/kvm_hyp.h
> @@ -109,6 +109,8 @@ void __sysreg_restore_state(struct kvm_cpu_context *ctxt);
>   
>   void __vgic_v3_save_state(struct kvm_vcpu *vcpu);
>   void __vgic_v3_restore_state(struct kvm_vcpu *vcpu);
> +void __vgic_v3_activate_traps(struct kvm_vcpu *vcpu);
> +void __vgic_v3_deactivate_traps(struct kvm_vcpu *vcpu);
>   void __vgic_v3_save_aprs(struct kvm_vcpu *vcpu);
>   void __vgic_v3_restore_aprs(struct kvm_vcpu *vcpu);
>   
> diff --git a/arch/arm/kvm/hyp/switch.c b/arch/arm/kvm/hyp/switch.c
> index 214187446e63..337c76230885 100644
> --- a/arch/arm/kvm/hyp/switch.c
> +++ b/arch/arm/kvm/hyp/switch.c
> @@ -89,14 +89,18 @@ static void __hyp_text __deactivate_vm(struct kvm_vcpu *vcpu)
>   
>   static void __hyp_text __vgic_save_state(struct kvm_vcpu *vcpu)
>   {
> -	if (static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif))
> +	if (static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif)) {
>   		__vgic_v3_save_state(vcpu);
> +		__vgic_v3_deactivate_traps(vcpu);
> +	}
>   }
>   
>   static void __hyp_text __vgic_restore_state(struct kvm_vcpu *vcpu)
>   {
> -	if (static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif))
> +	if (static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif)) {
> +		__vgic_v3_activate_traps(vcpu);
>   		__vgic_v3_restore_state(vcpu);
> +	}
>   }
>   
>   static bool __hyp_text __populate_fault_info(struct kvm_vcpu *vcpu)
> diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
> index 693d29f0036d..af7cf0faf58f 100644
> --- a/arch/arm64/include/asm/kvm_hyp.h
> +++ b/arch/arm64/include/asm/kvm_hyp.h
> @@ -125,6 +125,8 @@ int __vgic_v2_perform_cpuif_access(struct kvm_vcpu *vcpu);
>   
>   void __vgic_v3_save_state(struct kvm_vcpu *vcpu);
>   void __vgic_v3_restore_state(struct kvm_vcpu *vcpu);
> +void __vgic_v3_activate_traps(struct kvm_vcpu *vcpu);
> +void __vgic_v3_deactivate_traps(struct kvm_vcpu *vcpu);
>   void __vgic_v3_save_aprs(struct kvm_vcpu *vcpu);
>   void __vgic_v3_restore_aprs(struct kvm_vcpu *vcpu);
>   int __vgic_v3_perform_cpuif_access(struct kvm_vcpu *vcpu);
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index 9187afca181a..901a111fb509 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -194,14 +194,18 @@ static void __hyp_text __deactivate_vm(struct kvm_vcpu *vcpu)
>   
>   static void __hyp_text __vgic_save_state(struct kvm_vcpu *vcpu)
>   {
> -	if (static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif))
> +	if (static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif)) {
>   		__vgic_v3_save_state(vcpu);
> +		__vgic_v3_deactivate_traps(vcpu);
> +	}
>   }
>   
>   static void __hyp_text __vgic_restore_state(struct kvm_vcpu *vcpu)
>   {
> -	if (static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif))
> +	if (static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif)) {
> +		__vgic_v3_activate_traps(vcpu);
>   		__vgic_v3_restore_state(vcpu);
> +	}
>   }
>   
>   static bool __hyp_text __true_value(void)
> diff --git a/virt/kvm/arm/hyp/vgic-v3-sr.c b/virt/kvm/arm/hyp/vgic-v3-sr.c
> index 811b42c8441d..e5f3bc7582b6 100644
> --- a/virt/kvm/arm/hyp/vgic-v3-sr.c
> +++ b/virt/kvm/arm/hyp/vgic-v3-sr.c
> @@ -208,15 +208,15 @@ void __hyp_text __vgic_v3_save_state(struct kvm_vcpu *vcpu)
>   {
>   	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
>   	u64 used_lrs = vcpu->arch.vgic_cpu.used_lrs;
> -	u64 val;
>   
>   	/*
>   	 * Make sure stores to the GIC via the memory mapped interface
> -	 * are now visible to the system register interface.
> +	 * are now visible to the system register interface when reading the
> +	 * LRs, and when reading back the VMCR on non-VHE systems.
>   	 */
> -	if (!cpu_if->vgic_sre) {
> -		dsb(st);
> -		cpu_if->vgic_vmcr = read_gicreg(ICH_VMCR_EL2);
> +	if (used_lrs || !has_vhe()) {
> +		if (!cpu_if->vgic_sre)
> +			dsb(st);
>   	}
>   
>   	if (used_lrs) {
> @@ -225,7 +225,7 @@ void __hyp_text __vgic_v3_save_state(struct kvm_vcpu *vcpu)
>   
>   		elrsr = read_gicreg(ICH_ELSR_EL2);
>   
> -		write_gicreg(0, ICH_HCR_EL2);
> +		write_gicreg(cpu_if->vgic_hcr & ~ICH_HCR_EN, ICH_HCR_EL2);
>   
>   		for (i = 0; i < used_lrs; i++) {
>   			if (elrsr & (1 << i))
> @@ -235,19 +235,6 @@ void __hyp_text __vgic_v3_save_state(struct kvm_vcpu *vcpu)
>   
>   			__gic_v3_set_lr(0, i);
>   		}
> -	} else {
> -		if (static_branch_unlikely(&vgic_v3_cpuif_trap) ||
> -		    cpu_if->its_vpe.its_vm)
> -			write_gicreg(0, ICH_HCR_EL2);
> -	}
> -
> -	val = read_gicreg(ICC_SRE_EL2);
> -	write_gicreg(val | ICC_SRE_EL2_ENABLE, ICC_SRE_EL2);
> -
> -	if (!cpu_if->vgic_sre) {
> -		/* Make sure ENABLE is set at EL2 before setting SRE at EL1 */
> -		isb();
> -		write_gicreg(1, ICC_SRE_EL1);
>   	}
>   }
>   
> @@ -257,6 +244,31 @@ void __hyp_text __vgic_v3_restore_state(struct kvm_vcpu *vcpu)
>   	u64 used_lrs = vcpu->arch.vgic_cpu.used_lrs;
>   	int i;
>   
> +	if (used_lrs) {
> +		write_gicreg(cpu_if->vgic_hcr, ICH_HCR_EL2);
> +
> +		for (i = 0; i < used_lrs; i++)
> +			__gic_v3_set_lr(cpu_if->vgic_lr[i], i);
> +	}
> +
> +	/*
> +	 * Ensure that writes to the LRs, and on non-VHE systems ensure that
> +	 * the write to the VMCR in __vgic_v3_activate_traps(), will have
> +	 * reached the (re)distributors. This ensure the guest will read the
> +	 * correct values from the memory-mapped interface.
> +	 */
> +	if (used_lrs || !has_vhe()) {
> +		if (!cpu_if->vgic_sre) {
> +			isb();
> +			dsb(sy);
> +		}
> +	}
> +}
> +
> +void __hyp_text __vgic_v3_activate_traps(struct kvm_vcpu *vcpu)
> +{
> +	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
> +
>   	/*
>   	 * VFIQEn is RES1 if ICC_SRE_EL1.SRE is 1. This causes a
>   	 * Group0 interrupt (as generated in GICv2 mode) to be
> @@ -264,47 +276,70 @@ void __hyp_text __vgic_v3_restore_state(struct kvm_vcpu *vcpu)
>   	 * consequences. So we must make sure that ICC_SRE_EL1 has
>   	 * been actually programmed with the value we want before
>   	 * starting to mess with the rest of the GIC, and VMCR_EL2 in
> -	 * particular.
> +	 * particular.  This logic must be called before
> +	 * __vgic_v3_restore_state().
>   	 */
>   	if (!cpu_if->vgic_sre) {
>   		write_gicreg(0, ICC_SRE_EL1);
>   		isb();
>   		write_gicreg(cpu_if->vgic_vmcr, ICH_VMCR_EL2);
> -	}
>   
> -	if (used_lrs) {
> -		write_gicreg(cpu_if->vgic_hcr, ICH_HCR_EL2);
>   
> -		for (i = 0; i < used_lrs; i++)
> -			__gic_v3_set_lr(cpu_if->vgic_lr[i], i);
> -	} else {
> -		/*
> -		 * If we need to trap system registers, we must write
> -		 * ICH_HCR_EL2 anyway, even if no interrupts are being
> -		 * injected. Same thing if GICv4 is used, as VLPI
> -		 * delivery is gated by ICH_HCR_EL2.En.
> -		 */
> -		if (static_branch_unlikely(&vgic_v3_cpuif_trap) ||
> -		    cpu_if->its_vpe.its_vm)
> -			write_gicreg(cpu_if->vgic_hcr, ICH_HCR_EL2);
> +		if (has_vhe()) {
> +			/*
> +			 * Ensure that the write to the VMCR will have reached
> +			 * the (re)distributors. This ensure the guest will
> +			 * read the correct values from the memory-mapped
> +			 * interface.
> +			 */
> +			isb();
> +			dsb(sy);
> +		}
>   	}
>   
>   	/*
> -	 * Ensures that the above will have reached the
> -	 * (re)distributors. This ensure the guest will read the
> -	 * correct values from the memory-mapped interface.
> +	 * Prevent the guest from touching the GIC system registers if
> +	 * SRE isn't enabled for GICv3 emulation.
> +	 */
> +	write_gicreg(read_gicreg(ICC_SRE_EL2) & ~ICC_SRE_EL2_ENABLE,
> +		     ICC_SRE_EL2);
> +
> +	/*
> +	 * If we need to trap system registers, we must write
> +	 * ICH_HCR_EL2 anyway, even if no interrupts are being
> +	 * injected,
>   	 */
> +	if (static_branch_unlikely(&vgic_v3_cpuif_trap) ||
> +	    cpu_if->its_vpe.its_vm)
> +		write_gicreg(cpu_if->vgic_hcr, ICH_HCR_EL2);
> +}
> +
> +void __hyp_text __vgic_v3_deactivate_traps(struct kvm_vcpu *vcpu)
> +{
> +	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
> +	u64 val;
> +
>   	if (!cpu_if->vgic_sre) {
> +		cpu_if->vgic_vmcr = read_gicreg(ICH_VMCR_EL2);
> +	}
> +
> +

Nit: Please remove one of the two blank lines added here.

> +	val = read_gicreg(ICC_SRE_EL2);
> +	write_gicreg(val | ICC_SRE_EL2_ENABLE, ICC_SRE_EL2);
> +
> +	if (!cpu_if->vgic_sre) {
> +		/* Make sure ENABLE is set at EL2 before setting SRE at EL1 */
>   		isb();
> -		dsb(sy);
> +		write_gicreg(1, ICC_SRE_EL1);
>   	}
>   
>   	/*
> -	 * Prevent the guest from touching the GIC system registers if
> -	 * SRE isn't enabled for GICv3 emulation.
> +	 * If we were trapping system registers, we enabled the VGIC even if
> +	 * no interrupts were being injected, and we disable it again here.
>   	 */
> -	write_gicreg(read_gicreg(ICC_SRE_EL2) & ~ICC_SRE_EL2_ENABLE,
> -		     ICC_SRE_EL2);
> +	if (static_branch_unlikely(&vgic_v3_cpuif_trap) ||
> +	    cpu_if->its_vpe.its_vm)
> +		write_gicreg(0, ICH_HCR_EL2);
>   }
>   
>   void __hyp_text __vgic_v3_save_aprs(struct kvm_vcpu *vcpu)
> diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c
> index 4bafcd1e6bb8..4200657694f0 100644
> --- a/virt/kvm/arm/vgic/vgic-v3.c
> +++ b/virt/kvm/arm/vgic/vgic-v3.c
> @@ -590,6 +590,9 @@ void vgic_v3_load(struct kvm_vcpu *vcpu)
>   		kvm_call_hyp(__vgic_v3_write_vmcr, cpu_if->vgic_vmcr);
>   
>   	kvm_call_hyp(__vgic_v3_restore_aprs, vcpu);
> +
> +	if (has_vhe())
> +		__vgic_v3_activate_traps(vcpu);
>   }
>   
>   void vgic_v3_put(struct kvm_vcpu *vcpu)
> @@ -600,4 +603,7 @@ void vgic_v3_put(struct kvm_vcpu *vcpu)
>   		cpu_if->vgic_vmcr = kvm_call_hyp(__vgic_v3_read_vmcr);
>   
>   	kvm_call_hyp(__vgic_v3_save_aprs, vcpu);
> +
> +	if (has_vhe())
> +		__vgic_v3_deactivate_traps(vcpu);
>   }
> diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
> index d0a19a8c196a..0d95d7b55567 100644
> --- a/virt/kvm/arm/vgic/vgic.c
> +++ b/virt/kvm/arm/vgic/vgic.c
> @@ -763,14 +763,14 @@ void kvm_vgic_sync_hwstate(struct kvm_vcpu *vcpu)
>   {
>   	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
>   
> -	vgic_save_state(vcpu);
> -
>   	WARN_ON(vgic_v4_sync_hwstate(vcpu));
>   
>   	/* An empty ap_list_head implies used_lrs == 0 */
>   	if (list_empty(&vcpu->arch.vgic_cpu.ap_list_head))
>   		return;
>   
> +	vgic_save_state(vcpu);
> +
>   	if (vgic_cpu->used_lrs)
>   		vgic_fold_lr_state(vcpu);
>   	vgic_prune_ap_list(vcpu);
> @@ -799,7 +799,7 @@ void kvm_vgic_flush_hwstate(struct kvm_vcpu *vcpu)
>   	 * this.
>   	 */
>   	if (list_empty(&vcpu->arch.vgic_cpu.ap_list_head))
> -		goto out;
> +		return;
>   
>   	DEBUG_SPINLOCK_BUG_ON(!irqs_disabled());
>   
> @@ -807,7 +807,6 @@ void kvm_vgic_flush_hwstate(struct kvm_vcpu *vcpu)
>   	vgic_flush_lr_state(vcpu);
>   	spin_unlock(&vcpu->arch.vgic_cpu.ap_list_lock);
>   
> -out:
>   	vgic_restore_state(vcpu);
>   }
>   
> 

The idea makes a lot of sense to me.

However, __vgic_v3_save_state() now depends indirectly on 
__vgic_v3_deactivate_traps() when issuing the dsb(st) for the non-VHE, 
!cpu_if->vgic_sre case. That's fine, since the comment explains it well, 
but we have to be careful about any future changes. The only alternative 
I see is a separate set of functions for the non-VHE and VHE cases, but 
that would mean massive code duplication... So I think we have to live 
with this; performance is the top priority.
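
To make the fast path explicit, here is a sketch of kvm_vgic_sync_hwstate()
as it looks after the hunk above (reconstructed from the diff): with an
empty AP list the function now returns before doing any save work at all.

void kvm_vgic_sync_hwstate(struct kvm_vcpu *vcpu)
{
	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;

	WARN_ON(vgic_v4_sync_hwstate(vcpu));

	/* An empty ap_list_head implies used_lrs == 0 */
	if (list_empty(&vcpu->arch.vgic_cpu.ap_list_head))
		return;

	vgic_save_state(vcpu);

	if (vgic_cpu->used_lrs)
		vgic_fold_lr_state(vcpu);
	vgic_prune_ap_list(vcpu);
}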

For the whole series:
Reviewed-by: Tomasz Nowicki <Tomasz.Nowicki@caviumnetworks.com>

Thanks,
Tomasz

^ permalink raw reply	[flat|nested] 223+ messages in thread

* Re: [PATCH v3 02/41] KVM: arm/arm64: Move vcpu_load call after kvm_vcpu_first_run_init
  2018-01-12 12:07   ` Christoffer Dall
@ 2018-02-05 14:34     ` Julien Grall
  -1 siblings, 0 replies; 223+ messages in thread
From: Julien Grall @ 2018-02-05 14:34 UTC (permalink / raw)
  To: Christoffer Dall, kvmarm, linux-arm-kernel; +Cc: Marc Zyngier, kvm, Shih-Wei Li

Hi Christoffer,

On 12/01/18 12:07, Christoffer Dall wrote:
> Moving the call to vcpu_load() in kvm_arch_vcpu_ioctl_run() to after
> we've called kvm_vcpu_first_run_init() simplifies some of the vgic code,
> and there is also no need to do vcpu_load() for things such as handling
> the immediate_exit flag.
> 
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>

Reviewed-by: Julien Grall <julien.grall@arm.com>

Cheers,

> ---
>   virt/kvm/arm/arch_timer.c     |  7 -------
>   virt/kvm/arm/arm.c            | 22 ++++++++--------------
>   virt/kvm/arm/vgic/vgic-init.c | 11 -----------
>   3 files changed, 8 insertions(+), 32 deletions(-)
> 
> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> index cfcd0323deab..c09c701fd68e 100644
> --- a/virt/kvm/arm/arch_timer.c
> +++ b/virt/kvm/arm/arch_timer.c
> @@ -834,14 +834,7 @@ int kvm_timer_enable(struct kvm_vcpu *vcpu)
>   		return ret;
>   
>   no_vgic:
> -	preempt_disable();
>   	timer->enabled = 1;
> -	if (!irqchip_in_kernel(vcpu->kvm))
> -		kvm_timer_vcpu_load_user(vcpu);
> -	else
> -		kvm_timer_vcpu_load_vgic(vcpu);
> -	preempt_enable();
> -
>   	return 0;
>   }
>   
> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> index 5e3c149a6e28..360df72692ee 100644
> --- a/virt/kvm/arm/arm.c
> +++ b/virt/kvm/arm/arm.c
> @@ -631,27 +631,22 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>   	if (unlikely(!kvm_vcpu_initialized(vcpu)))
>   		return -ENOEXEC;
>   
> -	vcpu_load(vcpu);
> -
>   	ret = kvm_vcpu_first_run_init(vcpu);
>   	if (ret)
> -		goto out;
> +		return ret;
>   
>   	if (run->exit_reason == KVM_EXIT_MMIO) {
>   		ret = kvm_handle_mmio_return(vcpu, vcpu->run);
>   		if (ret)
> -			goto out;
> -		if (kvm_arm_handle_step_debug(vcpu, vcpu->run)) {
> -			ret = 0;
> -			goto out;
> -		}
> -
> +			return ret;
> +		if (kvm_arm_handle_step_debug(vcpu, vcpu->run))
> +			return 0;
>   	}
>   
> -	if (run->immediate_exit) {
> -		ret = -EINTR;
> -		goto out;
> -	}
> +	if (run->immediate_exit)
> +		return -EINTR;
> +
> +	vcpu_load(vcpu);
>   
>   	kvm_sigset_activate(vcpu);
>   
> @@ -803,7 +798,6 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>   
>   	kvm_sigset_deactivate(vcpu);
>   
> -out:
>   	vcpu_put(vcpu);
>   	return ret;
>   }
> diff --git a/virt/kvm/arm/vgic/vgic-init.c b/virt/kvm/arm/vgic/vgic-init.c
> index 62310122ee78..a0688ef52ad7 100644
> --- a/virt/kvm/arm/vgic/vgic-init.c
> +++ b/virt/kvm/arm/vgic/vgic-init.c
> @@ -300,17 +300,6 @@ int vgic_init(struct kvm *kvm)
>   
>   	dist->initialized = true;
>   
> -	/*
> -	 * If we're initializing GICv2 on-demand when first running the VCPU
> -	 * then we need to load the VGIC state onto the CPU.  We can detect
> -	 * this easily by checking if we are in between vcpu_load and vcpu_put
> -	 * when we just initialized the VGIC.
> -	 */
> -	preempt_disable();
> -	vcpu = kvm_arm_get_running_vcpu();
> -	if (vcpu)
> -		kvm_vgic_load(vcpu);
> -	preempt_enable();
>   out:
>   	return ret;
>   }
> 

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 223+ messages in thread

* Re: [PATCH v3 03/41] KVM: arm64: Avoid storing the vcpu pointer on the stack
  2018-01-12 12:07   ` Christoffer Dall
@ 2018-02-05 17:14     ` Julien Grall
  -1 siblings, 0 replies; 223+ messages in thread
From: Julien Grall @ 2018-02-05 17:14 UTC (permalink / raw)
  To: Christoffer Dall, kvmarm, linux-arm-kernel
  Cc: kvm, Ard Biesheuvel, Marc Zyngier, Shih-Wei Li

Hi Christoffer,

On 12/01/18 12:07, Christoffer Dall wrote:
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 048f5db120f3..6ce0b428a4db 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -350,10 +350,15 @@ int kvm_perf_teardown(void);
>   
>   struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
>   
> +extern void __kvm_set_tpidr_el2(u64 tpidr_el2);

NIT: The rest of the file seems to declare prototypes without extern.

[...]

> diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
> index 71bf088f1e4b..612021dce84f 100644
> --- a/arch/arm64/kernel/asm-offsets.c
> +++ b/arch/arm64/kernel/asm-offsets.c
> @@ -135,6 +135,7 @@ int main(void)
>     DEFINE(CPU_FP_REGS,		offsetof(struct kvm_regs, fp_regs));
>     DEFINE(VCPU_FPEXC32_EL2,	offsetof(struct kvm_vcpu, arch.ctxt.sys_regs[FPEXC32_EL2]));
>     DEFINE(VCPU_HOST_CONTEXT,	offsetof(struct kvm_vcpu, arch.host_cpu_context));
> +  DEFINE(HOST_CONTEXT_VCPU,	offsetof(struct kvm_cpu_context, __hyp_running_vcpu));
>   #endif
>   #ifdef CONFIG_CPU_PM
>     DEFINE(CPU_SUSPEND_SZ,	sizeof(struct cpu_suspend_ctx));
> diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
> index 9a8ab5dddd9e..a360ac6e89e9 100644
> --- a/arch/arm64/kvm/hyp/entry.S
> +++ b/arch/arm64/kvm/hyp/entry.S
> @@ -62,9 +62,6 @@ ENTRY(__guest_enter)
>   	// Store the host regs
>   	save_callee_saved_regs x1
>   
> -	// Store host_ctxt and vcpu for use at exit time
> -	stp	x1, x0, [sp, #-16]!
> -
>   	add	x18, x0, #VCPU_CONTEXT
>   
>   	// Restore guest regs x0-x17
> @@ -118,8 +115,7 @@ ENTRY(__guest_exit)
>   	// Store the guest regs x19-x29, lr
>   	save_callee_saved_regs x1
>   
> -	// Restore the host_ctxt from the stack
> -	ldr	x2, [sp], #16
> +	get_host_ctxt	x2, x3
>   
>   	// Now restore the host regs
>   	restore_callee_saved_regs x2
> diff --git a/arch/arm64/kvm/hyp/hyp-entry.S b/arch/arm64/kvm/hyp/hyp-entry.S
> index e4f37b9dd47c..71b4cc92895e 100644
> --- a/arch/arm64/kvm/hyp/hyp-entry.S
> +++ b/arch/arm64/kvm/hyp/hyp-entry.S
> @@ -56,18 +56,15 @@ ENDPROC(__vhe_hyp_call)
>   el1_sync:				// Guest trapped into EL2
>   	stp	x0, x1, [sp, #-16]!
>   
> -alternative_if_not ARM64_HAS_VIRT_HOST_EXTN
> -	mrs	x1, esr_el2
> -alternative_else
> -	mrs	x1, esr_el1
> -alternative_endif
> -	lsr	x0, x1, #ESR_ELx_EC_SHIFT
> +	mrs	x1, vttbr_el2		// If vttbr is valid, this is a trap
> +	cbnz	x1, el1_trap		// from the guest
>   
> -	cmp	x0, #ESR_ELx_EC_HVC64
> -	b.ne	el1_trap
> -
> -	mrs	x1, vttbr_el2		// If vttbr is valid, the 64bit guest
> -	cbnz	x1, el1_trap		// called HVC
> +#ifdef CONFIG_DEBUG
> +	mrs	x0, esr_el2
> +	lsr	x0, x0, #ESR_ELx_EC_SHIFT
> +	cmp     x0, #ESR_ELx_EC_HVC64
> +	b.ne    __hyp_panic
> +#endif

FWIW, I noticed that Mark's series about Spectre is also touching this 
code (see https://patchwork.kernel.org/patch/10190297/).

Cheers,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 223+ messages in thread

* Re: [PATCH v3 04/41] KVM: arm64: Rework hyp_panic for VHE and non-VHE
  2018-01-12 12:07   ` Christoffer Dall
@ 2018-02-05 18:04     ` Julien Grall
  -1 siblings, 0 replies; 223+ messages in thread
From: Julien Grall @ 2018-02-05 18:04 UTC (permalink / raw)
  To: Christoffer Dall, kvmarm, linux-arm-kernel; +Cc: Marc Zyngier, Shih-Wei Li, kvm

Hi Christoffer,

On 12/01/18 12:07, Christoffer Dall wrote:
> VHE actually doesn't rely on clearing the VTTBR when returning to the
> host kernel, and that is the current key mechanism of hyp_panic to
> figure out how to attempt to return to a state good enough to print a
> panic statement.
> 
> Therefore, we split the hyp_panic function into two functions, a VHE and
> a non-VHE, keeping the non-VHE version intact, but changing the VHE
> behavior.
> 
> The vttbr_el2 check on VHE doesn't really make that much sense, because
> the only situation where we can get here on VHE is when the hypervisor
> assembly code actually called into hyp_panic, which only happens when
> VBAR_EL2 has been set to the KVM exception vectors.  On VHE, we can
> always safely disable the traps and restore the host registers at this
> point, so we simply do that unconditionally and call into the panic
> function directly.
> 
> Acked-by: Marc Zyngier <marc.zyngier@arm.com>
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> ---
>   arch/arm64/kvm/hyp/switch.c | 42 +++++++++++++++++++++++-------------------
>   1 file changed, 23 insertions(+), 19 deletions(-)
> 
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index 6fcb37e220b5..71700ecee308 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -419,10 +419,20 @@ int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
>   static const char __hyp_panic_string[] = "HYP panic:\nPS:%08llx PC:%016llx ESR:%08llx\nFAR:%016llx HPFAR:%016llx PAR:%016llx\nVCPU:%p\n";
>   
>   static void __hyp_text __hyp_call_panic_nvhe(u64 spsr, u64 elr, u64 par,
> -					     struct kvm_vcpu *vcpu)
> +					     struct kvm_cpu_context *__host_ctxt)
>   {
> +	struct kvm_vcpu *vcpu;
>   	unsigned long str_va;
>   
> +	vcpu = __host_ctxt->__hyp_running_vcpu;
> +
> +	if (read_sysreg(vttbr_el2)) {
> +		__timer_disable_traps(vcpu);
> +		__deactivate_traps(vcpu);
> +		__deactivate_vm(vcpu);
> +		__sysreg_restore_host_state(__host_ctxt);
> +	}
> +
>   	/*
>   	 * Force the panic string to be loaded from the literal pool,
>   	 * making sure it is a kernel address and not a PC-relative
> @@ -436,37 +446,31 @@ static void __hyp_text __hyp_call_panic_nvhe(u64 spsr, u64 elr, u64 par,
>   		       read_sysreg(hpfar_el2), par, vcpu);
>   }
>   
> -static void __hyp_text __hyp_call_panic_vhe(u64 spsr, u64 elr, u64 par,
> -					    struct kvm_vcpu *vcpu)
> +static void __hyp_call_panic_vhe(u64 spsr, u64 elr, u64 par,
> +				 struct kvm_cpu_context *host_ctxt)
>   {
> +	struct kvm_vcpu *vcpu;
> +	vcpu = host_ctxt->__hyp_running_vcpu;
> +
> +	__deactivate_traps(vcpu);
> +	__sysreg_restore_host_state(host_ctxt);

I was about to ask why you keep this function around, as it does nothing 
in the VHE case. But I see that it will actually restore some values in a 
later patch.

> +
>   	panic(__hyp_panic_string,
>   	      spsr,  elr,
>   	      read_sysreg_el2(esr),   read_sysreg_el2(far),
>   	      read_sysreg(hpfar_el2), par, vcpu);
>   }
>   
> -static hyp_alternate_select(__hyp_call_panic,
> -			    __hyp_call_panic_nvhe, __hyp_call_panic_vhe,
> -			    ARM64_HAS_VIRT_HOST_EXTN);

Out of interest, any specific reason to remove hyp_alternate_select and 
"open-code" it?

> -
>   void __hyp_text __noreturn hyp_panic(struct kvm_cpu_context *host_ctxt)
>   {
> -	struct kvm_vcpu *vcpu = NULL;
> -
>   	u64 spsr = read_sysreg_el2(spsr);
>   	u64 elr = read_sysreg_el2(elr);
>   	u64 par = read_sysreg(par_el1);
>   
> -	if (read_sysreg(vttbr_el2)) {
> -		vcpu = host_ctxt->__hyp_running_vcpu;
> -		__timer_disable_traps(vcpu);
> -		__deactivate_traps(vcpu);
> -		__deactivate_vm(vcpu);
> -		__sysreg_restore_host_state(host_ctxt);
> -	}
> -
> -	/* Call panic for real */
> -	__hyp_call_panic()(spsr, elr, par, vcpu);
> +	if (!has_vhe())
> +		__hyp_call_panic_nvhe(spsr, elr, par, host_ctxt);
> +	else
> +		__hyp_call_panic_vhe(spsr, elr, par, host_ctxt);
>   
>   	unreachable();
>   }
> 

Cheers,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 223+ messages in thread

* Re: [PATCH v3 04/41] KVM: arm64: Rework hyp_panic for VHE and non-VHE
  2018-02-05 18:04     ` Julien Grall
@ 2018-02-05 18:10       ` Julien Grall
  -1 siblings, 0 replies; 223+ messages in thread
From: Julien Grall @ 2018-02-05 18:10 UTC (permalink / raw)
  To: Julien Grall, Christoffer Dall, kvmarm, linux-arm-kernel
  Cc: Marc Zyngier, kvm, Shih-Wei Li



On 05/02/18 18:04, Julien Grall wrote:
> On 12/01/18 12:07, Christoffer Dall wrote:
>> @@ -436,37 +446,31 @@ static void __hyp_text __hyp_call_panic_nvhe(u64 
>> spsr, u64 elr, u64 par,
>>                  read_sysreg(hpfar_el2), par, vcpu);
>>   }
>> -static void __hyp_text __hyp_call_panic_vhe(u64 spsr, u64 elr, u64 par,
>> -                        struct kvm_vcpu *vcpu)
>> +static void __hyp_call_panic_vhe(u64 spsr, u64 elr, u64 par,
>> +                 struct kvm_cpu_context *host_ctxt)
>>   {
>> +    struct kvm_vcpu *vcpu;
>> +    vcpu = host_ctxt->__hyp_running_vcpu;
>> +
>> +    __deactivate_traps(vcpu);
>> +    __sysreg_restore_host_state(host_ctxt);
> 
> I was about to ask why you keep this function around, as it does nothing 
> in the VHE case. But I see that it will actually restore some values in a 
> later patch.

Actually, I just misread the code. Sorry for the noise.

Cheers,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 223+ messages in thread

* Re: [PATCH v3 09/41] KVM: arm64: Defer restoring host VFP state to vcpu_put
  2018-01-25 19:46       ` Christoffer Dall
@ 2018-02-07 16:49         ` Dave Martin
  -1 siblings, 0 replies; 223+ messages in thread
From: Dave Martin @ 2018-02-07 16:49 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: Marc Zyngier, Shih-Wei Li, kvmarm, linux-arm-kernel, kvm

On Thu, Jan 25, 2018 at 08:46:53PM +0100, Christoffer Dall wrote:
> On Mon, Jan 22, 2018 at 05:33:28PM +0000, Dave Martin wrote:
> > On Fri, Jan 12, 2018 at 01:07:15PM +0100, Christoffer Dall wrote:
> > > Avoid saving the guest VFP registers and restoring the host VFP
> > > registers on every exit from the VM.  Only when we're about to run
> > > userspace or other threads in the kernel do we really have to switch the
> > > state back to the host state.
> > > 
> > > We still initially configure the VFP registers to trap when entering the
> > > VM, but the difference is that we now leave the guest state in the
> > > hardware registers as long as we're running this VCPU, even if we
> > > occasionally trap to the host, and we only restore the host state when
> > > we return to user space or when scheduling another thread.
> > > 
> > > Reviewed-by: Andrew Jones <drjones@redhat.com>
> > > Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
> > > Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> > 
> > [...]
> > 
> > > diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
> > > index 883a6383cd36..848a46eb33bf 100644
> > > --- a/arch/arm64/kvm/hyp/sysreg-sr.c
> > > +++ b/arch/arm64/kvm/hyp/sysreg-sr.c
> > 
> > [...]
> > 
> > > @@ -213,6 +215,19 @@ void kvm_vcpu_load_sysregs(struct kvm_vcpu *vcpu)
> > >   */
> > >  void kvm_vcpu_put_sysregs(struct kvm_vcpu *vcpu)
> > >  {
> > > +	struct kvm_cpu_context *host_ctxt = vcpu->arch.host_cpu_context;
> > > +	struct kvm_cpu_context *guest_ctxt = &vcpu->arch.ctxt;
> > > +
> > > +	/* Restore host FP/SIMD state */
> > > +	if (vcpu->arch.guest_vfp_loaded) {
> > > +		if (vcpu_el1_is_32bit(vcpu)) {
> > > +			kvm_call_hyp(__fpsimd32_save_state,
> > > +				     kern_hyp_va(guest_ctxt));
> > > +		}
> > > +		__fpsimd_save_state(&guest_ctxt->gp_regs.fp_regs);
> > > +		__fpsimd_restore_state(&host_ctxt->gp_regs.fp_regs);
> > > +		vcpu->arch.guest_vfp_loaded = 0;
> > 
> > Provided we've already marked the host FPSIMD state as dirty on the way
> > in, we probably don't need to restore it here.
> > 
> > In v4.15, the kvm_fpsimd_flush_cpu_state() call in
> > kvm_arch_vcpu_ioctl_run() is supposed to do this marking: currently
> > it's only done for SVE, since KVM was previously restoring the host
> > FPSIMD subset of the state anyway, but it could be made unconditional.
> > 
> > For a returning run ioctl, this would have the effect of deferring the
> > host FPSIMD reload until we return to userspace, which is probably
> > no more costly since the kernel must check whether to do this in
> > ret_to_user anyway; OTOH if the vcpu thread was preempted by some
> > other thread we save the cost of restoring the host state entirely here
> > ... I think.
> 
> Yes, I agree.  However, the low-level logic in
> arch/arm64/kvm/hyp/entry.S:__fpsimd_guest_restore currently saves the host
> state into vcpu->arch.host_cpu_context->gp_regs.fp_regs (where
> host_cpu_context is a KVM-specific per-cpu variable).  I think this means
> that simply marking the state as invalid would cause the kernel to
> restore some potentially stale values when returning to userspace.  Am I
> missing something?

I think my point was that there would be no need for the low-level
save of the host fpsimd state currently done by hyp.  At all.  The
state would already have been saved off to thread_struct before
entering the guest.

This would result in a redundant save, but only when the host fpsimd
state is dirty and the guest vcpu doesn't touch fpsimd before trapping
back to the host.

For the host, the fpsimd state is only dirty after entering the kernel
from userspace (or after certain other things like sigreturn or ptrace).
So this approach would still avoid repeated save/restore when cycling
between the guest and the kvm code in the host.
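
To illustrate (just a sketch of the idea, reusing the names from your
patch -- not the actual implementation), the vcpu_put path could then
shrink to something like:

	/*
	 * Only save the guest state here.  The host state was already
	 * saved to thread_struct and marked dirty before entering the
	 * guest, so ret_to_user (or the next context switch) reloads it.
	 */
	if (vcpu->arch.guest_vfp_loaded) {
		if (vcpu_el1_is_32bit(vcpu))
			kvm_call_hyp(__fpsimd32_save_state,
				     kern_hyp_va(guest_ctxt));
		__fpsimd_save_state(&guest_ctxt->gp_regs.fp_regs);
		vcpu->arch.guest_vfp_loaded = 0;
	}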

> It might very well be possible to change the logic so that we store the
> host state in the same place where task_fpsimd_save() would have, and I
> think that would make what you suggest possible.

That's certainly possible, but I viewed that as harder.  It would be
necessary to map the host thread_struct into hyp etc. etc.

> I'd like to make that a separate change from this patch though, as we're
> already changing quite a bit with this series, so I'm trying to make any
> logical change as contained per patch as possible, so that problems can
> be spotted by bisecting.

Yes, I think that's wise.

> > Ultimately I'd like to go one better and actually treat a vcpu as a
> > first-class fpsimd context, so that taking an interrupt to the host
> > and then reentering the guest doesn't cause any reload at all.  
> 
> That should be the case already; kvm_vcpu_put_sysregs() is only called
> when you run another thread (preemptively or voluntarily), or when you
> return to user space, but making the vcpu fpsimd context a first-class
> citizen fpsimd context would mean that you can run another thread (and
> maybe run userspace if it doesn't use fpsimd?) without having to
> save/restore anything.  Am I getting this right?

Yes (except that if a return to userspace happens then FPSIMD will be
restored at that point: there is no laziness there -- it _could_
be lazy, but it's deemed unlikely to be a performance win due to the
fact that the compiler can and does generate FPSIMD code quite
liberally by default).

For the case of being preempted within the kernel with no ret_to_user,
you are correct.
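
(For reference, the reload I mean happens on the normal return path to
userspace -- roughly, from memory, so take the exact location with a
grain of salt:

	if (thread_flags & _TIF_FOREIGN_FPSTATE)
		fpsimd_restore_current_state();

so that check is paid on every return to userspace anyway.)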

> 
> > But
> > that feels like too big a step for this series, and there are likely
> > side-issues I've not thought about yet.
> > 
> 
> It should definitely be in separate patches, but I would be open to
> tagging something on to the end of this series if we can stabilize this
> series early after -rc1 is out.

I haven't fully got my head around it, but we can see where we get to.
Best not to rush into it if there's any doubt...

Cheers
---Dave

^ permalink raw reply	[flat|nested] 223+ messages in thread

* Re: [PATCH v3 09/41] KVM: arm64: Defer restoring host VFP state to vcpu_put
  2018-02-07 16:49         ` Dave Martin
@ 2018-02-07 17:56           ` Christoffer Dall
  -1 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-02-07 17:56 UTC (permalink / raw)
  To: Dave Martin; +Cc: Marc Zyngier, Shih-Wei Li, kvmarm, linux-arm-kernel, kvm

On Wed, Feb 07, 2018 at 04:49:55PM +0000, Dave Martin wrote:
> On Thu, Jan 25, 2018 at 08:46:53PM +0100, Christoffer Dall wrote:
> > On Mon, Jan 22, 2018 at 05:33:28PM +0000, Dave Martin wrote:
> > > On Fri, Jan 12, 2018 at 01:07:15PM +0100, Christoffer Dall wrote:
> > > > Avoid saving the guest VFP registers and restoring the host VFP
> > > > registers on every exit from the VM.  Only when we're about to run
> > > > userspace or other threads in the kernel do we really have to switch the
> > > > state back to the host state.
> > > > 
> > > > We still initially configure the VFP registers to trap when entering the
> > > > VM, but the difference is that we now leave the guest state in the
> > > > hardware registers as long as we're running this VCPU, even if we
> > > > occasionally trap to the host, and we only restore the host state when
> > > > we return to user space or when scheduling another thread.
> > > > 
> > > > Reviewed-by: Andrew Jones <drjones@redhat.com>
> > > > Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
> > > > Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> > > 
> > > [...]
> > > 
> > > > diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
> > > > index 883a6383cd36..848a46eb33bf 100644
> > > > --- a/arch/arm64/kvm/hyp/sysreg-sr.c
> > > > +++ b/arch/arm64/kvm/hyp/sysreg-sr.c
> > > 
> > > [...]
> > > 
> > > > @@ -213,6 +215,19 @@ void kvm_vcpu_load_sysregs(struct kvm_vcpu *vcpu)
> > > >   */
> > > >  void kvm_vcpu_put_sysregs(struct kvm_vcpu *vcpu)
> > > >  {
> > > > +	struct kvm_cpu_context *host_ctxt = vcpu->arch.host_cpu_context;
> > > > +	struct kvm_cpu_context *guest_ctxt = &vcpu->arch.ctxt;
> > > > +
> > > > +	/* Restore host FP/SIMD state */
> > > > +	if (vcpu->arch.guest_vfp_loaded) {
> > > > +		if (vcpu_el1_is_32bit(vcpu)) {
> > > > +			kvm_call_hyp(__fpsimd32_save_state,
> > > > +				     kern_hyp_va(guest_ctxt));
> > > > +		}
> > > > +		__fpsimd_save_state(&guest_ctxt->gp_regs.fp_regs);
> > > > +		__fpsimd_restore_state(&host_ctxt->gp_regs.fp_regs);
> > > > +		vcpu->arch.guest_vfp_loaded = 0;
> > > 
> > > Provided we've already marked the host FPSIMD state as dirty on the way
> > > in, we probably don't need to restore it here.
> > > 
> > > In v4.15, the kvm_fpsimd_flush_cpu_state() call in
> > > kvm_arch_vcpu_ioctl_run() is supposed to do this marking: currently
> > > it's only done for SVE, since KVM was previously restoring the host
> > > FPSIMD subset of the state anyway, but it could be made unconditional.
> > > 
> > > For a returning run ioctl, this would have the effect of deferring the
> > > host FPSIMD reload until we return to userspace, which is probably
> > > no more costly since the kernel must check whether to do this in
> > > ret_to_user anyway; OTOH if the vcpu thread was preempted by some
> > > other thread we save the cost of restoring the host state entirely here
> > > ... I think.
> > 
> > Yes, I agree.  However, the low-level logic in
> > arch/arm64/kvm/hyp/entry.S:__fpsimd_guest_restore currently saves the host
> > state into vcpu->arch.host_cpu_context->gp_regs.fp_regs (where
> > host_cpu_context is a KVM-specific per-cpu variable).  I think this means
> > that simply marking the state as invalid would cause the kernel to
> > restore some potentially stale values when returning to userspace.  Am I
> > missing something?
> 
> I think my point was that there would be no need for the low-level
> save of the host fpsimd state currently done by hyp.  At all.  The
> state would already have been saved off to thread_struct before
> entering the guest.

Ah, so if userspace touched any FPSIMD state, then we always save that
state when entering the kernel, even if we're just going to return to
the same userspace process anyway?  (For any system call etc.?)

> 
> This would result in a redundant save, but only when the host fpsimd
> state is dirty and the guest vcpu doesn't touch fpsimd before trapping
> back to the host.
> 
> For the host, the fpsimd state is only dirty after entering the kernel
> from userspace (or after certain other things like sigreturn or ptrace).
> So this approach would still avoid repeated save/restore when cycling
> between the guest and the kvm code in the host.
> 

I see.

> > It might very well be possible to change the logic so that we store the
> > host state in the same place where task_fpsimd_save() would have, and I
> > think that would make what you suggest possible.
> 
> That's certainly possible, but I viewed that as harder.  It would be
> necessary to map the host thread_struct into hyp etc. etc.
> 

And even then, unnecessary because it would duplicate the existing state
save, IIUC above.

> > I'd like to make that a separate change from this patch though, as we're
> > already changing quite a bit with this series, so I'm trying to make any
> > logical change as contained per patch as possible, so that problems can
> > be spotted by bisecting.
> 
> Yes, I think that's wise.
> 

ok, I'll try to incorporate this as a separate patch for the next
revision.

> > > Ultimately I'd like to go one better and actually treat a vcpu as a
> > > first-class fpsimd context, so that taking an interrupt to the host
> > > and then reentering the guest doesn't cause any reload at all.  
> > 
> > That should be the case already; kvm_vcpu_put_sysregs() is only called
> > when you run another thread (preemptively or voluntarily), or when you
> > return to user space, but making the vcpu fpsimd context a first-class
> > citizen fpsimd context would mean that you can run another thread (and
> > maybe run userspace if it doesn't use fpsimd?) without having to
> > save/restore anything.  Am I getting this right?
> 
> Yes (except that if a return to userspace happens then FPSIMD will be
> restored at that point: there is no laziness there -- it _could_
> be lazy, but it's deemed unlikely to be a performance win due to the
> fact that the compiler can and does generate FPSIMD code quite
> liberally by default).
> 
> For the case of being preempted within the kernel with no ret_to_user,
> you are correct.
> 

ok, that would indeed also be useful for things like switching to a
vhost thread and returning to the vcpu thread.

> > 
> > > But
> > > that feels like too big a step for this series, and there are likely
> > > side-issues I've not thought about yet.
> > > 
> > 
> > It should definitely be in separate patches, but I would be open to
> > tagging something on to the end of this series if we can stabilize this
> > series early after -rc1 is out.
> 
> I haven't fully got my head around it, but we can see where we get to.
> Best not to rush into it if there's any doubt...
> 
Agreed, we can always add things later.

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 223+ messages in thread

* Re: [PATCH v3 04/41] KVM: arm64: Rework hyp_panic for VHE and non-VHE
  2018-02-05 18:04     ` Julien Grall
@ 2018-02-08 13:24       ` Christoffer Dall
  -1 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-02-08 13:24 UTC (permalink / raw)
  To: Julien Grall; +Cc: kvmarm, linux-arm-kernel, Marc Zyngier, Shih-Wei Li, kvm

Hi Julien,

On Mon, Feb 05, 2018 at 06:04:25PM +0000, Julien Grall wrote:
> On 12/01/18 12:07, Christoffer Dall wrote:
> >VHE actually doesn't rely on clearing the VTTBR when returning to the
> >host kernel, and that is the current key mechanism of hyp_panic to
> >figure out how to attempt to return to a state good enough to print a
> >panic statement.
> >
> >Therefore, we split the hyp_panic function into two functions, a VHE and
> >a non-VHE, keeping the non-VHE version intact, but changing the VHE
> >behavior.
> >
> >The vttbr_el2 check on VHE doesn't really make that much sense, because
> >the only situation where we can get here on VHE is when the hypervisor
> >assembly code actually called into hyp_panic, which only happens when
> >VBAR_EL2 has been set to the KVM exception vectors.  On VHE, we can
> >always safely disable the traps and restore the host registers at this
> >point, so we simply do that unconditionally and call into the panic
> >function directly.
> >
> >Acked-by: Marc Zyngier <marc.zyngier@arm.com>
> >Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> >---
> >  arch/arm64/kvm/hyp/switch.c | 42 +++++++++++++++++++++++-------------------
> >  1 file changed, 23 insertions(+), 19 deletions(-)
> >
> >diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> >index 6fcb37e220b5..71700ecee308 100644
> >--- a/arch/arm64/kvm/hyp/switch.c
> >+++ b/arch/arm64/kvm/hyp/switch.c
> >@@ -419,10 +419,20 @@ int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
> >  static const char __hyp_panic_string[] = "HYP panic:\nPS:%08llx PC:%016llx ESR:%08llx\nFAR:%016llx HPFAR:%016llx PAR:%016llx\nVCPU:%p\n";
> >  static void __hyp_text __hyp_call_panic_nvhe(u64 spsr, u64 elr, u64 par,
> >-					     struct kvm_vcpu *vcpu)
> >+					     struct kvm_cpu_context *__host_ctxt)
> >  {
> >+	struct kvm_vcpu *vcpu;
> >  	unsigned long str_va;
> >+	vcpu = __host_ctxt->__hyp_running_vcpu;
> >+
> >+	if (read_sysreg(vttbr_el2)) {
> >+		__timer_disable_traps(vcpu);
> >+		__deactivate_traps(vcpu);
> >+		__deactivate_vm(vcpu);
> >+		__sysreg_restore_host_state(__host_ctxt);
> >+	}
> >+
> >  	/*
> >  	 * Force the panic string to be loaded from the literal pool,
> >  	 * making sure it is a kernel address and not a PC-relative
> >@@ -436,37 +446,31 @@ static void __hyp_text __hyp_call_panic_nvhe(u64 spsr, u64 elr, u64 par,
> >  		       read_sysreg(hpfar_el2), par, vcpu);
> >  }
> >-static void __hyp_text __hyp_call_panic_vhe(u64 spsr, u64 elr, u64 par,
> >-					    struct kvm_vcpu *vcpu)
> >+static void __hyp_call_panic_vhe(u64 spsr, u64 elr, u64 par,
> >+				 struct kvm_cpu_context *host_ctxt)
> >  {
> >+	struct kvm_vcpu *vcpu;
> >+	vcpu = host_ctxt->__hyp_running_vcpu;
> >+
> >+	__deactivate_traps(vcpu);
> >+	__sysreg_restore_host_state(host_ctxt);
> 
> I was about to ask why you keep this function around as it does nothing in
> VHE case. But I see that this will actually restore some values in a later
> patch.
> 
> >+
> >  	panic(__hyp_panic_string,
> >  	      spsr,  elr,
> >  	      read_sysreg_el2(esr),   read_sysreg_el2(far),
> >  	      read_sysreg(hpfar_el2), par, vcpu);
> >  }
> >-static hyp_alternate_select(__hyp_call_panic,
> >-			    __hyp_call_panic_nvhe, __hyp_call_panic_vhe,
> >-			    ARM64_HAS_VIRT_HOST_EXTN);
> 
> Out of interest, any specific reason to remove hyp_alternate_select and
> "open-code" it?
> 

Not sure I understand your question.

Are you asking why I replace the hyp alternatives with the has_vhe()?
If so, has_vhe() uses a static key and should therefore have the same
performance characteristics, but I find the has_vhe() version below much
more readable.
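
For reference, has_vhe() is (roughly, paraphrasing the header from
memory) just a wrapper around the constant cpufeature check, which is
backed by a static key once the CPU capabilities are finalized:

	static inline bool has_vhe(void)
	{
		if (cpus_have_const_cap(ARM64_HAS_VIRT_HOST_EXTN))
			return true;

		return false;
	}

So the generated code ends up as a single patched branch, much like the
hyp_alternate_select() approach it replaces.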

> >-
> >  void __hyp_text __noreturn hyp_panic(struct kvm_cpu_context *host_ctxt)
> >  {
> >-	struct kvm_vcpu *vcpu = NULL;
> >-
> >  	u64 spsr = read_sysreg_el2(spsr);
> >  	u64 elr = read_sysreg_el2(elr);
> >  	u64 par = read_sysreg(par_el1);
> >-	if (read_sysreg(vttbr_el2)) {
> >-		vcpu = host_ctxt->__hyp_running_vcpu;
> >-		__timer_disable_traps(vcpu);
> >-		__deactivate_traps(vcpu);
> >-		__deactivate_vm(vcpu);
> >-		__sysreg_restore_host_state(host_ctxt);
> >-	}
> >-
> >-	/* Call panic for real */
> >-	__hyp_call_panic()(spsr, elr, par, vcpu);
> >+	if (!has_vhe())
> >+		__hyp_call_panic_nvhe(spsr, elr, par, host_ctxt);
> >+	else
> >+		__hyp_call_panic_vhe(spsr, elr, par, host_ctxt);
> >  	unreachable();
> >  }
> >
> 

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 223+ messages in thread

* Re: [PATCH v3 00/41] Optimize KVM/ARM for VHE systems
  2018-02-01 13:57   ` Tomasz Nowicki
@ 2018-02-08 15:47     ` Christoffer Dall
  -1 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-02-08 15:47 UTC (permalink / raw)
  To: Tomasz Nowicki; +Cc: kvmarm, linux-arm-kernel, Marc Zyngier, Shih-Wei Li, kvm

Hi Tomasz,

On Thu, Feb 01, 2018 at 02:57:59PM +0100, Tomasz Nowicki wrote:
> 
> I created a simple module for the VM kernel. It spins on the PSCI version
> hypercall to measure the base exit cost as you suggested. Also, I measured
> CPU cycles for each loop and here are my results:
> 
> My setup:
> 1-socket ThunderX2 running VM - 1VCPU
> 
> Tested baselines:
> a) host kernel v4.15-rc3 and VM kernel v4.15-rc3
> b) host kernel v4.15-rc3 + vhe-optimize-v3-with-fixes and VM kernel
> v4.15-rc3
> 
> The module was loaded from the VM and the results are presented in [%]
> relative to the average CPU cycles spent on the PSCI version hypercall for
> the vanilla VHE host kernel v4.15-rc3:
> 
>              VHE  |  nVHE
> =========================
> baseline a)  100% |  130%
> =========================
> baseline b)  36%  |  123%
> 
> So I confirm a significant performance improvement, especially for the VHE case.
> Additionally,

Thanks for this.  Good to know the exit cost is still reduced.
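
For illustration only, a guest-side measurement loop of the kind described
above might look roughly like this (a sketch, not the actual module that
produced the numbers quoted here):

	#include <linux/module.h>
	#include <linux/arm-smccc.h>
	#include <uapi/linux/psci.h>
	#include <asm/sysreg.h>

	#define EXIT_COST_ITERS	1000000UL

	static int __init exit_cost_init(void)
	{
		struct arm_smccc_res res;
		unsigned long i;
		u64 start, end;

		/*
		 * PSCI_VERSION always traps to the host, so this measures a
		 * minimal hypercall round trip, here in generic timer ticks.
		 */
		start = read_sysreg(cntvct_el0);
		for (i = 0; i < EXIT_COST_ITERS; i++)
			arm_smccc_hvc(PSCI_0_2_FN_PSCI_VERSION,
				      0, 0, 0, 0, 0, 0, 0, &res);
		end = read_sysreg(cntvct_el0);

		pr_info("avg hypercall cost: %llu ticks\n",
			(end - start) / EXIT_COST_ITERS);
		return 0;
	}

	static void __exit exit_cost_exit(void) { }

	module_init(exit_cost_init);
	module_exit(exit_cost_exit);
	MODULE_LICENSE("GPL");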

> I ran network throughput tests with vhost-net, but for that
> case there were no differences.
> 

Throughput on vhost-net wouldn't be affected, because its protocol is
specifically designed around avoiding exits.  But if you measure latency
with TCP_RR or another latency sensitive benchmark like memcached, you
should see real-world performance benefits here as well.

Thanks,
-Christoffer

> 
> On 12.01.2018 13:07, Christoffer Dall wrote:
> >This series redesigns parts of KVM/ARM to optimize the performance on
> >VHE systems.  The general approach is to try to do as little work as
> >possible when transitioning between the VM and the hypervisor.  This has
> >the benefit of lower latency when waiting for interrupts and delivering
> >virtual interrupts, and reduces the overhead of emulating behavior and
> >I/O in the host kernel.
> >
> >Patches 01 through 06 are not VHE specific, but rework parts of KVM/ARM
> >that can be generally improved.  We then add infrastructure to move more
> >logic into vcpu_load and vcpu_put, we improve handling of VFP and debug
> >registers.
> >
> >We then introduce a new world-switch function for VHE systems, which we
> >can tweak and optimize for VHE systems.  To do that, we rework a lot of
> >the system register save/restore handling and emulation code that may
> >need access to system registers, so that we can defer as many system
> >register save/restore operations to vcpu_load and vcpu_put, and move
> >this logic out of the VHE world switch function.
> >
> >We then optimize the configuration of traps.  On non-VHE systems, both
> >the host and VM kernels run in EL1, but because the host kernel should
> >have full access to the underlying hardware, but the VM kernel should
> >not, we essentially make the host kernel more privileged than the VM
> >kernel despite them both running at the same privilege level by enabling
> >VE traps when entering the VM and disabling those traps when exiting the
> >VM.  On VHE systems, the host kernel runs in EL2 and has full access to
> >the hardware (as much as allowed by secure side software), and is
> >unaffected by the trap configuration.  That means we can configure the
> >traps for VMs running in EL1 once, and don't have to switch them on and
> >off for every entry/exit to/from the VM.
> >
> >Finally, we improve our VGIC handling by moving all save/restore logic
> >out of the VHE world-switch, and we make it possible to truly only
> >evaluate if the AP list is empty and not do *any* VGIC work if that is
> >the case, and only do the minimal amount of work required in the course
> >of the VGIC processing when we have virtual interrupts in flight.
> >
> >The patches are based on v4.15-rc3, v9 of the level-triggered mapped
> >interrupts support series [1], and the first five patches of James' SDEI
> >series [2].
> >
> >I've given the patches a fair amount of testing on Thunder-X, Mustang,
> >Seattle, and TC2 (32-bit) for non-VHE testing, and tested VHE
> >functionality on the Foundation model, running both 64-bit VMs and
> >32-bit VMs side-by-side and using both GICv3-on-GICv3 and
> >GICv2-on-GICv3.
> >
> >The patches are also available in the vhe-optimize-v3 branch on my
> >kernel.org repository [3].  The vhe-optimize-v3-base branch contains
> >prerequisites of this series.
> >
> >Changes since v2:
> >  - Rebased on v4.15-rc3.
> >  - Includes two additional patches that only does vcpu_load after
> >    kvm_vcpu_first_run_init and only for KVM_RUN.
> >  - Addressed review comments from v2 (detailed changelogs are in the
> >    individual patches).
> >
> >Thanks,
> >-Christoffer
> >
> >[1]: git://git.kernel.org/pub/scm/linux/kernel/git/cdall/linux.git level-mapped-v9
> >[2]: git://linux-arm.org/linux-jm.git sdei/v5/base
> >[3]: git://git.kernel.org/pub/scm/linux/kernel/git/cdall/linux.git vhe-optimize-v3
> >
> >Christoffer Dall (40):
> >   KVM: arm/arm64: Avoid vcpu_load for other vcpu ioctls than KVM_RUN
> >   KVM: arm/arm64: Move vcpu_load call after kvm_vcpu_first_run_init
> >   KVM: arm64: Avoid storing the vcpu pointer on the stack
> >   KVM: arm64: Rework hyp_panic for VHE and non-VHE
> >   KVM: arm/arm64: Get rid of vcpu->arch.irq_lines
> >   KVM: arm/arm64: Add kvm_vcpu_load_sysregs and kvm_vcpu_put_sysregs
> >   KVM: arm/arm64: Introduce vcpu_el1_is_32bit
> >   KVM: arm64: Defer restoring host VFP state to vcpu_put
> >   KVM: arm64: Move debug dirty flag calculation out of world switch
> >   KVM: arm64: Slightly improve debug save/restore functions
> >   KVM: arm64: Improve debug register save/restore flow
> >   KVM: arm64: Factor out fault info population and gic workarounds
> >   KVM: arm64: Introduce VHE-specific kvm_vcpu_run
> >   KVM: arm64: Remove kern_hyp_va() use in VHE switch function
> >   KVM: arm64: Don't deactivate VM on VHE systems
> >   KVM: arm64: Remove noop calls to timer save/restore from VHE switch
> >   KVM: arm64: Move userspace system registers into separate function
> >   KVM: arm64: Rewrite sysreg alternatives to static keys
> >   KVM: arm64: Introduce separate VHE/non-VHE sysreg save/restore
> >     functions
> >   KVM: arm/arm64: Remove leftover comment from kvm_vcpu_run_vhe
> >   KVM: arm64: Unify non-VHE host/guest sysreg save and restore functions
> >   KVM: arm64: Don't save the host ELR_EL2 and SPSR_EL2 on VHE systems
> >   KVM: arm64: Change 32-bit handling of VM system registers
> >   KVM: arm64: Rewrite system register accessors to read/write functions
> >   KVM: arm64: Introduce framework for accessing deferred sysregs
> >   KVM: arm/arm64: Prepare to handle deferred save/restore of SPSR_EL1
> >   KVM: arm64: Prepare to handle deferred save/restore of ELR_EL1
> >   KVM: arm64: Defer saving/restoring 64-bit sysregs to vcpu load/put on
> >     VHE
> >   KVM: arm64: Prepare to handle deferred save/restore of 32-bit
> >     registers
> >   KVM: arm64: Defer saving/restoring 32-bit sysregs to vcpu load/put
> >   KVM: arm64: Move common VHE/non-VHE trap config in separate functions
> >   KVM: arm64: Configure FPSIMD traps on vcpu load/put
> >   KVM: arm64: Configure c15, PMU, and debug register traps on cpu
> >     load/put for VHE
> >   KVM: arm64: Separate activate_traps and deactive_traps for VHE and
> >     non-VHE
> >   KVM: arm/arm64: Get rid of vgic_elrsr
> >   KVM: arm/arm64: Handle VGICv2 save/restore from the main VGIC code
> >   KVM: arm/arm64: Move arm64-only vgic-v2-sr.c file to arm64
> >   KVM: arm/arm64: Handle VGICv3 save/restore from the main VGIC code on
> >     VHE
> >   KVM: arm/arm64: Move VGIC APR save/restore to vgic put/load
> >   KVM: arm/arm64: Avoid VGICv3 save/restore on VHE with no IRQs
> >
> >Shih-Wei Li (1):
> >   KVM: arm64: Move HCR_INT_OVERRIDE to default HCR_EL2 guest flag
> >
> >  arch/arm/include/asm/kvm_asm.h                    |   5 +-
> >  arch/arm/include/asm/kvm_emulate.h                |  21 +-
> >  arch/arm/include/asm/kvm_host.h                   |   6 +-
> >  arch/arm/include/asm/kvm_hyp.h                    |   4 +
> >  arch/arm/kvm/emulate.c                            |   4 +-
> >  arch/arm/kvm/hyp/Makefile                         |   1 -
> >  arch/arm/kvm/hyp/switch.c                         |  16 +-
> >  arch/arm64/include/asm/kvm_arm.h                  |   4 +-
> >  arch/arm64/include/asm/kvm_asm.h                  |  18 +-
> >  arch/arm64/include/asm/kvm_emulate.h              |  74 +++-
> >  arch/arm64/include/asm/kvm_host.h                 |  49 ++-
> >  arch/arm64/include/asm/kvm_hyp.h                  |  32 +-
> >  arch/arm64/include/asm/kvm_mmu.h                  |   2 +-
> >  arch/arm64/kernel/asm-offsets.c                   |   2 +
> >  arch/arm64/kvm/debug.c                            |  28 +-
> >  arch/arm64/kvm/guest.c                            |   3 -
> >  arch/arm64/kvm/hyp/Makefile                       |   2 +-
> >  arch/arm64/kvm/hyp/debug-sr.c                     |  88 +++--
> >  arch/arm64/kvm/hyp/entry.S                        |   9 +-
> >  arch/arm64/kvm/hyp/hyp-entry.S                    |  41 +--
> >  arch/arm64/kvm/hyp/switch.c                       | 404 +++++++++++++---------
> >  arch/arm64/kvm/hyp/sysreg-sr.c                    | 192 ++++++++--
> >  {virt/kvm/arm => arch/arm64/kvm}/hyp/vgic-v2-sr.c |  81 -----
> >  arch/arm64/kvm/inject_fault.c                     |  24 +-
> >  arch/arm64/kvm/regmap.c                           |  65 +++-
> >  arch/arm64/kvm/sys_regs.c                         | 247 +++++++++++--
> >  arch/arm64/kvm/sys_regs.h                         |   4 +-
> >  arch/arm64/kvm/sys_regs_generic_v8.c              |   4 +-
> >  include/kvm/arm_vgic.h                            |   2 -
> >  virt/kvm/arm/aarch32.c                            |   2 +-
> >  virt/kvm/arm/arch_timer.c                         |   7 -
> >  virt/kvm/arm/arm.c                                |  50 ++-
> >  virt/kvm/arm/hyp/timer-sr.c                       |  44 +--
> >  virt/kvm/arm/hyp/vgic-v3-sr.c                     | 244 +++++++------
> >  virt/kvm/arm/mmu.c                                |   6 +-
> >  virt/kvm/arm/pmu.c                                |  37 +-
> >  virt/kvm/arm/vgic/vgic-init.c                     |  11 -
> >  virt/kvm/arm/vgic/vgic-v2.c                       |  61 +++-
> >  virt/kvm/arm/vgic/vgic-v3.c                       |  12 +-
> >  virt/kvm/arm/vgic/vgic.c                          |  21 ++
> >  virt/kvm/arm/vgic/vgic.h                          |   3 +
> >  41 files changed, 1229 insertions(+), 701 deletions(-)
> >  rename {virt/kvm/arm => arch/arm64/kvm}/hyp/vgic-v2-sr.c (50%)
> >

^ permalink raw reply	[flat|nested] 223+ messages in thread

* Re: [PATCH v3 41/41] KVM: arm/arm64: Avoid VGICv3 save/restore on VHE with no IRQs
  2018-02-05 13:29     ` Tomasz Nowicki
@ 2018-02-08 15:48       ` Christoffer Dall
  -1 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-02-08 15:48 UTC (permalink / raw)
  To: Tomasz Nowicki
  Cc: Andrew Jones, kvm, Marc Zyngier, linux-arm-kernel, kvmarm, Shih-Wei Li

On Mon, Feb 05, 2018 at 02:29:50PM +0100, Tomasz Nowicki wrote:
> 

[...]

> 
> For the whole series:
> Reviewed-by: Tomasz Nowicki <Tomasz.Nowicki@caviumnetworks.com>
> 

Much thanks!
-Christoffer

^ permalink raw reply	[flat|nested] 223+ messages in thread

* Re: [PATCH v3 04/41] KVM: arm64: Rework hyp_panic for VHE and non-VHE
  2018-02-08 13:24       ` Christoffer Dall
@ 2018-02-09 10:55         ` Julien Grall
  -1 siblings, 0 replies; 223+ messages in thread
From: Julien Grall @ 2018-02-09 10:55 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: Marc Zyngier, Shih-Wei Li, kvmarm, linux-arm-kernel, kvm

Hi Christoffer,

On 02/08/2018 01:24 PM, Christoffer Dall wrote:
> On Mon, Feb 05, 2018 at 06:04:25PM +0000, Julien Grall wrote:
>> On 12/01/18 12:07, Christoffer Dall wrote:
>>> +
>>>   	panic(__hyp_panic_string,
>>>   	      spsr,  elr,
>>>   	      read_sysreg_el2(esr),   read_sysreg_el2(far),
>>>   	      read_sysreg(hpfar_el2), par, vcpu);
>>>   }
>>> -static hyp_alternate_select(__hyp_call_panic,
>>> -			    __hyp_call_panic_nvhe, __hyp_call_panic_vhe,
>>> -			    ARM64_HAS_VIRT_HOST_EXTN);
>>
>> Out of interest, any specific reason to remove hyp_alternate_select and
>> "open-code" it?
>>
> 
> Not sure I understand your question.
> 
> Are you asking why I replace the hyp alternatives with the has_vhe()?
> If so, has_vhe() uses a static key and should therefore have the same
> performance characteristics, but I find the has_vhe() version below much
> more readable.

That's what I was asking. Thank you for the explanation.

Cheers,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 223+ messages in thread

* Re: [PATCH v3 05/41] KVM: arm64: Move HCR_INT_OVERRIDE to default HCR_EL2 guest flag
  2018-01-12 12:07   ` Christoffer Dall
@ 2018-02-09 11:38     ` Julien Grall
  -1 siblings, 0 replies; 223+ messages in thread
From: Julien Grall @ 2018-02-09 11:38 UTC (permalink / raw)
  To: Christoffer Dall, kvmarm, linux-arm-kernel; +Cc: Marc Zyngier, Shih-Wei Li, kvm

Hi,

On 01/12/2018 12:07 PM, Christoffer Dall wrote:
> From: Shih-Wei Li <shihwei@cs.columbia.edu>
> 
> We always set the IMO and FMO bits in the HCR_EL2 when running the
> guest, regardless if we use the vgic or not.  By moving these flags to
> HCR_GUEST_FLAGS we can avoid one of the extra save/restore operations of
> HCR_EL2 in the world switch code, and we can also soon get rid of the
> other one.
> 
> This is safe, because even though the IMO and FMO bits control both
> taking the interrupts to EL2 and remapping ICC_*_EL1 to ICV_*_EL1
> executed at EL1, as long as we ensure that these bits are clear when
> running the EL1 host, as defined in the HCR_HOST_[VHE_]FLAGS, we're OK.

NIT: I was a bit confused by the end of the sentence because the
HCR_HOST_FLAGS define does not seem to exist.

> 
> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
> Signed-off-by: Shih-Wei Li <shihwei@cs.columbia.edu>
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> ---
>   arch/arm64/include/asm/kvm_arm.h | 4 ++--
>   arch/arm64/kvm/hyp/switch.c      | 3 ---
>   2 files changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
> index 715d395ef45b..656deeb17bf2 100644
> --- a/arch/arm64/include/asm/kvm_arm.h
> +++ b/arch/arm64/include/asm/kvm_arm.h
> @@ -79,9 +79,9 @@
>    */
>   #define HCR_GUEST_FLAGS (HCR_TSC | HCR_TSW | HCR_TWE | HCR_TWI | HCR_VM | \
>   			 HCR_TVM | HCR_BSU_IS | HCR_FB | HCR_TAC | \
> -			 HCR_AMO | HCR_SWIO | HCR_TIDCP | HCR_RW)
> +			 HCR_AMO | HCR_SWIO | HCR_TIDCP | HCR_RW | \
> +			 HCR_FMO | HCR_IMO)
>   #define HCR_VIRT_EXCP_MASK (HCR_VSE | HCR_VI | HCR_VF)
> -#define HCR_INT_OVERRIDE   (HCR_FMO | HCR_IMO)
>   #define HCR_HOST_VHE_FLAGS (HCR_RW | HCR_TGE | HCR_E2H)
>   
>   /* TCR_EL2 Registers bits */
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index 71700ecee308..f6189d08753e 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -167,8 +167,6 @@ static void __hyp_text __vgic_save_state(struct kvm_vcpu *vcpu)
>   		__vgic_v3_save_state(vcpu);
>   	else
>   		__vgic_v2_save_state(vcpu);
> -
> -	write_sysreg(read_sysreg(hcr_el2) & ~HCR_INT_OVERRIDE, hcr_el2);
>   }
>   
>   static void __hyp_text __vgic_restore_state(struct kvm_vcpu *vcpu)
> @@ -176,7 +174,6 @@ static void __hyp_text __vgic_restore_state(struct kvm_vcpu *vcpu)
>   	u64 val;
>   
>   	val = read_sysreg(hcr_el2);
> -	val |= 	HCR_INT_OVERRIDE;
>   	val |= vcpu->arch.irq_lines;
>   	write_sysreg(val, hcr_el2);
>   
> 

Cheers,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 223+ messages in thread

* Re: [PATCH v3 08/41] KVM: arm/arm64: Introduce vcpu_el1_is_32bit
  2018-01-12 12:07   ` Christoffer Dall
@ 2018-02-09 12:31     ` Julien Grall
  -1 siblings, 0 replies; 223+ messages in thread
From: Julien Grall @ 2018-02-09 12:31 UTC (permalink / raw)
  To: Christoffer Dall, kvmarm, linux-arm-kernel; +Cc: Marc Zyngier, kvm, Shih-Wei Li

Hi Christoffer,

On 01/12/2018 12:07 PM, Christoffer Dall wrote:
> We have numerous checks around that check if the HCR_EL2 has the RW bit
> set to figure out if we're running an AArch64 or AArch32 VM.  In some
> cases, directly checking the RW bit (given its unintuitive name), is a
> bit confusing, and that's not going to improve as we move logic around
> for the following patches that optimize KVM on AArch64 hosts with VHE.
> 
> Therefore, introduce a helper, vcpu_el1_is_32bit, and replace existing
> direct checks of HCR_EL2.RW with the helper.
> 
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>

Reviewed-by: Julien Grall <julien.grall@arm.com>

Cheers,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 223+ messages in thread

* Re: [PATCH v3 09/41] KVM: arm64: Defer restoring host VFP state to vcpu_put
  2018-01-12 12:07   ` Christoffer Dall
@ 2018-02-09 15:26     ` Julien Grall
  -1 siblings, 0 replies; 223+ messages in thread
From: Julien Grall @ 2018-02-09 15:26 UTC (permalink / raw)
  To: Christoffer Dall, kvmarm, linux-arm-kernel; +Cc: Marc Zyngier, Shih-Wei Li, kvm

Hi Christoffer,

On 01/12/2018 12:07 PM, Christoffer Dall wrote:
> Avoid saving the guest VFP registers and restoring the host VFP
> registers on every exit from the VM.  Only when we're about to run
> userspace or other threads in the kernel do we really have to switch the

s/do// ?

> state back to the host state.
> 
> We still initially configure the VFP registers to trap when entering the
> VM, but the difference is that we now leave the guest state in the
> hardware registers as long as we're running this VCPU, even if we
> occasionally trap to the host, and we only restore the host state when
> we return to user space or when scheduling another thread.
> 
> Reviewed-by: Andrew Jones <drjones@redhat.com>
> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> ---
>   arch/arm64/include/asm/kvm_host.h |  3 +++
>   arch/arm64/kernel/asm-offsets.c   |  1 +
>   arch/arm64/kvm/hyp/entry.S        |  3 +++
>   arch/arm64/kvm/hyp/switch.c       | 48 ++++++++++++---------------------------
>   arch/arm64/kvm/hyp/sysreg-sr.c    | 21 ++++++++++++++---
>   5 files changed, 40 insertions(+), 36 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 0e9e7291a7e6..9e23bc968668 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -213,6 +213,9 @@ struct kvm_vcpu_arch {
>   	/* Guest debug state */
>   	u64 debug_flags;
>   
> +	/* 1 if the guest VFP state is loaded into the hardware */
> +	u8 guest_vfp_loaded;
> +
>   	/*
>   	 * We maintain more than a single set of debug registers to support
>   	 * debugging the guest from the host and to maintain separate host and
> diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
> index 612021dce84f..99467327c043 100644
> --- a/arch/arm64/kernel/asm-offsets.c
> +++ b/arch/arm64/kernel/asm-offsets.c
> @@ -133,6 +133,7 @@ int main(void)
>     DEFINE(CPU_GP_REGS,		offsetof(struct kvm_cpu_context, gp_regs));
>     DEFINE(CPU_USER_PT_REGS,	offsetof(struct kvm_regs, regs));
>     DEFINE(CPU_FP_REGS,		offsetof(struct kvm_regs, fp_regs));
> +  DEFINE(VCPU_GUEST_VFP_LOADED,	offsetof(struct kvm_vcpu, arch.guest_vfp_loaded));
>     DEFINE(VCPU_FPEXC32_EL2,	offsetof(struct kvm_vcpu, arch.ctxt.sys_regs[FPEXC32_EL2]));
>     DEFINE(VCPU_HOST_CONTEXT,	offsetof(struct kvm_vcpu, arch.host_cpu_context));
>     DEFINE(HOST_CONTEXT_VCPU,	offsetof(struct kvm_cpu_context, __hyp_running_vcpu));
> diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
> index a360ac6e89e9..53652287a236 100644
> --- a/arch/arm64/kvm/hyp/entry.S
> +++ b/arch/arm64/kvm/hyp/entry.S
> @@ -184,6 +184,9 @@ alternative_endif
>   	add	x0, x2, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
>   	bl	__fpsimd_restore_state
>   
> +	mov	x0, #1
> +	strb	w0, [x3, #VCPU_GUEST_VFP_LOADED]
> +
>   	// Skip restoring fpexc32 for AArch64 guests
>   	mrs	x1, hcr_el2
>   	tbnz	x1, #HCR_RW_SHIFT, 1f
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index 12dc647a6e5f..29e44a20f5e3 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -24,43 +24,32 @@
>   #include <asm/fpsimd.h>
>   #include <asm/debug-monitors.h>
>   
> -static bool __hyp_text __fpsimd_enabled_nvhe(void)
> -{
> -	return !(read_sysreg(cptr_el2) & CPTR_EL2_TFP);
> -}
> -
> -static bool __hyp_text __fpsimd_enabled_vhe(void)
> -{
> -	return !!(read_sysreg(cpacr_el1) & CPACR_EL1_FPEN);
> -}
> -
> -static hyp_alternate_select(__fpsimd_is_enabled,
> -			    __fpsimd_enabled_nvhe, __fpsimd_enabled_vhe,
> -			    ARM64_HAS_VIRT_HOST_EXTN);
> -
> -bool __hyp_text __fpsimd_enabled(void)

Now that __fpsimd_enabled is removed, I think you need to remove the
prototype in arch/arm64/include/asm/kvm_hyp.h too.

> -{
> -	return __fpsimd_is_enabled()();
> -}

Cheers,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 223+ messages in thread

* Re: [PATCH v3 09/41] KVM: arm64: Defer restoring host VFP state to vcpu_put
  2018-02-07 17:56           ` Christoffer Dall
@ 2018-02-09 15:59             ` Dave Martin
  -1 siblings, 0 replies; 223+ messages in thread
From: Dave Martin @ 2018-02-09 15:59 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: Marc Zyngier, kvm, kvmarm, Shih-Wei Li, linux-arm-kernel

On Wed, Feb 07, 2018 at 06:56:44PM +0100, Christoffer Dall wrote:
> On Wed, Feb 07, 2018 at 04:49:55PM +0000, Dave Martin wrote:
> > On Thu, Jan 25, 2018 at 08:46:53PM +0100, Christoffer Dall wrote:
> > > On Mon, Jan 22, 2018 at 05:33:28PM +0000, Dave Martin wrote:
> > > > On Fri, Jan 12, 2018 at 01:07:15PM +0100, Christoffer Dall wrote:
> > > > > Avoid saving the guest VFP registers and restoring the host VFP
> > > > > registers on every exit from the VM.  Only when we're about to run
> > > > > userspace or other threads in the kernel do we really have to switch the
> > > > > state back to the host state.
> > > > > 
> > > > > We still initially configure the VFP registers to trap when entering the
> > > > > VM, but the difference is that we now leave the guest state in the
> > > > > hardware registers as long as we're running this VCPU, even if we
> > > > > occasionally trap to the host, and we only restore the host state when
> > > > > we return to user space or when scheduling another thread.
> > > > > 
> > > > > Reviewed-by: Andrew Jones <drjones@redhat.com>
> > > > > Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
> > > > > Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> > > > 
> > > > [...]
> > > > 
> > > > > diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
> > > > > index 883a6383cd36..848a46eb33bf 100644
> > > > > --- a/arch/arm64/kvm/hyp/sysreg-sr.c
> > > > > +++ b/arch/arm64/kvm/hyp/sysreg-sr.c
> > > > 
> > > > [...]
> > > > 
> > > > > @@ -213,6 +215,19 @@ void kvm_vcpu_load_sysregs(struct kvm_vcpu *vcpu)
> > > > >   */
> > > > >  void kvm_vcpu_put_sysregs(struct kvm_vcpu *vcpu)
> > > > >  {
> > > > > +	struct kvm_cpu_context *host_ctxt = vcpu->arch.host_cpu_context;
> > > > > +	struct kvm_cpu_context *guest_ctxt = &vcpu->arch.ctxt;
> > > > > +
> > > > > +	/* Restore host FP/SIMD state */
> > > > > +	if (vcpu->arch.guest_vfp_loaded) {
> > > > > +		if (vcpu_el1_is_32bit(vcpu)) {
> > > > > +			kvm_call_hyp(__fpsimd32_save_state,
> > > > > +				     kern_hyp_va(guest_ctxt));
> > > > > +		}
> > > > > +		__fpsimd_save_state(&guest_ctxt->gp_regs.fp_regs);
> > > > > +		__fpsimd_restore_state(&host_ctxt->gp_regs.fp_regs);
> > > > > +		vcpu->arch.guest_vfp_loaded = 0;
> > > > 
> > > > Provided we've already marked the host FPSIMD state as dirty on the way
> > > > in, we probably don't need to restore it here.
> > > > 
> > > > In v4.15, the kvm_fpsimd_flush_cpu_state() call in
> > > > kvm_arch_vcpu_ioctl_run() is supposed to do this marking: currently
> > > > it's only done for SVE, since KVM was previously restoring the host
> > > > FPSIMD subset of the state anyway, but it could be made unconditional.
> > > > 
> > > > For a returning run ioctl, this would have the effect of deferring the
> > > > host FPSIMD reload until we return to userspace, which is probably
> > > > no more costly since the kernel must check whether to do this in
> > > > ret_to_user anyway; OTOH if the vcpu thread was preempted by some
> > > > other thread we save the cost of restoring the host state entirely here
> > > > ... I think.
> > > 
> > > Yes, I agree.  However, currently the low-level logic in
> > > arch/arm64/kvm/hyp/entry.S:__fpsimd_guest_restore which saves the host
> > > state into vcpu->arch.host_cpu_context->gp_regs.fp_regs (where
> > > host_cpu_context is a KVM-specific per-cpu variable).  I think means
> > > that simply marking the state as invalid would cause the kernel to
> > > restore some potentially stale values when returning to userspace.  Am I
> > > missing something?
> > 
> > I think my point was that there would be no need for the low-level
> > save of the host fpsimd state currently done by hyp.  At all.  The
> > state would already have been saved off to thread_struct before
> > entering the guest.
> 
> Ah, so if userspace touched any FPSIMD state, then we always save that
> state when entering the kernel, even if we're just going to return to
> the same userspace process anyway?  (For any system call etc.?)

Not exactly.  The state is saved when the corresponding user task is
scheduled out, or when it would become stale because we're about to
modify the task_struct view of the state (sigreturn/PTRACE_SETREGSET).
The state is also saved if necessary by kernel_neon_begin().

Simply entering the kernel and returning to userspace doesn't have
this effect by itself.


Prior to the SVE patches, KVM makes itself orthogonal to the host
context switch machinery by ensuring that whatever the host had
in the FPSIMD regs at guest entry is restored before returning to
the host. (IIUC)  This means that redundant save/restore work is
done by KVM, but does have the advantage of simplicity.

This breaks for SVE though: the high bits of the Z-registers will be
zeroed as a side effect of the FPSIMD save/restore done by KVM.
This means that if the host has state in those bits then it must
be saved before entering the guest: that's what the new
kvm_fpsimd_flush_cpu_state() hook in kvm_arch_vcpu_ioctl_run() is for.
The alternative would have been for KVM to save/restore the host SVE
state directly, but this seemed premature and invasive in the absence
of full KVM SVE support.

This means that KVM's own save/restore of the host's FPSIMD state
becomes redundant in this case, but since there is no SVE hardware
yet, I favoured correctness over optimal performance here.


My point here was that we could modify this hook to always save off the
host FPSIMD state unconditionally before entering the guts of KVM,
instead of only doing it when there is live SVE state.  The benefit of
this is that the host context switch machinery knows if the state has
already been saved and won't do it again.  Thus a kvm userspace -> vcpu
(-> guest exit -> vcpu)* -> guest_exit sequence of arbitrary length
will only save the host FPSIMD (or SVE) state once, and won't restore
it at all (assuming no context switches).

Instead, the user thread's FPSIMD state is only reloaded on the final
return to userspace.
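
In rough pseudo-code, the hook would then do something like the following
(the helpers are placeholders standing in for the fpsimd.c primitives, not
real function names):

	void kvm_fpsimd_flush_cpu_state(void)
	{
		preempt_disable();
		/* Placeholder: save current's live FPSIMD regs to thread_struct. */
		host_fpsimd_save_current();
		/* Placeholder: mark the CPU's FPSIMD regs as owned by no task. */
		host_fpsimd_invalidate_cpu_regs();
		preempt_enable();
	}

With that in place, the explicit restore of host_ctxt->gp_regs.fp_regs in
kvm_vcpu_put_sysregs() becomes unnecessary: ret_to_user only reloads the
task's state if it was actually dirtied.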

> > 
> > This would result in a redundant save, but only when the host fpsimd
> > state is dirty and the guest vcpu doesn't touch fpsimd before trapping
> > back to the host.
> > 
> > For the host, the fpsimd state is only dirty after entering the kernel
> > from userspace (or after certain other things like sigreturn or ptrace).
> > So this approach would still avoid repeated save/restore when cycling
> > between the guest and the kvm code in the host.
> > 
> 
> I see.
> 
> > > It might very well be possible to change the logic so that we store the
> > > host logic the same place where task_fpsimd_save() would have, and I
> > > think that would make what you suggest possible.
> > 
> > That's certainly possible, but I viewed that as harder.  It would be
> > necessary to map the host thread_struct into hyp etc. etc.
> > 
> 
> And even then, unnecessary because it would duplicate the existing state
> save, IIUC above.

Agreed.  The main disadvantage of my approach is that the host state
cannot be saved lazily any more, but at least it is only saved once
for a given vcpu run loop.

> > > I'd like to make that a separate change from this patch though, as we're
> > > already changing quite a bit with this series, so I'm trying to make any
> > > logical change as contained per patch as possible, so that problems can
> > > be spotted by bisecting.
> > 
> > Yes, I think that's wise.
> > 
> 
> ok, I'll try to incorporate this as a separate patch for the next
> revision.
> 
> > > > Ultimately I'd like to go one better and actually treat a vcpu as a
> > > > first-class fpsimd context, so that taking an interrupt to the host
> > > > and then reentering the guest doesn't cause any reload at all.  
> > > 
> > > That should be the case already; kvm_vcpu_put_sysregs() is only called
> > > when you run another thread (preemptively or voluntarily), or when you
> > > return to user space, but making the vcpu fpsimd context a first-class
> > > citizen fpsimd context would mean that you can run another thread (and
> > > maybe run userspace if it doesn't use fpsimd?) without having to
> > > save/restore anything.  Am I getting this right?
> > 
> > Yes (except that if a return to userspace happens then FPSIMD will be
> > restored at that point: there is no laziness there -- it _could_
> > be lazy, but it's deemed unlikely to be a performance win due to the
> > fact that the compiler can and does generate FPSIMD code quite
> > liberally by default).
> > 
> > For the case of being preempted within the kernel with no ret_to_user,
> > you are correct.
> > 
> 
> ok, that would indeed also be useful for things like switching to a
> vhost thread and returning to the vcpu thread.

What's a vhost thread?

> > > 
> > > > But
> > > > that feels like too big a step for this series, and there are likely
> > > > side-issues I've not thought about yet.
> > > > 
> > > 
> > > It should definitely be in separate patches, but I would be open to
> > > tagging something on to the end of this series if we can stabilize this
> > > series early after -rc1 is out.
> > 
> > I haven't fully got my head around it, but we can see where we get to.
> > Best not to rush into it if there's any doubt...
> > 
> Agreed, we can always add things later.

Cheers
---Dave

^ permalink raw reply	[flat|nested] 223+ messages in thread

* [PATCH v3 09/41] KVM: arm64: Defer restoring host VFP state to vcpu_put
@ 2018-02-09 15:59             ` Dave Martin
  0 siblings, 0 replies; 223+ messages in thread
From: Dave Martin @ 2018-02-09 15:59 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Feb 07, 2018 at 06:56:44PM +0100, Christoffer Dall wrote:
> On Wed, Feb 07, 2018 at 04:49:55PM +0000, Dave Martin wrote:
> > On Thu, Jan 25, 2018 at 08:46:53PM +0100, Christoffer Dall wrote:
> > > On Mon, Jan 22, 2018 at 05:33:28PM +0000, Dave Martin wrote:
> > > > On Fri, Jan 12, 2018 at 01:07:15PM +0100, Christoffer Dall wrote:
> > > > > Avoid saving the guest VFP registers and restoring the host VFP
> > > > > registers on every exit from the VM.  Only when we're about to run
> > > > > userspace or other threads in the kernel do we really have to switch the
> > > > > state back to the host state.
> > > > > 
> > > > > We still initially configure the VFP registers to trap when entering the
> > > > > VM, but the difference is that we now leave the guest state in the
> > > > > hardware registers as long as we're running this VCPU, even if we
> > > > > occasionally trap to the host, and we only restore the host state when
> > > > > we return to user space or when scheduling another thread.
> > > > > 
> > > > > Reviewed-by: Andrew Jones <drjones@redhat.com>
> > > > > Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
> > > > > Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> > > > 
> > > > [...]
> > > > 
> > > > > diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
> > > > > index 883a6383cd36..848a46eb33bf 100644
> > > > > --- a/arch/arm64/kvm/hyp/sysreg-sr.c
> > > > > +++ b/arch/arm64/kvm/hyp/sysreg-sr.c
> > > > 
> > > > [...]
> > > > 
> > > > > @@ -213,6 +215,19 @@ void kvm_vcpu_load_sysregs(struct kvm_vcpu *vcpu)
> > > > >   */
> > > > >  void kvm_vcpu_put_sysregs(struct kvm_vcpu *vcpu)
> > > > >  {
> > > > > +	struct kvm_cpu_context *host_ctxt = vcpu->arch.host_cpu_context;
> > > > > +	struct kvm_cpu_context *guest_ctxt = &vcpu->arch.ctxt;
> > > > > +
> > > > > +	/* Restore host FP/SIMD state */
> > > > > +	if (vcpu->arch.guest_vfp_loaded) {
> > > > > +		if (vcpu_el1_is_32bit(vcpu)) {
> > > > > +			kvm_call_hyp(__fpsimd32_save_state,
> > > > > +				     kern_hyp_va(guest_ctxt));
> > > > > +		}
> > > > > +		__fpsimd_save_state(&guest_ctxt->gp_regs.fp_regs);
> > > > > +		__fpsimd_restore_state(&host_ctxt->gp_regs.fp_regs);
> > > > > +		vcpu->arch.guest_vfp_loaded = 0;
> > > > 
> > > > Provided we've already marked the host FPSIMD state as dirty on the way
> > > > in, we probably don't need to restore it here.
> > > > 
> > > > In v4.15, the kvm_fpsimd_flush_cpu_state() call in
> > > > kvm_arch_vcpu_ioctl_run() is supposed to do this marking: currently
> > > > it's only done for SVE, since KVM was previously restoring the host
> > > > FPSIMD subset of the state anyway, but it could be made unconditional.
> > > > 
> > > > For a returning run ioctl, this would have the effect of deferring the
> > > > host FPSIMD reload until we return to userspace, which is probably
> > > > no more costly since the kernel must check whether to do this in
> > > > ret_to_user anyway; OTOH if the vcpu thread was preempted by some
> > > > other thread we save the cost of restoring the host state entirely here
> > > > ... I think.
> > > 
> > > Yes, I agree.  However, currently the low-level logic in
> > > arch/arm64/kvm/hyp/entry.S:__fpsimd_guest_restore saves the host
> > > state into vcpu->arch.host_cpu_context->gp_regs.fp_regs (where
> > > host_cpu_context is a KVM-specific per-cpu variable).  I think this means
> > > that simply marking the state as invalid would cause the kernel to
> > > restore some potentially stale values when returning to userspace.  Am I
> > > missing something?
> > 
> > I think my point was that there would be no need for the low-level
> > save of the host fpsimd state currently done by hyp.  At all.  The
> > state would already have been saved off to thread_struct before
> > entering the guest.
> 
> Ah, so if userspace touched any FPSIMD state, then we always save that
> state when entering the kernel, even if we're just going to return to
> the same userspace process anyway?  (For any system call etc.?)

Not exactly.  The state is saved when the corresponding user task is
scheduled out, or when it would become stale because we're about to
modify the task_struct view of the state (sigreturn/PTRACE_SETREGSET).
The state is also saved if necessary by kernel_neon_begin().

Simply entering the kernel and returning to userspace doesn't have
this effect by itself.


Prior to the SVE patches, KVM makes itself orthogonal to the host
context switch machinery by ensuring that whatever the host had
in the FPSIMD regs at guest entry is restored before returning to
the host. (IIUC)  This means that redundant save/restore work is
done by KVM, but does have the advantage of simplicity.

This breaks for SVE though: the high bits of the Z-registers will be
zeroed as a side effect of the FPSIMD save/restore done by KVM.
This means that if the host has state in those bits then it must
be saved before entering the guest: that's what the new
kvm_fpsimd_flush_cpu_state() hook in kvm_arch_vcpu_ioctl_run() is for.
The alternative would have been for KVM to save/restore the host SVE
state directly, but this seemed premature and invasive in the absence
of full KVM SVE support.

This means that KVM's own save/restore of the host's FPSIMD state
becomes redundant in this case, but since there is no SVE hardware
yet, I favoured correctness over optimal performance here.


My point here was that we could modify this hook to always save off the
host FPSIMD state unconditionally before entering the guts of KVM,
instead of only doing it when there is live SVE state.  The benefit of
this is that the host context switch machinery knows if the state has
already been saved and won't do it again.  Thus a kvm userspace -> vcpu
(-> guest exit -> vcpu)* -> guest_exit sequence of arbitrary length
will only save the host FPSIMD (or SVE) state once, and won't restore
it at all (assuming no context switches).

Instead, the user thread's FPSIMD state is only reloaded on the final
return to userspace.
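
To make that concrete, here is a very rough sketch of the kind of hook I
mean (untested and purely illustrative: kvm_save_host_fpsimd_state() is a
made-up name, and task_fpsimd_save() is currently static in fpsimd.c, so
it would need to be exposed in some form):

	/*
	 * Hypothetical: called once from kvm_arch_vcpu_ioctl_run() before
	 * entering the run loop.  Saves the host FPSIMD/SVE state to
	 * current's thread_struct and invalidates the hardware regs, so
	 * the host context-switch machinery won't save them again later.
	 */
	static void kvm_save_host_fpsimd_state(void)
	{
		local_bh_disable();
		task_fpsimd_save();		/* save to thread_struct */
		fpsimd_flush_cpu_state();	/* hw regs no longer track current */
		local_bh_enable();
	}

The interaction with TIF_FOREIGN_FPSTATE on the eventual return to
userspace would need checking, but that's the general shape.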

> > 
> > This would result in a redundant save, but only when the host fpsimd
> > state is dirty and the guest vcpu doesn't touch fpsimd before trapping
> > back to the host.
> > 
> > For the host, the fpsimd state is only dirty after entering the kernel
> > from userspace (or after certain other things like sigreturn or ptrace).
> > So this approach would still avoid repeated save/restore when cycling
> > between the guest and the kvm code in the host.
> > 
> 
> I see.
> 
> > > It might very well be possible to change the logic so that we store the
> > > host state the same place where task_fpsimd_save() would have, and I
> > > think that would make what you suggest possible.
> > 
> > That's certainly possible, but I viewed that as harder.  It would be
> > necessary to map the host thread_struct into hyp etc. etc.
> > 
> 
> And even then, unnecessary because it would duplicate the existing state
> save, IIUC above.

Agreed.  The main disadvantage of my approach is that the host state
cannot be saved lazily any more, but at least it is only saved once
for a given vcpu run loop.

> > > I'd like to make that a separate change from this patch though, as we're
> > > already changing quite a bit with this series, so I'm trying to make any
> > > logical change as contained per patch as possible, so that problems can
> > > be spotted by bisecting.
> > 
> > Yes, I think that's wise.
> > 
> 
> ok, I'll try to incorporate this as a separate patch for the next
> revision.
> 
> > > > Ultimately I'd like to go one better and actually treat a vcpu as a
> > > > first-class fpsimd context, so that taking an interrupt to the host
> > > > and then reentering the guest doesn't cause any reload at all.  
> > > 
> > > That should be the case already; kvm_vcpu_put_sysregs() is only called
> > > when you run another thread (preemptively or voluntarily), or when you
> > > return to user space, but making the vcpu fpsimd context a first-class
> > > citizen fpsimd context would mean that you can run another thread (and
> > > maybe run userspace if it doesn't use fpsimd?) without having to
> > > save/restore anything.  Am I getting this right?
> > 
> > Yes (except that if a return to userspace happens then FPSIMD will be
> > restored at that point: there is no laziness there -- it _could_
> > be lazy, but it's deemed unlikely to be a performance win due to the
> > fact that the compiler can and does generate FPSIMD code quite
> > liberally by default).
> > 
> > For the case of being preempted within the kernel with no ret_to_user,
> > you are correct.
> > 
> 
> ok, that would indeed also be useful for things like switching to a
> vhost thread and returning to the vcpu thread.

What's a vhost thread?

> > > 
> > > > But
> > > > that feels like too big a step for this series, and there are likely
> > > > side-issues I've not thought about yet.
> > > > 
> > > 
> > > It should definitely be in separate patches, but I would be open to
> > > tagging something on to the end of this series if we can stabilize this
> > > series early after -rc1 is out.
> > 
> > I haven't fully got my head around it, but we can see where we get to.
> > Best not to rush into it if there's any doubt...
> > 
> Agreed, we can always add things later.

Cheers
---Dave

^ permalink raw reply	[flat|nested] 223+ messages in thread

* Re: [PATCH v3 26/41] KVM: arm64: Introduce framework for accessing deferred sysregs
  2018-01-25 19:54       ` Christoffer Dall
@ 2018-02-09 16:17         ` Dave Martin
  -1 siblings, 0 replies; 223+ messages in thread
From: Dave Martin @ 2018-02-09 16:17 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: Marc Zyngier, Shih-Wei Li, kvmarm, linux-arm-kernel, kvm

On Thu, Jan 25, 2018 at 08:54:13PM +0100, Christoffer Dall wrote:
> On Tue, Jan 23, 2018 at 04:04:40PM +0000, Dave Martin wrote:
> > On Fri, Jan 12, 2018 at 01:07:32PM +0100, Christoffer Dall wrote:
> > > We are about to defer saving and restoring some groups of system
> > > registers to vcpu_put and vcpu_load on supported systems.  This means
> > > that we need some infrastructure to access system registers which
> > > supports either accessing the memory backing of the register or directly
> > > accessing the system registers, depending on the state of the system
> > > when we access the register.
> > > 
> > > We do this by defining a set of read/write accessors for each system
> > > register, and letting each system register be defined as "immediate" or
> > > "deferrable".  Immediate registers are always saved/restored in the
> > > world-switch path, but deferrable registers are only saved/restored in
> > > vcpu_put/vcpu_load when supported and sysregs_loaded_on_cpu will be set
> > > in that case.
> > > 
> > > Note that we don't use the deferred mechanism yet in this patch, but only
> > > introduce infrastructure.  This is to improve convenience of review in
> > > the subsequent patches where it is clear which registers become
> > > deferred.
> > 
> > Might this table-driven approach result in a lot of branch mispredicts,
> > particularly across load/put boundaries?
> > 
> > If we were to move the whole construct to a header, then it could get
> > constant-folded at the call site down to the individual reg accessed,
> > say:
> > 
> > 	if (sys_regs_loaded)
> > 		read_sysreg_s(TPIDR_EL0);
> > 	else
> > 		__vcpu_sys_reg(v, TPIDR_EL0);
> > 
> > Where multiple regs are accessed close to each other, the compiler
> > may be able to specialise the whole sequence for the loaded and !loaded
> > cases so that there is only one conditional branch.
> > 
> 
> That's an interesting thing to consider indeed.  I wasn't really sure
> how to put this in a header file which wouldn't look overly bloated for
> inclusion elsewhere, so we ended up with this.
> 
> I don't think the alternative suggestion that I discussed with Julien on
> this patch changes this much, but since you've had a look at this, I'm
> curious which one of the two (lookup table vs. giant switch) you prefer?

The giant switch approach has the advantage that it is likely to be
folded down to a single case when the switch control expression is
const-foldable; the flipside is that when that fails the entire
switch would be inlined.
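
Roughly what I have in mind, as a hand-wavy sketch only (the names and
the set of registers are illustrative, and I'm ignoring the VHE _EL12
aliasing details here):

	/*
	 * In a header, so that a compile-time-constant reg folds each
	 * call site down to a single if/else:
	 */
	static inline u64 vcpu_read_sys_reg(struct kvm_vcpu *vcpu, int reg)
	{
		if (!vcpu->arch.sysregs_loaded_on_cpu)
			return __vcpu_sys_reg(vcpu, reg);

		switch (reg) {
		case TPIDR_EL0:	return read_sysreg_s(SYS_TPIDR_EL0);
		case SCTLR_EL1:	return read_sysreg_s(SYS_SCTLR_EL1);
		/* ... one case per deferrable register ... */
		}

		/* Immediate registers always live in memory between exits. */
		return __vcpu_sys_reg(vcpu, reg);
	}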

> > The individual accessor functions also become unnecessary in this case,
> > because we wouldn't need to derive function pointers from them any
> > more.
> > 
> > I don't know how performance would compare in practice though.
> 
> I don't know either.  But I will say that the whole idea behind put/load
> is that you do this rarely, and going to userspace from KVM is
> notriously expensive, also on x86.

I guess that makes sense.  I'm still a bit hazy on the big picture
for KVM.

> > I'm also assuming that all calls to these accessors are const-foldable.
> > If not, relying on inlining would bloat the generated code a lot.
> 
> We have places where this is not the case, access_vm_reg() for example.
> But if we really, really, wanted to, we could rewrite that to have a
> function for each register, but that's pretty horrid on its own.

That might not be too bad if there is only one giant inline expansion
and the rest are folded down.


I guess this is something to revisit _if_ we suspect a performance
bottleneck later on.

For now, I was lacking some understanding regarding how this code gets
run, so I was guessing about potential issues rather than proven
issues.

As you might guess, I'm still at the "stupid questions" stage for
this series :)

[...]

Cheers
---Dave

^ permalink raw reply	[flat|nested] 223+ messages in thread

* [PATCH v3 26/41] KVM: arm64: Introduce framework for accessing deferred sysregs
@ 2018-02-09 16:17         ` Dave Martin
  0 siblings, 0 replies; 223+ messages in thread
From: Dave Martin @ 2018-02-09 16:17 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Jan 25, 2018 at 08:54:13PM +0100, Christoffer Dall wrote:
> On Tue, Jan 23, 2018 at 04:04:40PM +0000, Dave Martin wrote:
> > On Fri, Jan 12, 2018 at 01:07:32PM +0100, Christoffer Dall wrote:
> > > We are about to defer saving and restoring some groups of system
> > > registers to vcpu_put and vcpu_load on supported systems.  This means
> > > that we need some infrastructure to access system registers which
> > > supports either accessing the memory backing of the register or directly
> > > accessing the system registers, depending on the state of the system
> > > when we access the register.
> > > 
> > > We do this by defining a set of read/write accessors for each system
> > > register, and letting each system register be defined as "immediate" or
> > > "deferrable".  Immediate registers are always saved/restored in the
> > > world-switch path, but deferrable registers are only saved/restored in
> > > vcpu_put/vcpu_load when supported and sysregs_loaded_on_cpu will be set
> > > in that case.
> > > 
> > > Note that we don't use the deferred mechanism yet in this patch, but only
> > > introduce infrastructure.  This is to improve convenience of review in
> > > the subsequent patches where it is clear which registers become
> > > deferred.
> > 
> > Might this table-driven approach result in a lot of branch mispredicts,
> > particularly across load/put boundaries?
> > 
> > If we were to move the whole construct to a header, then it could get
> > constant-folded at the call site down to the individual reg accessed,
> > say:
> > 
> > 	if (sys_regs_loaded)
> > 		read_sysreg_s(TPIDR_EL0);
> > 	else
> > 		__vcpu_sys_reg(v, TPIDR_EL0);
> > 
> > Where multiple regs are accessed close to each other, the compiler
> > may be able to specialise the whole sequence for the loaded and !loaded
> > cases so that there is only one conditional branch.
> > 
> 
> That's an interesting thing to consider indeed.  I wasn't really sure
> how to put this in a header file which wouldn't look overly bloated for
> inclusion elsewhere, so we ended up with this.
> 
> I don't think the alternative suggestion that I discussed with Julien on
> this patch changes this much, but since you've had a look at this, I'm
> curious which one of the two (lookup table vs. giant switch) you prefer?

The giant switch approach has the advantage that it is likely to be
folded down to a single case when the switch control expression is
const-foldable; the flipside is that when that fails the entire
switch would be inlined.

> > The individual accessor functions also become unnecessary in this case,
> > because we wouldn't need to derive function pointers from them any
> > more.
> > 
> > I don't know how performance would compare in practice though.
> 
> I don't know either.  But I will say that the whole idea behind put/load
> is that you do this rarely, and going to userspace from KVM is
> notoriously expensive, also on x86.

I guess that makes sense.  I'm still a bit hazy on the big picture
for KVM.

> > I'm also assuming that all calls to these accessors are const-foldable.
> > If not, relying on inlining would bloat the generated code a lot.
> 
> We have places where this is not the case, access_vm_reg() for example.
> But if we really, really, wanted to, we could rewrite that to have a
> function for each register, but that's pretty horrid on its own.

That might not be too bad if there is only one giant inline expansion
and the rest are folded down.


I guess this is something to revisit _if_ we suspect a performance
bottleneck later on.

For now, I was lacking some understanding regarding how this code gets
run, so I was guessing about potential issues rather than proven
issues.

As you might guess, I'm still at the "stupid questions" stage for
this series :)

[...]

Cheers
---Dave

^ permalink raw reply	[flat|nested] 223+ messages in thread

* Re: [PATCH v3 14/41] KVM: arm64: Introduce VHE-specific kvm_vcpu_run
  2018-01-12 12:07   ` Christoffer Dall
@ 2018-02-09 17:34     ` Julien Grall
  -1 siblings, 0 replies; 223+ messages in thread
From: Julien Grall @ 2018-02-09 17:34 UTC (permalink / raw)
  To: Christoffer Dall, kvmarm, linux-arm-kernel; +Cc: Marc Zyngier, Shih-Wei Li, kvm

Hi Christoffer,

On 01/12/2018 12:07 PM, Christoffer Dall wrote:
> So far this is just a copy of the legacy non-VHE switch function, but we
> will start reworking these functions in separate directions to work on
> VHE and non-VHE in the most optimal way in later patches.
> 
> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> ---
>   arch/arm/include/asm/kvm_asm.h   |  5 +++-
>   arch/arm/kvm/hyp/switch.c        |  2 +-
>   arch/arm64/include/asm/kvm_asm.h |  4 ++-
>   arch/arm64/kvm/hyp/switch.c      | 58 +++++++++++++++++++++++++++++++++++++++-
>   virt/kvm/arm/arm.c               |  5 +++-
>   5 files changed, 69 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
> index 36dd2962a42d..4ac717276543 100644
> --- a/arch/arm/include/asm/kvm_asm.h
> +++ b/arch/arm/include/asm/kvm_asm.h
> @@ -70,7 +70,10 @@ extern void __kvm_tlb_flush_local_vmid(struct kvm_vcpu *vcpu);
>   
>   extern void __kvm_timer_set_cntvoff(u32 cntvoff_low, u32 cntvoff_high);
>   
> -extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
> +/* no VHE on 32-bit :( */
> +static inline int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu) { return 0; }

Should we return an error or add a BUG() to catch potential use of this 
function?
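
For instance (just to illustrate, not a tested change):

	/* no VHE on 32-bit :( */
	static inline int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
	{
		BUG();		/* has_vhe() is always false on arm */
		return 0;
	}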

> +
> +extern int __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu);
>   
>   extern void __init_stage2_translation(void);
>   
> diff --git a/arch/arm/kvm/hyp/switch.c b/arch/arm/kvm/hyp/switch.c
> index c3b9799e2e13..7b2bd25e3b10 100644
> --- a/arch/arm/kvm/hyp/switch.c
> +++ b/arch/arm/kvm/hyp/switch.c
> @@ -153,7 +153,7 @@ static bool __hyp_text __populate_fault_info(struct kvm_vcpu *vcpu)
>   	return true;
>   }
>   
> -int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
> +int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
>   {
>   	struct kvm_cpu_context *host_ctxt;
>   	struct kvm_cpu_context *guest_ctxt;
> diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> index 6c7599b5cb40..fb91e728207b 100644
> --- a/arch/arm64/include/asm/kvm_asm.h
> +++ b/arch/arm64/include/asm/kvm_asm.h
> @@ -58,7 +58,9 @@ extern void __kvm_tlb_flush_local_vmid(struct kvm_vcpu *vcpu);
>   
>   extern void __kvm_timer_set_cntvoff(u32 cntvoff_low, u32 cntvoff_high);
>   
> -extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
> +extern int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu);
> +
> +extern int __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu);
>   
>   extern u64 __vgic_v3_get_ich_vtr_el2(void);
>   extern u64 __vgic_v3_read_vmcr(void);
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index 55ca2e3d42eb..accfe9a016f9 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -338,7 +338,63 @@ static bool __hyp_text fixup_guest_exit(struct kvm_vcpu *vcpu, u64 *exit_code)
>   	return false;
>   }
>   
> -int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
> +/* Switch to the guest for VHE systems running in EL2 */
> +int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm_cpu_context *host_ctxt;
> +	struct kvm_cpu_context *guest_ctxt;
> +	u64 exit_code;
> +
> +	vcpu = kern_hyp_va(vcpu);
> +
> +	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> +	host_ctxt->__hyp_running_vcpu = vcpu;
> +	guest_ctxt = &vcpu->arch.ctxt;
> +
> +	__sysreg_save_host_state(host_ctxt);
> +
> +	__activate_traps(vcpu);
> +	__activate_vm(vcpu);
> +
> +	__vgic_restore_state(vcpu);
> +	__timer_enable_traps(vcpu);
> +
> +	/*
> +	 * We must restore the 32-bit state before the sysregs, thanks
> +	 * to erratum #852523 (Cortex-A57) or #853709 (Cortex-A72).
> +	 */
> +	__sysreg32_restore_state(vcpu);
> +	__sysreg_restore_guest_state(guest_ctxt);
> +	__debug_switch_to_guest(vcpu);
> +
> +	do {
> +		/* Jump in the fire! */
> +		exit_code = __guest_enter(vcpu, host_ctxt);
> +
> +		/* And we're baaack! */
> +	} while (fixup_guest_exit(vcpu, &exit_code));
> +
> +	__sysreg_save_guest_state(guest_ctxt);
> +	__sysreg32_save_state(vcpu);
> +	__timer_disable_traps(vcpu);
> +	__vgic_save_state(vcpu);
> +
> +	__deactivate_traps(vcpu);
> +	__deactivate_vm(vcpu);
> +
> +	__sysreg_restore_host_state(host_ctxt);
> +
> +	/*
> +	 * This must come after restoring the host sysregs, since a non-VHE
> +	 * system may enable SPE here and make use of the TTBRs.
> +	 */
> +	__debug_switch_to_host(vcpu);
> +
> +	return exit_code;
> +}
> +
> +/* Switch to the guest for legacy non-VHE systems */
> +int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
>   {
>   	struct kvm_cpu_context *host_ctxt;
>   	struct kvm_cpu_context *guest_ctxt;
> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> index 5b1487bd91e8..6bce8f9c55db 100644
> --- a/virt/kvm/arm/arm.c
> +++ b/virt/kvm/arm/arm.c
> @@ -733,7 +733,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>   		trace_kvm_entry(*vcpu_pc(vcpu));
>   		guest_enter_irqoff();
>   
> -		ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
> +		if (has_vhe())
> +			ret = kvm_vcpu_run_vhe(vcpu);
> +		else
> +			ret = kvm_call_hyp(__kvm_vcpu_run_nvhe, vcpu);
>   
>   		vcpu->mode = OUTSIDE_GUEST_MODE;
>   		vcpu->stat.exits++;
> 

Cheers,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 223+ messages in thread

* [PATCH v3 14/41] KVM: arm64: Introduce VHE-specific kvm_vcpu_run
@ 2018-02-09 17:34     ` Julien Grall
  0 siblings, 0 replies; 223+ messages in thread
From: Julien Grall @ 2018-02-09 17:34 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Christoffer,

On 01/12/2018 12:07 PM, Christoffer Dall wrote:
> So far this is just a copy of the legacy non-VHE switch function, but we
> will start reworking these functions in separate directions to work on
> VHE and non-VHE in the most optimal way in later patches.
> 
> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> ---
>   arch/arm/include/asm/kvm_asm.h   |  5 +++-
>   arch/arm/kvm/hyp/switch.c        |  2 +-
>   arch/arm64/include/asm/kvm_asm.h |  4 ++-
>   arch/arm64/kvm/hyp/switch.c      | 58 +++++++++++++++++++++++++++++++++++++++-
>   virt/kvm/arm/arm.c               |  5 +++-
>   5 files changed, 69 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
> index 36dd2962a42d..4ac717276543 100644
> --- a/arch/arm/include/asm/kvm_asm.h
> +++ b/arch/arm/include/asm/kvm_asm.h
> @@ -70,7 +70,10 @@ extern void __kvm_tlb_flush_local_vmid(struct kvm_vcpu *vcpu);
>   
>   extern void __kvm_timer_set_cntvoff(u32 cntvoff_low, u32 cntvoff_high);
>   
> -extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
> +/* no VHE on 32-bit :( */
> +static inline int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu) { return 0; }

Should we return an error or add a BUG() to catch potential use of this 
function?

> +
> +extern int __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu);
>   
>   extern void __init_stage2_translation(void);
>   
> diff --git a/arch/arm/kvm/hyp/switch.c b/arch/arm/kvm/hyp/switch.c
> index c3b9799e2e13..7b2bd25e3b10 100644
> --- a/arch/arm/kvm/hyp/switch.c
> +++ b/arch/arm/kvm/hyp/switch.c
> @@ -153,7 +153,7 @@ static bool __hyp_text __populate_fault_info(struct kvm_vcpu *vcpu)
>   	return true;
>   }
>   
> -int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
> +int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
>   {
>   	struct kvm_cpu_context *host_ctxt;
>   	struct kvm_cpu_context *guest_ctxt;
> diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> index 6c7599b5cb40..fb91e728207b 100644
> --- a/arch/arm64/include/asm/kvm_asm.h
> +++ b/arch/arm64/include/asm/kvm_asm.h
> @@ -58,7 +58,9 @@ extern void __kvm_tlb_flush_local_vmid(struct kvm_vcpu *vcpu);
>   
>   extern void __kvm_timer_set_cntvoff(u32 cntvoff_low, u32 cntvoff_high);
>   
> -extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
> +extern int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu);
> +
> +extern int __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu);
>   
>   extern u64 __vgic_v3_get_ich_vtr_el2(void);
>   extern u64 __vgic_v3_read_vmcr(void);
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index 55ca2e3d42eb..accfe9a016f9 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -338,7 +338,63 @@ static bool __hyp_text fixup_guest_exit(struct kvm_vcpu *vcpu, u64 *exit_code)
>   	return false;
>   }
>   
> -int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
> +/* Switch to the guest for VHE systems running in EL2 */
> +int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm_cpu_context *host_ctxt;
> +	struct kvm_cpu_context *guest_ctxt;
> +	u64 exit_code;
> +
> +	vcpu = kern_hyp_va(vcpu);
> +
> +	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> +	host_ctxt->__hyp_running_vcpu = vcpu;
> +	guest_ctxt = &vcpu->arch.ctxt;
> +
> +	__sysreg_save_host_state(host_ctxt);
> +
> +	__activate_traps(vcpu);
> +	__activate_vm(vcpu);
> +
> +	__vgic_restore_state(vcpu);
> +	__timer_enable_traps(vcpu);
> +
> +	/*
> +	 * We must restore the 32-bit state before the sysregs, thanks
> +	 * to erratum #852523 (Cortex-A57) or #853709 (Cortex-A72).
> +	 */
> +	__sysreg32_restore_state(vcpu);
> +	__sysreg_restore_guest_state(guest_ctxt);
> +	__debug_switch_to_guest(vcpu);
> +
> +	do {
> +		/* Jump in the fire! */
> +		exit_code = __guest_enter(vcpu, host_ctxt);
> +
> +		/* And we're baaack! */
> +	} while (fixup_guest_exit(vcpu, &exit_code));
> +
> +	__sysreg_save_guest_state(guest_ctxt);
> +	__sysreg32_save_state(vcpu);
> +	__timer_disable_traps(vcpu);
> +	__vgic_save_state(vcpu);
> +
> +	__deactivate_traps(vcpu);
> +	__deactivate_vm(vcpu);
> +
> +	__sysreg_restore_host_state(host_ctxt);
> +
> +	/*
> +	 * This must come after restoring the host sysregs, since a non-VHE
> +	 * system may enable SPE here and make use of the TTBRs.
> +	 */
> +	__debug_switch_to_host(vcpu);
> +
> +	return exit_code;
> +}
> +
> +/* Switch to the guest for legacy non-VHE systems */
> +int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
>   {
>   	struct kvm_cpu_context *host_ctxt;
>   	struct kvm_cpu_context *guest_ctxt;
> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> index 5b1487bd91e8..6bce8f9c55db 100644
> --- a/virt/kvm/arm/arm.c
> +++ b/virt/kvm/arm/arm.c
> @@ -733,7 +733,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>   		trace_kvm_entry(*vcpu_pc(vcpu));
>   		guest_enter_irqoff();
>   
> -		ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
> +		if (has_vhe())
> +			ret = kvm_vcpu_run_vhe(vcpu);
> +		else
> +			ret = kvm_call_hyp(__kvm_vcpu_run_nvhe, vcpu);
>   
>   		vcpu->mode = OUTSIDE_GUEST_MODE;
>   		vcpu->stat.exits++;
> 

Cheers,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 223+ messages in thread

* Re: [PATCH v3 17/41] KVM: arm64: Remove noop calls to timer save/restore from VHE switch
  2018-01-12 12:07   ` Christoffer Dall
@ 2018-02-09 17:53     ` Julien Grall
  -1 siblings, 0 replies; 223+ messages in thread
From: Julien Grall @ 2018-02-09 17:53 UTC (permalink / raw)
  To: Christoffer Dall, kvmarm, linux-arm-kernel; +Cc: Marc Zyngier, Shih-Wei Li, kvm

Hi Christoffer,

On 01/12/2018 12:07 PM, Christoffer Dall wrote:
> The VHE switch function calls __timer_enable_traps and
> __timer_disable_traps which don't do anything on VHE systems.
> Therefore, simply remove these calls from the VHE switch function and
> make the functions non-conditional as they are now only called from the
> non-VHE switch path.
> 
> Acked-by: Marc Zyngier <marc.zyngier@arm.com>
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> ---
>   arch/arm64/kvm/hyp/switch.c |  2 --
>   virt/kvm/arm/hyp/timer-sr.c | 44 ++++++++++++++++++++++----------------------
>   2 files changed, 22 insertions(+), 24 deletions(-)
> 
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index 9aadef6966bf..6175fcb33ed2 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -354,7 +354,6 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
>   	__activate_vm(vcpu->kvm);
>   
>   	__vgic_restore_state(vcpu);
> -	__timer_enable_traps(vcpu);
>   
>   	/*
>   	 * We must restore the 32-bit state before the sysregs, thanks
> @@ -373,7 +372,6 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
>   
>   	__sysreg_save_guest_state(guest_ctxt);
>   	__sysreg32_save_state(vcpu);
> -	__timer_disable_traps(vcpu);
>   	__vgic_save_state(vcpu);
>   
>   	__deactivate_traps(vcpu);
> diff --git a/virt/kvm/arm/hyp/timer-sr.c b/virt/kvm/arm/hyp/timer-sr.c
> index f24404b3c8df..77754a62eb0c 100644
> --- a/virt/kvm/arm/hyp/timer-sr.c
> +++ b/virt/kvm/arm/hyp/timer-sr.c
> @@ -27,34 +27,34 @@ void __hyp_text __kvm_timer_set_cntvoff(u32 cntvoff_low, u32 cntvoff_high)
>   	write_sysreg(cntvoff, cntvoff_el2);
>   }
>   
> +/*
> + * Should only be called on non-VHE systems.
> + * VHE systems use EL2 timers and configure EL1 timers in kvm_timer_init_vhe().
> + */
>   void __hyp_text __timer_disable_traps(struct kvm_vcpu *vcpu)

Would it be worth suffixing the function with nvhe, so it would be clear
that it should not be called on a VHE system?

>   {
> -	/*
> -	 * We don't need to do this for VHE since the host kernel runs in EL2
> -	 * with HCR_EL2.TGE ==1, which makes those bits have no impact.
> -	 */
> -	if (!has_vhe()) {
> -		u64 val;
> +	u64 val;
>   
> -		/* Allow physical timer/counter access for the host */
> -		val = read_sysreg(cnthctl_el2);
> -		val |= CNTHCTL_EL1PCTEN | CNTHCTL_EL1PCEN;
> -		write_sysreg(val, cnthctl_el2);
> -	}
> +	/* Allow physical timer/counter access for the host */
> +	val = read_sysreg(cnthctl_el2);
> +	val |= CNTHCTL_EL1PCTEN | CNTHCTL_EL1PCEN;
> +	write_sysreg(val, cnthctl_el2);
>   }
>   
> +/*
> + * Should only be called on non-VHE systems.
> + * VHE systems use EL2 timers and configure EL1 timers in kvm_timer_init_vhe().
> + */
>   void __hyp_text __timer_enable_traps(struct kvm_vcpu *vcpu)

Same here.

>   {
> -	if (!has_vhe()) {
> -		u64 val;
> +	u64 val;
>   
> -		/*
> -		 * Disallow physical timer access for the guest
> -		 * Physical counter access is allowed
> -		 */
> -		val = read_sysreg(cnthctl_el2);
> -		val &= ~CNTHCTL_EL1PCEN;
> -		val |= CNTHCTL_EL1PCTEN;
> -		write_sysreg(val, cnthctl_el2);
> -	}
> +	/*
> +	 * Disallow physical timer access for the guest
> +	 * Physical counter access is allowed
> +	 */
> +	val = read_sysreg(cnthctl_el2);
> +	val &= ~CNTHCTL_EL1PCEN;
> +	val |= CNTHCTL_EL1PCTEN;
> +	write_sysreg(val, cnthctl_el2);
>   }
> 

Cheers,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 223+ messages in thread

* [PATCH v3 17/41] KVM: arm64: Remove noop calls to timer save/restore from VHE switch
@ 2018-02-09 17:53     ` Julien Grall
  0 siblings, 0 replies; 223+ messages in thread
From: Julien Grall @ 2018-02-09 17:53 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Christoffer,

On 01/12/2018 12:07 PM, Christoffer Dall wrote:
> The VHE switch function calls __timer_enable_traps and
> __timer_disable_traps which don't do anything on VHE systems.
> Therefore, simply remove these calls from the VHE switch function and
> make the functions non-conditional as they are now only called from the
> non-VHE switch path.
> 
> Acked-by: Marc Zyngier <marc.zyngier@arm.com>
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> ---
>   arch/arm64/kvm/hyp/switch.c |  2 --
>   virt/kvm/arm/hyp/timer-sr.c | 44 ++++++++++++++++++++++----------------------
>   2 files changed, 22 insertions(+), 24 deletions(-)
> 
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index 9aadef6966bf..6175fcb33ed2 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -354,7 +354,6 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
>   	__activate_vm(vcpu->kvm);
>   
>   	__vgic_restore_state(vcpu);
> -	__timer_enable_traps(vcpu);
>   
>   	/*
>   	 * We must restore the 32-bit state before the sysregs, thanks
> @@ -373,7 +372,6 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
>   
>   	__sysreg_save_guest_state(guest_ctxt);
>   	__sysreg32_save_state(vcpu);
> -	__timer_disable_traps(vcpu);
>   	__vgic_save_state(vcpu);
>   
>   	__deactivate_traps(vcpu);
> diff --git a/virt/kvm/arm/hyp/timer-sr.c b/virt/kvm/arm/hyp/timer-sr.c
> index f24404b3c8df..77754a62eb0c 100644
> --- a/virt/kvm/arm/hyp/timer-sr.c
> +++ b/virt/kvm/arm/hyp/timer-sr.c
> @@ -27,34 +27,34 @@ void __hyp_text __kvm_timer_set_cntvoff(u32 cntvoff_low, u32 cntvoff_high)
>   	write_sysreg(cntvoff, cntvoff_el2);
>   }
>   
> +/*
> + * Should only be called on non-VHE systems.
> + * VHE systems use EL2 timers and configure EL1 timers in kvm_timer_init_vhe().
> + */
>   void __hyp_text __timer_disable_traps(struct kvm_vcpu *vcpu)

Would it be worth suffixing the function with nvhe, so it would be clear
that it should not be called on a VHE system?

>   {
> -	/*
> -	 * We don't need to do this for VHE since the host kernel runs in EL2
> -	 * with HCR_EL2.TGE ==1, which makes those bits have no impact.
> -	 */
> -	if (!has_vhe()) {
> -		u64 val;
> +	u64 val;
>   
> -		/* Allow physical timer/counter access for the host */
> -		val = read_sysreg(cnthctl_el2);
> -		val |= CNTHCTL_EL1PCTEN | CNTHCTL_EL1PCEN;
> -		write_sysreg(val, cnthctl_el2);
> -	}
> +	/* Allow physical timer/counter access for the host */
> +	val = read_sysreg(cnthctl_el2);
> +	val |= CNTHCTL_EL1PCTEN | CNTHCTL_EL1PCEN;
> +	write_sysreg(val, cnthctl_el2);
>   }
>   
> +/*
> + * Should only be called on non-VHE systems.
> + * VHE systems use EL2 timers and configure EL1 timers in kvm_timer_init_vhe().
> + */
>   void __hyp_text __timer_enable_traps(struct kvm_vcpu *vcpu)

Same here.

>   {
> -	if (!has_vhe()) {
> -		u64 val;
> +	u64 val;
>   
> -		/*
> -		 * Disallow physical timer access for the guest
> -		 * Physical counter access is allowed
> -		 */
> -		val = read_sysreg(cnthctl_el2);
> -		val &= ~CNTHCTL_EL1PCEN;
> -		val |= CNTHCTL_EL1PCTEN;
> -		write_sysreg(val, cnthctl_el2);
> -	}
> +	/*
> +	 * Disallow physical timer access for the guest
> +	 * Physical counter access is allowed
> +	 */
> +	val = read_sysreg(cnthctl_el2);
> +	val &= ~CNTHCTL_EL1PCEN;
> +	val |= CNTHCTL_EL1PCTEN;
> +	write_sysreg(val, cnthctl_el2);
>   }
> 

Cheers,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 223+ messages in thread

* Re: [PATCH v3 18/41] KVM: arm64: Move userspace system registers into separate function
  2018-01-12 12:07   ` Christoffer Dall
@ 2018-02-09 18:50     ` Julien Grall
  -1 siblings, 0 replies; 223+ messages in thread
From: Julien Grall @ 2018-02-09 18:50 UTC (permalink / raw)
  To: Christoffer Dall, kvmarm, linux-arm-kernel; +Cc: Marc Zyngier, Shih-Wei Li, kvm

Hi Christoffer,

On 01/12/2018 12:07 PM, Christoffer Dall wrote:
> There's a semantic difference between the EL1 registers that control
> operation of a kernel running in EL1 and EL1 registers that only control
> userspace execution in EL0.  Since we can defer saving/restoring the
> latter, move them into their own function.
> 
> We also take this chance to rename the function saving/restoring the
> remaining system register to make it clear this function deals with
> the EL1 system registers.
> 
> No functional change.
> 
> Reviewed-by: Andrew Jones <drjones@redhat.com>
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> ---
>   arch/arm64/kvm/hyp/sysreg-sr.c | 46 +++++++++++++++++++++++++++++++-----------
>   1 file changed, 34 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
> index 848a46eb33bf..99dd50ce483b 100644
> --- a/arch/arm64/kvm/hyp/sysreg-sr.c
> +++ b/arch/arm64/kvm/hyp/sysreg-sr.c
> @@ -34,18 +34,27 @@ static void __hyp_text __sysreg_do_nothing(struct kvm_cpu_context *ctxt) { }
>   
>   static void __hyp_text __sysreg_save_common_state(struct kvm_cpu_context *ctxt)
>   {
> -	ctxt->sys_regs[ACTLR_EL1]	= read_sysreg(actlr_el1);

I am a bit confused: the comment on top of the function says the host
must save ACTLR_EL1 in the VHE case. But AFAICT, after this patch the 
register will not get saved in the host context. Did I miss anything?

Cheers,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 223+ messages in thread

* [PATCH v3 18/41] KVM: arm64: Move userspace system registers into separate function
@ 2018-02-09 18:50     ` Julien Grall
  0 siblings, 0 replies; 223+ messages in thread
From: Julien Grall @ 2018-02-09 18:50 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Christoffer,

On 01/12/2018 12:07 PM, Christoffer Dall wrote:
> There's a semantic difference between the EL1 registers that control
> operation of a kernel running in EL1 and EL1 registers that only control
> userspace execution in EL0.  Since we can defer saving/restoring the
> latter, move them into their own function.
> 
> We also take this chance to rename the function saving/restoring the
> remaining system register to make it clear this function deals with
> the EL1 system registers.
> 
> No functional change.
> 
> Reviewed-by: Andrew Jones <drjones@redhat.com>
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> ---
>   arch/arm64/kvm/hyp/sysreg-sr.c | 46 +++++++++++++++++++++++++++++++-----------
>   1 file changed, 34 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
> index 848a46eb33bf..99dd50ce483b 100644
> --- a/arch/arm64/kvm/hyp/sysreg-sr.c
> +++ b/arch/arm64/kvm/hyp/sysreg-sr.c
> @@ -34,18 +34,27 @@ static void __hyp_text __sysreg_do_nothing(struct kvm_cpu_context *ctxt) { }
>   
>   static void __hyp_text __sysreg_save_common_state(struct kvm_cpu_context *ctxt)
>   {
> -	ctxt->sys_regs[ACTLR_EL1]	= read_sysreg(actlr_el1);

I am a bit confused: the comment on top of the function says the host
must save ACTLR_EL1 in the VHE case. But AFAICT, after this patch the 
register will not get saved in the host context. Did I miss anything?

Cheers,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 223+ messages in thread

* Re: [PATCH v3 09/41] KVM: arm64: Defer restoring host VFP state to vcpu_put
  2018-02-09 15:59             ` Dave Martin
@ 2018-02-13  8:51               ` Christoffer Dall
  -1 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-02-13  8:51 UTC (permalink / raw)
  To: Dave Martin; +Cc: Marc Zyngier, kvm, kvmarm, Shih-Wei Li, linux-arm-kernel

On Fri, Feb 09, 2018 at 03:59:30PM +0000, Dave Martin wrote:
> On Wed, Feb 07, 2018 at 06:56:44PM +0100, Christoffer Dall wrote:
> > On Wed, Feb 07, 2018 at 04:49:55PM +0000, Dave Martin wrote:
> > > On Thu, Jan 25, 2018 at 08:46:53PM +0100, Christoffer Dall wrote:
> > > > On Mon, Jan 22, 2018 at 05:33:28PM +0000, Dave Martin wrote:
> > > > > On Fri, Jan 12, 2018 at 01:07:15PM +0100, Christoffer Dall wrote:
> > > > > > Avoid saving the guest VFP registers and restoring the host VFP
> > > > > > registers on every exit from the VM.  Only when we're about to run
> > > > > > userspace or other threads in the kernel do we really have to switch the
> > > > > > state back to the host state.
> > > > > > 
> > > > > > We still initially configure the VFP registers to trap when entering the
> > > > > > VM, but the difference is that we now leave the guest state in the
> > > > > > hardware registers as long as we're running this VCPU, even if we
> > > > > > occasionally trap to the host, and we only restore the host state when
> > > > > > we return to user space or when scheduling another thread.
> > > > > > 
> > > > > > Reviewed-by: Andrew Jones <drjones@redhat.com>
> > > > > > Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
> > > > > > Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> > > > > 
> > > > > [...]
> > > > > 
> > > > > > diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
> > > > > > index 883a6383cd36..848a46eb33bf 100644
> > > > > > --- a/arch/arm64/kvm/hyp/sysreg-sr.c
> > > > > > +++ b/arch/arm64/kvm/hyp/sysreg-sr.c
> > > > > 
> > > > > [...]
> > > > > 
> > > > > > @@ -213,6 +215,19 @@ void kvm_vcpu_load_sysregs(struct kvm_vcpu *vcpu)
> > > > > >   */
> > > > > >  void kvm_vcpu_put_sysregs(struct kvm_vcpu *vcpu)
> > > > > >  {
> > > > > > +	struct kvm_cpu_context *host_ctxt = vcpu->arch.host_cpu_context;
> > > > > > +	struct kvm_cpu_context *guest_ctxt = &vcpu->arch.ctxt;
> > > > > > +
> > > > > > +	/* Restore host FP/SIMD state */
> > > > > > +	if (vcpu->arch.guest_vfp_loaded) {
> > > > > > +		if (vcpu_el1_is_32bit(vcpu)) {
> > > > > > +			kvm_call_hyp(__fpsimd32_save_state,
> > > > > > +				     kern_hyp_va(guest_ctxt));
> > > > > > +		}
> > > > > > +		__fpsimd_save_state(&guest_ctxt->gp_regs.fp_regs);
> > > > > > +		__fpsimd_restore_state(&host_ctxt->gp_regs.fp_regs);
> > > > > > +		vcpu->arch.guest_vfp_loaded = 0;
> > > > > 
> > > > > Provided we've already marked the host FPSIMD state as dirty on the way
> > > > > in, we probably don't need to restore it here.
> > > > > 
> > > > > In v4.15, the kvm_fpsimd_flush_cpu_state() call in
> > > > > kvm_arch_vcpu_ioctl_run() is supposed to do this marking: currently
> > > > > it's only done for SVE, since KVM was previously restoring the host
> > > > > FPSIMD subset of the state anyway, but it could be made unconditional.
> > > > > 
> > > > > For a returning run ioctl, this would have the effect of deferring the
> > > > > host FPSIMD reload until we return to userspace, which is probably
> > > > > no more costly since the kernel must check whether to do this in
> > > > > ret_to_user anyway; OTOH if the vcpu thread was preempted by some
> > > > > other thread we save the cost of restoring the host state entirely here
> > > > > ... I think.
> > > > 
> > > > Yes, I agree.  However, currently the low-level logic in
> > > > arch/arm64/kvm/hyp/entry.S:__fpsimd_guest_restore saves the host
> > > > state into vcpu->arch.host_cpu_context->gp_regs.fp_regs (where
> > > > host_cpu_context is a KVM-specific per-cpu variable).  I think this means
> > > > that simply marking the state as invalid would cause the kernel to
> > > > restore some potentially stale values when returning to userspace.  Am I
> > > > missing something?
> > > 
> > > I think my point was that there would be no need for the low-level
> > > save of the host fpsimd state currently done by hyp.  At all.  The
> > > state would already have been saved off to thread_struct before
> > > entering the guest.
> > 
> > Ah, so if userspace touched any FPSIMD state, then we always save that
> > state when entering the kernel, even if we're just going to return to
> > the same userspace process anyway?  (For any system call etc.?)
> 
> Not exactly.  The state is saved when the corresponding user task is
> scheduled out, or when it would become stale because we're about to
> modify the task_struct view of the state (sigreturn/PTRACE_SETREGSET).
> The state is also saved if necessary by kernel_neon_begin().
> 
> Simply entering the kernel and returning to userspace doesn't have
> this effect by itself.
> 
> 
> Prior to the SVE patches, KVM makes itself orthogonal to the host
> context switch machinery by ensuring that whatever the host had
> in the FPSIMD regs at guest entry is restored before returning to
> the host. (IIUC)  

Only if the guest actually touches FPSIMD state.  If the guest doesn't
touch FPSIMD (no trap to EL2), then we never touch the state at all.
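
(For reference, that laziness comes from the trap configuration; this is
the non-VHE flavour, simplified:)

	/* __activate_traps(): make the guest's first FP/SIMD access trap */
	u64 val = CPTR_EL2_DEFAULT;
	val |= CPTR_EL2_TFP;
	write_sysreg(val, cptr_el2);

	/*
	 * On that trap, __fpsimd_guest_restore in hyp/entry.S saves the
	 * host regs, loads the guest regs, and clears the trap, so a guest
	 * that never touches FP/SIMD never causes any FPSIMD save/restore.
	 */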

> This means that redundant save/restore work is
> done by KVM, but does have the advantage of simplicity.

I don't understand what the redundant part here is?  Isn't it only
redundant in the case where the host (for some reason) has already saved
its FPSIMD state?  I assume that won't be the common case, since
"userspace->kernel->kvm_run" won't save the FPSIMD state, as you just
explained above.

> 
> This breaks for SVE though: the high bits of the Z-registers will be
> zeroed as a side effect of the FPSIMD save/restore done by KVM.
> This means that if the host has state in those bits then it must
> be saved before entering the guest: that's what the new
> kvm_fpsimd_flush_cpu_state() hook in kvm_arch_vcpu_ioctl_run() is for.

Again, I'm confused, because to me it looks like
kvm_fpsimd_flush_cpu_state() boils down to fpsimd_flush_cpu_state()
which just sets a pointer to NULL, but doesn't actually save the state.

So, when is the state in the hardware registers saved to memory?

> The alternative would have been for KVM to save/restore the host SVE
> state directly, but this seemed premature and invasive in the absence
> of full KVM SVE support.
> 
> This means that KVM's own save/restore of the host's FPSIMD state
> becomes redundant in this case, but since there is no SVE hardware
> yet, I favoured correctness over optimal performance here.
> 

I agree with the approach, I guess I just can't seem to follow the code
correctly...

> 
> My point here was that we could modify this hook to always save off the
> host FPSIMD state unconditionally before entering the guts of KVM,
> instead of only doing it when there is live SVE state.  The benefit of
> this is that the host context switch machinery knows if the state has
> already been saved and won't do it again.  Thus a kvm userspace -> vcpu
> (-> guest exit -> vcpu)* -> guest_exit sequence of arbitrary length
> will only save the host FPSIMD (or SVE) state once, and won't restore
> it at all (assuming no context switches).
> 
> Instead, the user thread's FPSIMD state is only reloaded on the final
> return to userspace.
> 

I think that would invert the logic we have now, so instead of only
saving/restoring the FPSIMD state when the guest uses it (as we do now),
we would only save/restore the FPSIMD state when the host uses it,
regardless of what the guest does.

Ideally, we could have a combination of both, but it's unclear to me if
we have good indications that one case is more likely than the other.

My gut feeling, though, is that the guest is likely to access FPSIMD
state often for as long as we're in KVM_RUN, and that host userspace
also often uses FPSIMD (memcpy, etc.), but the rest of the host kernel
(kernel threads etc.) is unlikely to use FPSIMD for a system that is
primarily running VMs.

> > > 
> > > This would result in a redundant save, but only when the host fpsimd
> > > state is dirty and the guest vcpu doesn't touch fpsimd before trapping
> > > back to the host.
> > > 
> > > For the host, the fpsimd state is only dirty after entering the kernel
> > > from userspace (or after certain other things like sigreturn or ptrace).
> > > So this approach would still avoid repeated save/restore when cycling
> > > between the guest and the kvm code in the host.
> > > 
> > 
> > I see.
> > 
> > > > It might very well be possible to change the logic so that we store the
> > > > host state the same place where task_fpsimd_save() would have, and I
> > > > think that would make what you suggest possible.
> > > 
> > > That's certainly possible, but I viewed that as harder.  It would be
> > > necessary to map the host thread_struct into hyp etc. etc.
> > > 
> > 
> > And even then, unnecessary because it would duplicate the existing state
> > save, IIUC above.
> 
> Agreed.  The main disadvantage of my approach is that the host state
> cannot be saved lazily any more, but at least it is only saved once
> for a given vcpu run loop.
> 
> > > > I'd like to make that a separate change from this patch though, as we're
> > > > already changing quite a bit with this series, so I'm trying to make any
> > > > logical change as contained per patch as possible, so that problems can
> > > > be spotted by bisecting.
> > > 
> > > Yes, I think that's wise.
> > > 
> > 
> > ok, I'll try to incorporate this as a separate patch for the next
> > revision.
> > 
> > > > > Ultimately I'd like to go one better and actually treat a vcpu as a
> > > > > first-class fpsimd context, so that taking an interrupt to the host
> > > > > and then reentering the guest doesn't cause any reload at all.  
> > > > 
> > > > That should be the case already; kvm_vcpu_put_sysregs() is only called
> > > > when you run another thread (preemptively or voluntarily), or when you
> > > > return to user space, but making the vcpu fpsimd context a first-class
> > > > citizen fpsimd context would mean that you can run another thread (and
> > > > maybe run userspace if it doesn't use fpsimd?) without having to
> > > > save/restore anything.  Am I getting this right?
> > > 
> > > Yes (except that if a return to userspace happens then FPSIMD will be
> > > restored at that point: there is no laziness there -- it _could_
> > > be lazy, but it's deemed unlikely to be a performance win due to the
> > > fact that the compiler can and does generate FPSIMD code quite
> > > liberally by default).
> > > 
> > > For the case of being preempted within the kernel with no ret_to_user,
> > > you are correct.
> > > 
> > 
> > ok, that would indeed also be useful for things like switching to a
> > vhost thread and returning to the vcpu thread.
> 
> What's a vhost thread?
> 

vhost is the offload mechanism for the data path of virtio devices.  For
example, if you have an application in your VM which is sending data,
the VM will typically fill some buffer and then cause a trap to the host
kernel by writing to an emulated PCI MMIO register.  That trap is
handled by KVM's IO bus infrastructure, which schedules a vhost kernel
thread that actually sends the data from the buffer shared between the
VM and the host kernel onto the physical network.

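(In case it helps, the wiring is roughly as below -- from memory, so
treat the details as approximate; vm_fd, vhost_fd and notify_gpa are
assumed to have been set up elsewhere by userspace, e.g. QEMU:)

	/* Tell KVM to signal an eventfd instead of exiting to userspace
	 * when the guest writes to the virtqueue notify register: */
	int kick = eventfd(0, EFD_CLOEXEC);
	struct kvm_ioeventfd io = {
		.addr = notify_gpa,	/* GPA of the notify register */
		.len  = 2,
		.fd   = kick,
	};
	ioctl(vm_fd, KVM_IOEVENTFD, &io);

	/* Hand the same eventfd to vhost as the vring kick; the vhost
	 * worker kthread wakes up on it and drains the ring: */
	struct vhost_vring_file file = { .index = 0, .fd = kick };
	ioctl(vhost_fd, VHOST_SET_VRING_KICK, &file);
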
Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 223+ messages in thread

* [PATCH v3 09/41] KVM: arm64: Defer restoring host VFP state to vcpu_put
@ 2018-02-13  8:51               ` Christoffer Dall
  0 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-02-13  8:51 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Feb 09, 2018 at 03:59:30PM +0000, Dave Martin wrote:
> On Wed, Feb 07, 2018 at 06:56:44PM +0100, Christoffer Dall wrote:
> > On Wed, Feb 07, 2018 at 04:49:55PM +0000, Dave Martin wrote:
> > > On Thu, Jan 25, 2018 at 08:46:53PM +0100, Christoffer Dall wrote:
> > > > On Mon, Jan 22, 2018 at 05:33:28PM +0000, Dave Martin wrote:
> > > > > On Fri, Jan 12, 2018 at 01:07:15PM +0100, Christoffer Dall wrote:
> > > > > > Avoid saving the guest VFP registers and restoring the host VFP
> > > > > > registers on every exit from the VM.  Only when we're about to run
> > > > > > userspace or other threads in the kernel do we really have to switch the
> > > > > > state back to the host state.
> > > > > > 
> > > > > > We still initially configure the VFP registers to trap when entering the
> > > > > > VM, but the difference is that we now leave the guest state in the
> > > > > > hardware registers as long as we're running this VCPU, even if we
> > > > > > occasionally trap to the host, and we only restore the host state when
> > > > > > we return to user space or when scheduling another thread.
> > > > > > 
> > > > > > Reviewed-by: Andrew Jones <drjones@redhat.com>
> > > > > > Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
> > > > > > Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> > > > > 
> > > > > [...]
> > > > > 
> > > > > > diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
> > > > > > index 883a6383cd36..848a46eb33bf 100644
> > > > > > --- a/arch/arm64/kvm/hyp/sysreg-sr.c
> > > > > > +++ b/arch/arm64/kvm/hyp/sysreg-sr.c
> > > > > 
> > > > > [...]
> > > > > 
> > > > > > @@ -213,6 +215,19 @@ void kvm_vcpu_load_sysregs(struct kvm_vcpu *vcpu)
> > > > > >   */
> > > > > >  void kvm_vcpu_put_sysregs(struct kvm_vcpu *vcpu)
> > > > > >  {
> > > > > > +	struct kvm_cpu_context *host_ctxt = vcpu->arch.host_cpu_context;
> > > > > > +	struct kvm_cpu_context *guest_ctxt = &vcpu->arch.ctxt;
> > > > > > +
> > > > > > +	/* Restore host FP/SIMD state */
> > > > > > +	if (vcpu->arch.guest_vfp_loaded) {
> > > > > > +		if (vcpu_el1_is_32bit(vcpu)) {
> > > > > > +			kvm_call_hyp(__fpsimd32_save_state,
> > > > > > +				     kern_hyp_va(guest_ctxt));
> > > > > > +		}
> > > > > > +		__fpsimd_save_state(&guest_ctxt->gp_regs.fp_regs);
> > > > > > +		__fpsimd_restore_state(&host_ctxt->gp_regs.fp_regs);
> > > > > > +		vcpu->arch.guest_vfp_loaded = 0;
> > > > > 
> > > > > Provided we've already marked the host FPSIMD state as dirty on the way
> > > > > in, we probably don't need to restore it here.
> > > > > 
> > > > > In v4.15, the kvm_fpsimd_flush_cpu_state() call in
> > > > > kvm_arch_vcpu_ioctl_run() is supposed to do this marking: currently
> > > > > it's only done for SVE, since KVM was previously restoring the host
> > > > > FPSIMD subset of the state anyway, but it could be made unconditional.
> > > > > 
> > > > > For a returning run ioctl, this would have the effect of deferring the
> > > > > host FPSIMD reload until we return to userspace, which is probably
> > > > > no more costly since the kernel must check whether to do this in
> > > > > ret_to_user anyway; OTOH if the vcpu thread was preempted by some
> > > > > other thread we save the cost of restoring the host state entirely here
> > > > > ... I think.
> > > > 
> > > > Yes, I agree.  However, currently the low-level logic in
> > > > arch/arm64/kvm/hyp/entry.S:__fpsimd_guest_restore saves the host
> > > > state into vcpu->arch.host_cpu_context->gp_regs.fp_regs (where
> > > > host_cpu_context is a KVM-specific per-cpu variable).  I think this means
> > > > that simply marking the state as invalid would cause the kernel to
> > > > restore some potentially stale values when returning to userspace.  Am I
> > > > missing something?
> > > 
> > > I think my point was that there would be no need for the low-level
> > > save of the host fpsimd state currently done by hyp.  At all.  The
> > > state would already have been saved off to thread_struct before
> > > entering the guest.
> > 
> > Ah, so if userspace touched any FPSIMD state, then we always save that
> > state when entering the kernel, even if we're just going to return to
> > the same userspace process anyway?  (For any system call etc.?)
> 
> Not exactly.  The state is saved when the corresponding user task is
> scheduled out, or when it would become stale because we're about to
> modify the task_struct view of the state (sigreturn/PTRACE_SETREGSET).
> The state is also saved if necessary by kernel_neon_begin().
> 
> Simply entering the kernel and returning to userspace doesn't have
> this effect by itself.
> 
> 
> Prior to the SVE patches, KVM makes itself orthogonal to the host
> context switch machinery by ensuring that whatever the host had
> in the FPSIMD regs at guest entry is restored before returning to
> the host. (IIUC)  

Only if the guest actually touches FPSIMD state.  If the guest doesn't
touch FPSIMD (no trap to EL2), then we never touch the state at all.

> This means that redundant save/restore work is
> done by KVM, but does have the advantage of simplicity.

I don't understand what the redundant part here is?  Isn't it only
redundant in the case where the host (for some reason) has already saved
its FPSIMD state?  I assume that won't be the common case, since
"userspace->kernel->kvm_run" won't save the FPSIMD state, as you just
explained above.

> 
> This breaks for SVE though: the high bits of the Z-registers will be
> zeroed as a side effect of the FPSIMD save/restore done by KVM.
> This means that if the host has state in those bits then it must
> be saved before entering the guest: that's what the new
> kvm_fpsimd_flush_cpu_state() hook in kvm_arch_vcpu_ioctl_run() is for.

Again, I'm confused, because to me it looks like
kvm_fpsimd_flush_cpu_state() boils down to fpsimd_flush_cpu_state()
which just sets a pointer to NULL, but doesn't actually save the state.

So, when is the state in the hardware registers saved to memory?
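
For reference, this is roughly what I'm looking at (paraphrasing the
v4.15 code from memory, so the exact names may be slightly off):

	/* arch/arm64/include/asm/kvm_host.h */
	static inline void kvm_fpsimd_flush_cpu_state(void)
	{
		if (system_supports_sve())
			sve_flush_cpu_state();
	}

	/* arch/arm64/kernel/fpsimd.c */
	void fpsimd_flush_cpu_state(void)
	{
		/* forget which task's state is resident in this CPU's regs */
		__this_cpu_write(fpsimd_last_state, NULL);
	}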

> The alternative would have been for KVM to save/restore the host SVE
> state directly, but this seemed premature and invasive in the absence
> of full KVM SVE support.
> 
> This means that KVM's own save/restore of the host's FPSIMD state
> becomes redundant in this case, but since there is no SVE hardware
> yet, I favoured correctness over optimal performance here.
> 

I agree with the approach, I guess I just can't seem to follow the code
correctly...

> 
> My point here was that we could modify this hook to always save off the
> host FPSIMD state unconditionally before entering the guts of KVM,
> instead of only doing it when there is live SVE state.  The benefit of
> this is that the host context switch machinery knows if the state has
> already been saved and won't do it again.  Thus a kvm userspace -> vcpu
> (-> guest exit -> vcpu)* -> guest_exit sequence of arbitrary length
> will only save the host FPSIMD (or SVE) state once, and won't restore
> it at all (assuming no context switches).
> 
> Instead, the user thread's FPSIMD state is only reloaded on the final
> return to userspace.
> 

I think that would invert the logic we have now, so instead of only
saving/restoring the FPSIMD state when the guest uses it (as we do now),
we would only save/restore the FPSIMD state when the host uses it,
regardless of what the guest does.

Ideally, we could have a combination of both, but it's unclear to me if
we have good indications that one case is more likely than the other.

My gut feeling though, is that the guest will be likely to often access
FPSIMD state for as long as we're in KVM_RUN, and that host userspace
also often uses FPSIMD (memcopy, etc.), but the rest of the host kernel
(kernel threads etc.) is unlikely to use FPSIMD for a system that is
primarily running VMs.

> > > 
> > > This would result in a redundant save, but only when the host fpsimd
> > > state is dirty and the guest vcpu doesn't touch fpsimd before trapping
> > > back to the host.
> > > 
> > > For the host, the fpsimd state is only dirty after entering the kernel
> > > from userspace (or after certain other things like sigreturn or ptrace).
> > > So this approach would still avoid repeated save/restore when cycling
> > > between the guest and the kvm code in the host.
> > > 
> > 
> > I see.
> > 
> > > > It might very well be possible to change the logic so that we store the
> > > > host state in the same place where task_fpsimd_save() would have, and I
> > > > think that would make what you suggest possible.
> > > 
> > > That's certainly possible, but I viewed that as harder.  It would be
> > > necessary to map the host thread_struct into hyp etc. etc.
> > > 
> > 
> > And even then, unnecessary because it would duplicate the existing state
> > save, IIUC above.
> 
> Agreed.  The main disadvantage of my approach is that the host state
> cannot be saved lazily any more, but at least it is only saved once
> for a given vcpu run loop.
> 
> > > > I'd like to make that a separate change from this patch though, as we're
> > > > already changing quite a bit with this series, so I'm trying to make any
> > > > logical change as contained per patch as possible, so that problems can
> > > > be spotted by bisecting.
> > > 
> > > Yes, I think that's wise.
> > > 
> > 
> > ok, I'll try to incorporate this as a separate patch for the next
> > revision.
> > 
> > > > > Ultimately I'd like to go one better and actually treat a vcpu as a
> > > > > first-class fpsimd context, so that taking an interrupt to the host
> > > > > and then reentering the guest doesn't cause any reload at all.  
> > > > 
> > > > That should be the case already; kvm_vcpu_put_sysregs() is only called
> > > > when you run another thread (preemptively or voluntarily), or when you
> > > > return to user space, but making the vcpu fpsimd context a first-class
> > > > citizen fpsimd context would mean that you can run another thread (and
> > > > maybe run userspace if it doesn't use fpsimd?) without having to
> > > > save/restore anything.  Am I getting this right?
> > > 
> > > Yes (except that if a return to userspace happens then FPSIMD will be
> > > restored at that point: there is no laziness there -- it _could_
> > > be lazy, but it's deemed unlikely to be a performance win due to the
> > > fact that the compiler can and does generate FPSIMD code quite
> > > liberally by default).
> > > 
> > > For the case of being preempted within the kernel with no ret_to_user,
> > > you are correct.
> > > 
> > 
> > ok, that would indeed also be useful for things like switching to a
> > vhost thread and returning to the vcpu thread.
> 
> What's a vhost thread?
> 

vhost is the offload mechanism for the data path of virtio devices.  For
example, if you have an application in your VM which is sending data,
the VM will typically fill some buffer and then cause a trap to the host
kernel by writing to an emulated PCI MMIO register.  That trap is
handled by KVM's IO bus infrastructure, which schedules a vhost kernel
thread, and that thread actually sends the data from the buffer shared
between the VM and the host kernel onto the physical network.

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 223+ messages in thread

* Re: [PATCH v3 09/41] KVM: arm64: Defer restoring host VFP state to vcpu_put
  2018-02-09 15:26     ` Julien Grall
@ 2018-02-13  8:52       ` Christoffer Dall
  -1 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-02-13  8:52 UTC (permalink / raw)
  To: Julien Grall; +Cc: kvmarm, linux-arm-kernel, Marc Zyngier, Shih-Wei Li, kvm

On Fri, Feb 09, 2018 at 03:26:59PM +0000, Julien Grall wrote:
> Hi Christoffer,
> 
> On 01/12/2018 12:07 PM, Christoffer Dall wrote:
> >Avoid saving the guest VFP registers and restoring the host VFP
> >registers on every exit from the VM.  Only when we're about to run
> >userspace or other threads in the kernel do we really have to switch the
> 
> s/do// ?
> 
> >state back to the host state.
> >
> >We still initially configure the VFP registers to trap when entering the
> >VM, but the difference is that we now leave the guest state in the
> >hardware registers as long as we're running this VCPU, even if we
> >occasionally trap to the host, and we only restore the host state when
> >we return to user space or when scheduling another thread.
> >
> >Reviewed-by: Andrew Jones <drjones@redhat.com>
> >Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
> >Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> >---
> >  arch/arm64/include/asm/kvm_host.h |  3 +++
> >  arch/arm64/kernel/asm-offsets.c   |  1 +
> >  arch/arm64/kvm/hyp/entry.S        |  3 +++
> >  arch/arm64/kvm/hyp/switch.c       | 48 ++++++++++++---------------------------
> >  arch/arm64/kvm/hyp/sysreg-sr.c    | 21 ++++++++++++++---
> >  5 files changed, 40 insertions(+), 36 deletions(-)
> >
> >diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> >index 0e9e7291a7e6..9e23bc968668 100644
> >--- a/arch/arm64/include/asm/kvm_host.h
> >+++ b/arch/arm64/include/asm/kvm_host.h
> >@@ -213,6 +213,9 @@ struct kvm_vcpu_arch {
> >  	/* Guest debug state */
> >  	u64 debug_flags;
> >+	/* 1 if the guest VFP state is loaded into the hardware */
> >+	u8 guest_vfp_loaded;
> >+
> >  	/*
> >  	 * We maintain more than a single set of debug registers to support
> >  	 * debugging the guest from the host and to maintain separate host and
> >diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
> >index 612021dce84f..99467327c043 100644
> >--- a/arch/arm64/kernel/asm-offsets.c
> >+++ b/arch/arm64/kernel/asm-offsets.c
> >@@ -133,6 +133,7 @@ int main(void)
> >    DEFINE(CPU_GP_REGS,		offsetof(struct kvm_cpu_context, gp_regs));
> >    DEFINE(CPU_USER_PT_REGS,	offsetof(struct kvm_regs, regs));
> >    DEFINE(CPU_FP_REGS,		offsetof(struct kvm_regs, fp_regs));
> >+  DEFINE(VCPU_GUEST_VFP_LOADED,	offsetof(struct kvm_vcpu, arch.guest_vfp_loaded));
> >    DEFINE(VCPU_FPEXC32_EL2,	offsetof(struct kvm_vcpu, arch.ctxt.sys_regs[FPEXC32_EL2]));
> >    DEFINE(VCPU_HOST_CONTEXT,	offsetof(struct kvm_vcpu, arch.host_cpu_context));
> >    DEFINE(HOST_CONTEXT_VCPU,	offsetof(struct kvm_cpu_context, __hyp_running_vcpu));
> >diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
> >index a360ac6e89e9..53652287a236 100644
> >--- a/arch/arm64/kvm/hyp/entry.S
> >+++ b/arch/arm64/kvm/hyp/entry.S
> >@@ -184,6 +184,9 @@ alternative_endif
> >  	add	x0, x2, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
> >  	bl	__fpsimd_restore_state
> >+	mov	x0, #1
> >+	strb	w0, [x3, #VCPU_GUEST_VFP_LOADED]
> >+
> >  	// Skip restoring fpexc32 for AArch64 guests
> >  	mrs	x1, hcr_el2
> >  	tbnz	x1, #HCR_RW_SHIFT, 1f
> >diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> >index 12dc647a6e5f..29e44a20f5e3 100644
> >--- a/arch/arm64/kvm/hyp/switch.c
> >+++ b/arch/arm64/kvm/hyp/switch.c
> >@@ -24,43 +24,32 @@
> >  #include <asm/fpsimd.h>
> >  #include <asm/debug-monitors.h>
> >-static bool __hyp_text __fpsimd_enabled_nvhe(void)
> >-{
> >-	return !(read_sysreg(cptr_el2) & CPTR_EL2_TFP);
> >-}
> >-
> >-static bool __hyp_text __fpsimd_enabled_vhe(void)
> >-{
> >-	return !!(read_sysreg(cpacr_el1) & CPACR_EL1_FPEN);
> >-}
> >-
> >-static hyp_alternate_select(__fpsimd_is_enabled,
> >-			    __fpsimd_enabled_nvhe, __fpsimd_enabled_vhe,
> >-			    ARM64_HAS_VIRT_HOST_EXTN);
> >-
> >-bool __hyp_text __fpsimd_enabled(void)
> 
> Now that __fpsimd_enabled is removed, I think you need to remove the
> prototype in arch/arm64/include/asm/kvm_hyp.h too.
> 
Will do.
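
For the record, I believe that's this leftover declaration (quoting from
memory, so please double-check):

	/* arch/arm64/include/asm/kvm_hyp.h */
	bool __fpsimd_enabled(void);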

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 223+ messages in thread

* Re: [PATCH v3 14/41] KVM: arm64: Introduce VHE-specific kvm_vcpu_run
  2018-02-09 17:34     ` Julien Grall
@ 2018-02-13  8:52       ` Christoffer Dall
  -1 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-02-13  8:52 UTC (permalink / raw)
  To: Julien Grall; +Cc: kvmarm, linux-arm-kernel, Marc Zyngier, Shih-Wei Li, kvm

On Fri, Feb 09, 2018 at 05:34:05PM +0000, Julien Grall wrote:
> Hi Christoffer,
> 
> On 01/12/2018 12:07 PM, Christoffer Dall wrote:
> >So far this is just a copy of the legacy non-VHE switch function, but we
> >will start reworking these functions in separate directions to work on
> >VHE and non-VHE in the most optimal way in later patches.
> >
> >Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
> >Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> >---
> >  arch/arm/include/asm/kvm_asm.h   |  5 +++-
> >  arch/arm/kvm/hyp/switch.c        |  2 +-
> >  arch/arm64/include/asm/kvm_asm.h |  4 ++-
> >  arch/arm64/kvm/hyp/switch.c      | 58 +++++++++++++++++++++++++++++++++++++++-
> >  virt/kvm/arm/arm.c               |  5 +++-
> >  5 files changed, 69 insertions(+), 5 deletions(-)
> >
> >diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
> >index 36dd2962a42d..4ac717276543 100644
> >--- a/arch/arm/include/asm/kvm_asm.h
> >+++ b/arch/arm/include/asm/kvm_asm.h
> >@@ -70,7 +70,10 @@ extern void __kvm_tlb_flush_local_vmid(struct kvm_vcpu *vcpu);
> >  extern void __kvm_timer_set_cntvoff(u32 cntvoff_low, u32 cntvoff_high);
> >-extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
> >+/* no VHE on 32-bit :( */
> >+static inline int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu) { return 0; }
> 
> Should we return an error or add a BUG() to catch potential use of this
> function?
> 

That definitely can't hurt.
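
Something like the following minimal sketch, perhaps (the BUG() is
simply there to catch any accidental caller on 32-bit):

	/* arch/arm/include/asm/kvm_asm.h */
	/* no VHE on 32-bit :( */
	static inline int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
	{
		BUG();		/* must never be reached on 32-bit */
		return 0;
	}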

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 223+ messages in thread

* Re: [PATCH v3 17/41] KVM: arm64: Remove noop calls to timer save/restore from VHE switch
  2018-02-09 17:53     ` Julien Grall
@ 2018-02-13  8:53       ` Christoffer Dall
  -1 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-02-13  8:53 UTC (permalink / raw)
  To: Julien Grall; +Cc: kvmarm, linux-arm-kernel, Marc Zyngier, Shih-Wei Li, kvm

On Fri, Feb 09, 2018 at 05:53:43PM +0000, Julien Grall wrote:
> Hi Christoffer,
> 
> On 01/12/2018 12:07 PM, Christoffer Dall wrote:
> >The VHE switch function calls __timer_enable_traps and
> >__timer_disable_traps which don't do anything on VHE systems.
> >Therefore, simply remove these calls from the VHE switch function and
> >make the functions non-conditional as they are now only called from the
> >non-VHE switch path.
> >
> >Acked-by: Marc Zyngier <marc.zyngier@arm.com>
> >Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> >---
> >  arch/arm64/kvm/hyp/switch.c |  2 --
> >  virt/kvm/arm/hyp/timer-sr.c | 44 ++++++++++++++++++++++----------------------
> >  2 files changed, 22 insertions(+), 24 deletions(-)
> >
> >diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> >index 9aadef6966bf..6175fcb33ed2 100644
> >--- a/arch/arm64/kvm/hyp/switch.c
> >+++ b/arch/arm64/kvm/hyp/switch.c
> >@@ -354,7 +354,6 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
> >  	__activate_vm(vcpu->kvm);
> >  	__vgic_restore_state(vcpu);
> >-	__timer_enable_traps(vcpu);
> >  	/*
> >  	 * We must restore the 32-bit state before the sysregs, thanks
> >@@ -373,7 +372,6 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
> >  	__sysreg_save_guest_state(guest_ctxt);
> >  	__sysreg32_save_state(vcpu);
> >-	__timer_disable_traps(vcpu);
> >  	__vgic_save_state(vcpu);
> >  	__deactivate_traps(vcpu);
> >diff --git a/virt/kvm/arm/hyp/timer-sr.c b/virt/kvm/arm/hyp/timer-sr.c
> >index f24404b3c8df..77754a62eb0c 100644
> >--- a/virt/kvm/arm/hyp/timer-sr.c
> >+++ b/virt/kvm/arm/hyp/timer-sr.c
> >@@ -27,34 +27,34 @@ void __hyp_text __kvm_timer_set_cntvoff(u32 cntvoff_low, u32 cntvoff_high)
> >  	write_sysreg(cntvoff, cntvoff_el2);
> >  }
> >+/*
> >+ * Should only be called on non-VHE systems.
> >+ * VHE systems use EL2 timers and configure EL1 timers in kvm_timer_init_vhe().
> >+ */
> >  void __hyp_text __timer_disable_traps(struct kvm_vcpu *vcpu)
> 
> Would it be worth to suffix the function with nvhe? So it would be clear
> that it should not be called for VHE system?
> 
> >  {
> >-	/*
> >-	 * We don't need to do this for VHE since the host kernel runs in EL2
> >-	 * with HCR_EL2.TGE ==1, which makes those bits have no impact.
> >-	 */
> >-	if (!has_vhe()) {
> >-		u64 val;
> >+	u64 val;
> >-		/* Allow physical timer/counter access for the host */
> >-		val = read_sysreg(cnthctl_el2);
> >-		val |= CNTHCTL_EL1PCTEN | CNTHCTL_EL1PCEN;
> >-		write_sysreg(val, cnthctl_el2);
> >-	}
> >+	/* Allow physical timer/counter access for the host */
> >+	val = read_sysreg(cnthctl_el2);
> >+	val |= CNTHCTL_EL1PCTEN | CNTHCTL_EL1PCEN;
> >+	write_sysreg(val, cnthctl_el2);
> >  }
> >+/*
> >+ * Should only be called on non-VHE systems.
> >+ * VHE systems use EL2 timers and configure EL1 timers in kvm_timer_init_vhe().
> >+ */
> >  void __hyp_text __timer_enable_traps(struct kvm_vcpu *vcpu)
> 
> Same here.
> 
I'll have a look.
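
Just so we're talking about the same thing, I assume you mean something
along these lines (illustrative prototypes only, the exact names may
end up different):

	/* e.g. in asm/kvm_hyp.h */
	void __timer_enable_traps_nvhe(struct kvm_vcpu *vcpu);
	void __timer_disable_traps_nvhe(struct kvm_vcpu *vcpu);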

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 223+ messages in thread

* Re: [PATCH v3 26/41] KVM: arm64: Introduce framework for accessing deferred sysregs
  2018-02-09 16:17         ` Dave Martin
@ 2018-02-13  8:55           ` Christoffer Dall
  -1 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-02-13  8:55 UTC (permalink / raw)
  To: Dave Martin; +Cc: Marc Zyngier, Shih-Wei Li, kvmarm, linux-arm-kernel, kvm

On Fri, Feb 09, 2018 at 04:17:39PM +0000, Dave Martin wrote:
> On Thu, Jan 25, 2018 at 08:54:13PM +0100, Christoffer Dall wrote:
> > On Tue, Jan 23, 2018 at 04:04:40PM +0000, Dave Martin wrote:
> > > On Fri, Jan 12, 2018 at 01:07:32PM +0100, Christoffer Dall wrote:
> > > > We are about to defer saving and restoring some groups of system
> > > > registers to vcpu_put and vcpu_load on supported systems.  This means
> > > > that we need some infrastructure to access system registers which
> > > > supports either accessing the memory backing of the register or directly
> > > > accessing the system registers, depending on the state of the system
> > > > when we access the register.
> > > > 
> > > > We do this by defining a set of read/write accessors for each system
> > > > register, and letting each system register be defined as "immediate" or
> > > > "deferrable".  Immediate registers are always saved/restored in the
> > > > world-switch path, but deferrable registers are only saved/restored in
> > > > vcpu_put/vcpu_load when supported and sysregs_loaded_on_cpu will be set
> > > > in that case.
> > > > 
> > > > Note that we don't use the deferred mechanism yet in this patch, but only
> > > > introduce infrastructure.  This is to improve convenience of review in
> > > > the subsequent patches where it is clear which registers become
> > > > deferred.
> > > 
> > > Might this table-driven approach result in a lot of branch mispredicts,
> > > particularly across load/put boundaries?
> > > 
> > > If we were to move the whole construct to a header, then it could get
> > > constant-folded at the call site down to the individual reg accessed,
> > > say:
> > > 
> > > 	if (sys_regs_loaded)
> > > 		read_sysreg_s(TPIDR_EL0);
> > > 	else
> > > 		__vcpu_sys_reg(v, TPIDR_EL0);
> > > 
> > > Where multiple regs are accessed close to each other, the compiler
> > > may be able to specialise the whole sequence for the loaded and !loaded
> > > cases so that there is only one conditional branch.
> > > 
> > 
> > That's an interesting thing to consider indeed.  I wasn't really sure
> > how to put this in a header file which wouldn't look overly bloated for
> > inclusion elsewhere, so we ended up with this.
> > 
> > I don't think the alternative suggestion that I discussed with Julien on
> > this patch changes this much, but since you've had a look at this, I'm
> > curious which one of the two (lookup table vs. giant switch) you prefer?
> 
> The giant switch approach has the advantage that it is likely to be
> folded down to a single case when the switch control expression is
> const-foldable; the flipside is that when that fails the entire
> switch would be inlined.
> 
> > > The individual accessor functions also become unnecessary in this case,
> > > because we wouldn't need to derive function pointers from them any
> > > more.
> > > 
> > > I don't know how performance would compare in practice though.
> > 
> > I don't know either.  But I will say that the whole idea behind put/load
> > is that you do this rarely, and going to userspace from KVM is
> > notoriously expensive, also on x86.
> 
> I guess that makes sense.  I'm still a bit hazy on the big picture
> for KVM.
> 
> > > I'm also assuming that all calls to these accessors are const-foldable.
> > > If not, relying on inlining would bloat the generated code a lot.
> > 
> > We have places where this is not the case, access_vm_reg() for example.
> > But if we really, really, wanted to, we could rewrite that to have a
> > function for each register, but that's pretty horrid on its own.
> 
> That might not be too bad if there is only one giant inline expansion
> and the rest are folded down.
> 
> 
> I guess this is something to revisit _if_ we suspect a performance
> bottleneck later on.
> 
> For now, I was lacking some understanding regarding how this code gets
> run, so I was guessing about potential issues rather than proven
> issues.
> 

This was a very useful discussion.  I think I'll change this to a big
switch statement in the header file using a static inline, because it
makes the code more readable, and if we notice a huge code size
explosion, we can take measures to make sure things are const-foldable.
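
As a very rough sketch of what I have in mind (names such as
vcpu_read_sys_reg and sysregs_loaded_on_cpu are illustrative and may
still change), the read side could look like:

	static inline u64 vcpu_read_sys_reg(struct kvm_vcpu *vcpu, int reg)
	{
		if (vcpu->arch.sysregs_loaded_on_cpu) {
			switch (reg) {
			case TPIDR_EL0:
				return read_sysreg_s(SYS_TPIDR_EL0);
			/* ... one case per deferrable register ... */
			}
		}

		/* register is only live in memory */
		return __vcpu_sys_reg(vcpu, reg);
	}

with a corresponding write accessor, so that a const-foldable reg
argument lets the compiler reduce each call site to a single case.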


> As you might guess, I'm still at the "stupid questions" stage for
> this series :)
> 
Not at all.

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 223+ messages in thread

* Re: [PATCH v3 09/41] KVM: arm64: Defer restoring host VFP state to vcpu_put
  2018-02-13  8:51               ` Christoffer Dall
@ 2018-02-13 14:08                 ` Dave Martin
  -1 siblings, 0 replies; 223+ messages in thread
From: Dave Martin @ 2018-02-13 14:08 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: Marc Zyngier, Shih-Wei Li, kvmarm, kvm, linux-arm-kernel

On Tue, Feb 13, 2018 at 09:51:30AM +0100, Christoffer Dall wrote:
> On Fri, Feb 09, 2018 at 03:59:30PM +0000, Dave Martin wrote:
> > On Wed, Feb 07, 2018 at 06:56:44PM +0100, Christoffer Dall wrote:
> > > On Wed, Feb 07, 2018 at 04:49:55PM +0000, Dave Martin wrote:

[...]

> > Simply entering the kernel and returning to userspace doesn't have
> > this effect by itself.
> > 
> > 
> > Prior to the SVE patches, KVM makes itself orthogonal to the host
> > context switch machinery by ensuring that whatever the host had
> > in the FPSIMD regs at guest entry is restored before returning to
> > the host. (IIUC)  
> 
> Only if the guest actually touches FPSIMD state.  If the guest doesn't
> touch FPSIMD (no trap to EL2), then we never touch the state at all.

I should have been clearer: KVM ensures that the state is _unchanged_
before returning to the host, but can elide the save/restore when the
guest doesn't touch the state...

> 
> > This means that redundant save/restore work is
> > done by KVM, but does have the advantage of simplicity.
> 
> I don't understand what the redundant part here is?  Isn't it only
> redundant in the case where the host (for some reason) has already saved
> its FPSIMD state?  I assume that won't be the common case, since
> "userspace->kernel->kvm_run" won't save the FPSIMD state, as you just
> explained above.

...however, when this elision does not occur, it may duplicate
save/restore done by the kernel, or it may save/restore worthless data
if the host's FPSIMD state is non-live at the time.

It's hard to gauge the impact of this: it seems unlikely to make a
massive difference, but will be highly workload-dependent.


The redundancy occurs because of the deferred restore of the FPSIMD
registers for host userspace: as a result, the host FPSIMD regs are
either discardable (i.e., already saved) or not live at all between
a context switch and the next ret_to_user.

This means that if the vcpu run loop is preempted, then when the host
switches back to the run loop it is pointless to save or restore the
host FPSIMD state.

A typical sequence of events exposing this redundancy would be as
follows.  I assume here that there are two cpu-bound tasks A and B
competing for a host CPU, where A is a vcpu thread:

 - vcpu A is in the guest running a compute-heavy task
 - FPSIMD typically traps to the host before context switch
 X kvm saves the host FPSIMD state
 - kvm loads the guest FPSIMD state
 - vcpu A reenters the guest
 - host context switch IRQ preempts A back to the run loop
 Y kvm loads the host FPSIMD state via vcpu_put

 - host context switch:
 - TIF_FOREIGN_FPSTATE is set -> no save of user FPSIMD state
 - switch to B
 - B reaches ret_to_user
 Y B's user FPSIMD state is loaded: TIF_FOREIGN_FPSTATE now clear
 - B enters userspace

 - host context switch:
 - B enters kernel
 X TIF_FOREIGN_FPSTATE now set -> host saves B's FPSIMD state
 - switch to A -> set TIF_FOREIGN_FPSTATE for A
 - back to the KVM run loop

 - vcpu A enters guest
 - redo from start

Here, the two saves marked X are redundant with respect to each other,
and the two restores marked Y are redundant with respect to each other.

> > This breaks for SVE though: the high bits of the Z-registers will be
> > zeroed as a side effect of the FPSIMD save/restore done by KVM.
> > This means that if the host has state in those bits then it must
> > be saved before entring the guest: that's what the new
> > kvm_fpsimd_flush_cpu_state() hook in kvm_arch_vcpu_ioctl_run() is for.
> 
> Again, I'm confused, because to me it looks like
> kvm_fpsimd_flush_cpu_state() boils down to fpsimd_flush_cpu_state()
> which just sets a pointer to NULL, but doesn't actually save the state.
> 
> So, when is the state in the hardware registers saved to memory?

This _is_ quite confusing: in writing this answer I identified a bug
and then realised why there is no bug...

kvm_fpsimd_flush_cpu_state() is just an invalidation.  No state is
actually saved today because we explicitly don't care about preserving
the SVE state, because the syscall ABI throws the SVE regs away as
a side effect of any syscall, including ioctl(KVM_RUN); also (currently) KVM
ensures that the non-SVE FPSIMD bits _are_ restored by itself.

I think my proposal is that this hook might take on the role of
actually saving the state too, if we move that out of the KVM host
context save/restore code.

Perhaps we could even replace

	preempt_disable();
	kvm_fpsimd_flush_cpu_state();
	/* ... */
	preempt_enable();

with

	kernel_neon_begin();
	/* ... */
	kernel_neon_end();

which does have the host user context saving built in -- though this
may have unwanted side effects, such as masking softirqs.  Possibly not
a big deal though if the region is kept small (?)


<aside>

Understanding precisely what kvm_fpsimd_flush_cpu_state() does is
not trivial...  the explanation goes something like this:

(*takes deep breath*)

A link is maintained between tasks and CPUs to indicate whether a
given CPU has the task's FPSIMD state in its regs.

For brevity, I'll describe this link as a relation loaded(task, cpu).

	loaded(current, smp_processor_id()) <->
		!test_thread_flag(TIF_FOREIGN_FPSTATE).

(In effect, TIF_FOREIGN_FPSTATE caches this relation for current.)

For non-current tasks, the relation is something like

	loaded(task, cpu) <->
		&task->thread.fpsimd_state ==
			per_cpu(fpsimd_last_state.st, cpu) &&
		task->thread.fpsimd_state.cpu == cpu.

There are subtleties about when these equivalences are meaningful
and how they can be checked safely that I'll gloss over here --
to get an idea, see cb968afc7898 ("arm64/sve: Avoid dereference of dead
task_struct in KVM guest entry").

 * loaded(task, cpu) is made false for all cpus and a given task
   by fpsimd_flush_task_state(task).

   This is how we invalidate a stale copy of some task's state when
   the kernel deliberately changes the state (e.g., exec, sigreturn,
   PTRACE_SETREGSET).

 * loaded(task, smp_processor_id()) is made false for all tasks
   by fpsimd_flush_cpu_state().

   This is how we avoid using the FPSIMD regs of some CPU that
   the kernel trashed (e.g., kernel_mode_neon, KVM) as a source
   of any task's FPSIMD state.

 * loaded(current, smp_processor_id()) is made true by
   fpsimd_bind_to_cpu().

   fpsimd_bind_to_cpu() also implies the effects of
   fpsimd_flush_task_state(current) and
   fpsimd_flush_cpu_state(smp_processor_id()) before the new relation is
   established.  This is not explicit in the code, but falls out from
   the way the relation is represented.


( There is a wrinkle here: fpsimd_flush_task_state(task) should always
be followed by set_thread_flag(TIF_FOREIGN_FPSTATE) if task == current.
fpsimd_flush_cpu_state() should similarly set that flag, otherwise the
garbage left in the SVE bits by KVM's save/restore may spuriously
appear in the vcpu thread's user regs.  But since that data will be (a)
zeros or (b) the task's own data; and because TIF_SVE is cleared in
entry.S:el0_svc as a side-effect of the ioctl(KVM_RUN) syscall, I don't
think this matters in practice.

If we extend kvm_fpsimd_flush_cpu_state() to invalidate in the non-SVE
case too then this becomes significant and we _would_ need to clear
TIF_FOREIGN_FPSTATE to avoid the guest's FPSIMD regs appearing in the
vcpu user thread. )

</aside>

> > The alternative would have been for KVM to save/restore the host SVE
> > state directly, but this seemed premature and invasive in the absence
> > of full KVM SVE support.
> > 
> > This means that KVM's own save/restore of the host's FPSIMD state
> > becomes redundant in this case, but since there is no SVE hardware
> > yet, I favoured correctness over optimal performance here.
> > 
> 
> I agree with the approach, I guess I just can't seem to follow the code
> correctly...

Understandable... even trying to remember how it works is giving me a
headache #P

> 
> > 
> > My point here was that we could modify this hook to always save off the
> > host FPSIMD state unconditionally before entering the guts of KVM,
> > instead of only doing it when there is live SVE state.  The benefit of
> > this is that the host context switch machinery knows if the state has
> > already been saved and won't do it again.  Thus a kvm userspace -> vcpu
> > (-> guest exit -> vcpu)* -> guest_exit sequence of arbitrary length
> > will only save the host FPSIMD (or SVE) state once, and won't restore
> > it at all (assuming no context switches).
> > 
> > Instead, the user thread's FPSIMD state is only reloaded on the final
> > return to userspace.
> > 
> 
> I think that would invert the logic we have now, so instead of only
> saving/restoring the FPSIMD state when the guest uses it (as we do now),
> we would only save/restore the FPSIMD state when the host uses it,
> regardless of what the guest does.

I'm not sure that's a complete characterisation of what's going on,
but I'm struggling to describe my view in simple terms.

> Ideally, we could have a combination of both, but it's unclear to me if
> we have good indications that one case is more likely than the other.
> 
> My gut feeling though, is that the guest will be likely to often access
> FPSIMD state for as long as we're in KVM_RUN, and that host userspace
> also often uses FPSIMD (memcopy, etc.), but the rest of the host kernel
> (kernel threads etc.) is unlikely to use FPSIMD for a system that is
> primarily running VMs.

I think my suggestion:

 * neither adds nor elides any saves or restores of the guest context;
 * elides some saves and restores of the host context;
 * moves the host context save with respect to your series in those
   cases where it does occur; and
 * adds 1 host context save for each preempt-or-enter-userspace ...
   preempt-or-enter-userspace interval of a vcpu thread during which
   the guest does not use FPSIMD.
  
The last bullet is the only one that can add cost.  I can imagine
hitting this during an I/O emulation storm.  I feel that most of the
rest of the time the change would be a net win, but it's hard to gauge
the overall impact.


Migrating to using the host context switch machinery as-is for
managing the guest FPSIMD context would allow all the redundant
saves/restores to be eliminated.

It would be a more invasive change though, and I don't think this
series should attempt it.

> > > > Yes (except that if a return to userspace happens then FPSIMD will be
> > > > restored at that point: there is no laziness there -- it _could_
> > > > be lazy, but it's deemed unlikely to be a performance win due to the
> > > > fact that the compiler can and does generate FPSIMD code quite
> > > > liberally by default).
> > > > 
> > > > For the case of being preempted within the kernel with no ret_to_user,
> > > > you are correct.
> > > > 
> > > 
> > > ok, that would indeed also be useful for things like switching to a
> > > vhost thread and returning to the vcpu thread.
> > 
> > What's a vhost thread?
> > 
> 
> vhost is the offload mechanism for the data path of virtio devices.  For
> example, if you have an application in your VM which is sending data,
> the VM will typically fill some buffer and then cause a trap to the host
> kernel by writing to an emulated PCI MMIO register.  That trap is
> handled by KVM's IO bus infrastructure, which schedules a vhost kernel
> thread, and that thread actually sends the data from the buffer shared
> between the VM and the host kernel onto the physical network.

Ah, right.

Cheers
---Dave

^ permalink raw reply	[flat|nested] 223+ messages in thread

* Re: [PATCH v3 26/41] KVM: arm64: Introduce framework for accessing deferred sysregs
  2018-02-13  8:55           ` Christoffer Dall
@ 2018-02-13 14:27             ` Dave Martin
  -1 siblings, 0 replies; 223+ messages in thread
From: Dave Martin @ 2018-02-13 14:27 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: Marc Zyngier, Shih-Wei Li, kvmarm, kvm, linux-arm-kernel

On Tue, Feb 13, 2018 at 09:55:02AM +0100, Christoffer Dall wrote:
> On Fri, Feb 09, 2018 at 04:17:39PM +0000, Dave Martin wrote:
> > On Thu, Jan 25, 2018 at 08:54:13PM +0100, Christoffer Dall wrote:
> > > On Tue, Jan 23, 2018 at 04:04:40PM +0000, Dave Martin wrote:

[...]

> > > > The individual accessor functions also become unnecessary in this case,
> > > > because we wouldn't need to derive function pointers from them any
> > > > more.
> > > > 
> > > > I don't know how performance would compare in practice though.
> > > 
> > > I don't know either.  But I will say that the whole idea behind put/load
> > > is that you do this rarely, and going to userspace from KVM is
> > > notoriously expensive, also on x86.
> > 
> > I guess that makes sense.  I'm still a bit hazy on the big picture
> > for KVM.
> > 
> > > > I'm also assuming that all calls to these accessors are const-foldable.
> > > > If not, relying on inlining would bloat the generated code a lot.
> > > 
> > > We have places where this is not the case, access_vm_reg() for example.
> > > But if we really, really, wanted to, we could rewrite that to have a
> > > function for each register, but that's pretty horrid on its own.
> > 
> > That might not be too bad if there is only one giant inline expansion
> > and the rest are folded down.
> > 
> > 
> > I guess this is something to revisit _if_ we suspect a performance
> > bottleneck later on.
> > 
> > For now, I was lacking some understanding regarding how this code gets
> > run, so I was guessing about potential issues rather than proven
> > issues.
> > 
> 
> This was a very useful discussion.  I think I'll change this to a big
> switch statement in the header file using a static inline, because it
> makes the code more readable, and if we notice a huge code size
> explosion, we can take measures to make sure things are const-foldable.

Sure, that sounds reasonable.

C99 inline semantics allow a single out-of-line body to be linked in
somewhere for when the function isn't inlined, so we might be able to
mitigate the bloat that way if it's a problem... unless the compiler
flags sabotage it.  (I remember GCC's traditional GNU89 handling is a
bit different: roughly speaking, the meanings of "inline" and
"extern inline" are swapped relative to C99.)

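For what it's worth, a minimal illustration of the C99 pattern I mean,
ignoring the kernel's actual compiler flags and using a made-up accessor
(not anything from your series):

	/* header: definition available for inlining by every caller */
	inline u64 vcpu_read_deferrable_reg(struct kvm_vcpu *vcpu, int reg)
	{
		/*
		 * Real code would first check whether the sysregs are
		 * currently loaded on the CPU; elided here.
		 */
		switch (reg) {
		case TPIDR_EL0:
			return read_sysreg(tpidr_el0);	/* live in hardware */
		default:
			return vcpu->arch.ctxt.sys_regs[reg];	/* in memory */
		}
	}

	/* exactly one .c file: emit the out-of-line external definition */
	extern inline u64 vcpu_read_deferrable_reg(struct kvm_vcpu *vcpu, int reg);
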
> > As you might guess, I'm still at the "stupid questions" stage for
> > this series :)
>
> Not at all.

Hmmm, I must try to be more stupid when I look at the other
patches...

Cheers
---Dave

^ permalink raw reply	[flat|nested] 223+ messages in thread

* Re: [PATCH v3 05/41] KVM: arm64: Move HCR_INT_OVERRIDE to default HCR_EL2 guest flag
  2018-02-09 11:38     ` Julien Grall
@ 2018-02-13 21:47       ` Christoffer Dall
  -1 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-02-13 21:47 UTC (permalink / raw)
  To: Julien Grall; +Cc: kvmarm, linux-arm-kernel, Marc Zyngier, Shih-Wei Li, kvm

On Fri, Feb 09, 2018 at 11:38:50AM +0000, Julien Grall wrote:
> Hi,
> 
> On 01/12/2018 12:07 PM, Christoffer Dall wrote:
> >From: Shih-Wei Li <shihwei@cs.columbia.edu>
> >
> >We always set the IMO and FMO bits in the HCR_EL2 when running the
> >guest, regardless if we use the vgic or not.  By moving these flags to
> >HCR_GUEST_FLAGS we can avoid one of the extra save/restore operations of
> >HCR_EL2 in the world switch code, and we can also soon get rid of the
> >other one.
> >
> >This is safe, because even though the IMO and FMO bits control both
> >taking the interrupts to EL2 and remapping ICC_*_EL1 to ICV_*_EL1
> >executed at EL1, as long as we ensure that these bits are clear when
> >running the EL1 host, as defined in the HCR_HOST_[VHE_]FLAGS, we're OK.
> 
> NIT: I was a bit confused by the end of the sentence because HCR_HOST_FLAGS
> define does not seem to exist.
> 

True, that was nonsense.

I have reworded it slightly.

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 223+ messages in thread

* Re: [PATCH v3 17/41] KVM: arm64: Remove noop calls to timer save/restore from VHE switch
  2018-02-09 17:53     ` Julien Grall
@ 2018-02-13 22:31       ` Christoffer Dall
  -1 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-02-13 22:31 UTC (permalink / raw)
  To: Julien Grall; +Cc: kvmarm, linux-arm-kernel, Marc Zyngier, Shih-Wei Li, kvm

On Fri, Feb 09, 2018 at 05:53:43PM +0000, Julien Grall wrote:
> Hi Christoffer,
> 
> On 01/12/2018 12:07 PM, Christoffer Dall wrote:
> >The VHE switch function calls __timer_enable_traps and
> >__timer_disable_traps which don't do anything on VHE systems.
> >Therefore, simply remove these calls from the VHE switch function and
> >make the functions non-conditional as they are now only called from the
> >non-VHE switch path.
> >
> >Acked-by: Marc Zyngier <marc.zyngier@arm.com>
> >Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> >---
> >  arch/arm64/kvm/hyp/switch.c |  2 --
> >  virt/kvm/arm/hyp/timer-sr.c | 44 ++++++++++++++++++++++----------------------
> >  2 files changed, 22 insertions(+), 24 deletions(-)
> >
> >diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> >index 9aadef6966bf..6175fcb33ed2 100644
> >--- a/arch/arm64/kvm/hyp/switch.c
> >+++ b/arch/arm64/kvm/hyp/switch.c
> >@@ -354,7 +354,6 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
> >  	__activate_vm(vcpu->kvm);
> >  	__vgic_restore_state(vcpu);
> >-	__timer_enable_traps(vcpu);
> >  	/*
> >  	 * We must restore the 32-bit state before the sysregs, thanks
> >@@ -373,7 +372,6 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
> >  	__sysreg_save_guest_state(guest_ctxt);
> >  	__sysreg32_save_state(vcpu);
> >-	__timer_disable_traps(vcpu);
> >  	__vgic_save_state(vcpu);
> >  	__deactivate_traps(vcpu);
> >diff --git a/virt/kvm/arm/hyp/timer-sr.c b/virt/kvm/arm/hyp/timer-sr.c
> >index f24404b3c8df..77754a62eb0c 100644
> >--- a/virt/kvm/arm/hyp/timer-sr.c
> >+++ b/virt/kvm/arm/hyp/timer-sr.c
> >@@ -27,34 +27,34 @@ void __hyp_text __kvm_timer_set_cntvoff(u32 cntvoff_low, u32 cntvoff_high)
> >  	write_sysreg(cntvoff, cntvoff_el2);
> >  }
> >+/*
> >+ * Should only be called on non-VHE systems.
> >+ * VHE systems use EL2 timers and configure EL1 timers in kvm_timer_init_vhe().
> >+ */
> >  void __hyp_text __timer_disable_traps(struct kvm_vcpu *vcpu)
> 
> Would it be worth to suffix the function with nvhe? So it would be clear
> that it should not be called for VHE system?
> 
Actually, I decided against this, because it's also called from the
32-bit code and it looks a little strange there, and it's not like we
have an equivalent _vhe version.

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 223+ messages in thread

* Re: [PATCH v3 09/41] KVM: arm64: Defer restoring host VFP state to vcpu_put
  2018-02-13 14:08                 ` Dave Martin
@ 2018-02-14 10:15                   ` Christoffer Dall
  -1 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-02-14 10:15 UTC (permalink / raw)
  To: Dave Martin; +Cc: Marc Zyngier, Shih-Wei Li, kvmarm, kvm, linux-arm-kernel

On Tue, Feb 13, 2018 at 02:08:47PM +0000, Dave Martin wrote:
> On Tue, Feb 13, 2018 at 09:51:30AM +0100, Christoffer Dall wrote:
> > On Fri, Feb 09, 2018 at 03:59:30PM +0000, Dave Martin wrote:
> > > On Wed, Feb 07, 2018 at 06:56:44PM +0100, Christoffer Dall wrote:
> > > > On Wed, Feb 07, 2018 at 04:49:55PM +0000, Dave Martin wrote:
> 
> [...]
> 
> > > Simply entering the kernel and returning to userspace doesn't have
> > > this effect by itself.
> > > 
> > > 
> > > Prior to the SVE patches, KVM makes itself orthogonal to the host
> > > context switch machinery by ensuring that whatever the host had
> > > in the FPSIMD regs at guest entry is restored before returning to
> > > the host. (IIUC)  
> > 
> > Only if the guest actually touches FPSIMD state.  If the guest doesn't
> > touch FPSIMD (no trap to EL2), then we never touch the state at all.
> 
> I should have been clearer: KVM ensures that the state is _unchanged_
> before returning to the host, but can elide the save/restore when the
> guest doesn't touch the state...
> 
> > 
> > > This means that redundant save/restore work is
> > > done by KVM, but does have the advantage of simplicity.
> > 
> > I don't understand what the redundant part here is?  Isn't it only
> > redundant in the case where the host (for some reason) has already saved
> > its FPSIMD state?  I assume that won't be the common case, since
> > "userspace->kernel->kvm_run" won't save the FPSIMD state, as you just
> > explained above.
> 
> ...however, when this elision does not occur, it may duplicate
> save/restore done by the kernel, or it may save/restore worthless data
> if the host's FPSIMD state is non-live at the time.
> 
> It's hard to gauge the impact of this: it seems unlikely to make a
> massive difference, but will be highly workload-dependent.

So I thought it might be useful to have some idea of the frequency of
events on a balanced workload, so I ran an 8-way SMP guest on Ubuntu
14.04 running SPECjvm2008, a memcached benchmark, a MySQL workload, and
some networking benchmarks, and I counted a few events:

 - Out of all the exits, from the guest to run-loop in EL1 on a non-VHE
   system, fewer than 1% of them result in an exit to userspace (0.57%).

 - The VCPU thread was preempted (voluntarily or forced) in the kernel
   less than 3% of the exits (2.72%).  That's just below 5 preemptions
   per ioctl(KVM_RUN).

 - In 29% of the preemptions (vcpu_put), the guest had touched FPSIMD
   registers and the host context was restored.

 - We store the host context about 1.38 times per ioctl(KVM_RUN).

So that tells me that (1) it's worth restoring the guest FPSIMD state
lazily as opposed to proactively on vcpu_load, and (2) that there's a
small opportunity for improvement by reducing redundant host vfp state
saves.

> 
> 
> The redundancy occurs because of the deferred restore of the FPSIMD
> registers for host userspace: as a result, the host FPSIMD regs are
> either discardable (i.e., already saved) or not live at all between
> a context switch and the next ret_to_user.
> 
> This means that if the vcpu run loop is preempted, then when the host
> switches back to the run loop it is pointless to save or restore the
> host FPSIMD state.
> 
> A typical sequence of events exposing this redundancy would be as
> follows.  I assume here that there are two cpu-bound tasks A and B
> competing for a host CPU, where A is a vcpu thread:
> 
>  - vcpu A is in the guest running a compute-heavy task
>  - FPSIMD typically traps to the host before context switch
>  X kvm saves the host FPSIMD state
>  - kvm loads the guest FPSIMD state
>  - vcpu A reenters the guest
>  - host context switch IRQ preempts A back to the run loop
>  Y kvm loads the host FPSIMD state via vcpu_put
> 
>  - host context switch:
>  - TIF_FOREIGN_FPSTATE is set -> no save of user FPSIMD state
>  - switch to B
>  - B reaches ret_to_user
>  Y B's user FPSIMD state is loaded: TIF_FOREIGN_FPSTATE now clear
>  - B enters userspace
> 
>  - host context switch:
>  - B enters kernel
>  X TIF_FOREIGN_FPSTATE now set -> host saves B's FPSIMD state
>  - switch to A -> set TIF_FOREIGN_FPSTATE for A
>  - back to the KVM run loop
> 
>  - vcpu A enters guest
>  - redo from start
> 
> Here, the two saves marked X are redundant with respect to each other,
> and the two restores marked Y are redundant with respect to each other.
> 

Right, ok, but if we have

 - ioctl(KVM_RUN)
 - mark hardware FPSIMD register state as invalid
 - load guest FPSIMD state
 - enter guest
 - exit guest
 - save guest FPSIMD state
 - return to user space

(I.e. we don't do any preemption in the guest)

Then we'll lose the host FPSIMD register state, potentially, right?

Your original comment on this patch was that we didn't need to restore
the host FPSIMD state in kvm_vcpu_put_sysregs, which would result in the
scenario above.  The only way I can see this working is by making sure
that kvm_fpsimd_flush_cpu_state() also saves the FPSIMD hardware
register state if the state is live.

Am I still missing something?

> > > This breaks for SVE though: the high bits of the Z-registers will be
> > > zeroed as a side effect of the FPSIMD save/restore done by KVM.
> > > This means that if the host has state in those bits then it must
> > > be saved before entring the guest: that's what the new
> > > kvm_fpsimd_flush_cpu_state() hook in kvm_arch_vcpu_ioctl_run() is for.
> > 
> > Again, I'm confused, because to me it looks like
> > kvm_fpsimd_flush_cpu_state() boils down to fpsimd_flush_cpu_state()
> > which just sets a pointer to NULL, but doesn't actually save the state.
> > 
> > So, when is the state in the hardware registers saved to memory?
> 
> This _is_ quite confusing: in writing this answer I identified a bug
> and then realised why there is no bug...
> 
> kvm_fpsimd_flush_cpu_state() is just an invalidation.  No state is
> actually saved today because we explicitly don't care about preserving
> the SVE state, because the syscall ABI throws the SVE regs away as
> a side effect of any syscall, including ioctl(KVM_RUN); also (currently) KVM
> ensures that the non-SVE FPSIMD bits _are_ restored by itself.
> 
> I think my proposal is that this hook might take on the role of
> actually saving the state too, if we move that out of the KVM host
> context save/restore code.
> 
> Perhaps we could even replace
> 
> 	preempt_disable();
> 	kvm_fpsimd_flush_cpu_state();
> 	/* ... */
> 	preempt_enable();
> 
> with
> 
> 	kernel_neon_begin();
> 	/* ... */
> 	kernel_neon_end();

I'm not entirely sure where the begin and end points would be in the
context of KVM?

> 
> which does have the host user context saving built in -- though this
> may have unwanted side effects, such as masking softirqs.  Possibly not
> a big deal though if the region is kept small (?)
> 

I'm not sure I fully understand how this would work, so it's hard for me
to comment on.

> 
> <aside>
> 
> Understanding precisely what kvm_fpsimd_flush_cpu_state() does is
> not trivial...  the explanation goes something like this:
> 
> (*takes deep breath*)
> 
> A link is maintained between tasks and CPUs to indicate whether a
> given CPU has the task's FPSIMD state in its regs.
> 
> For brevity, I'll describe this link as a relation loaded(task, cpu).
> 
> 	loaded(current, smp_processor_id()) <->
> 		!test_thread_flag(TIF_FOREIGN_FPSTATE).
> 
> (In effect, TIF_FOREIGN_FPSTATE caches this relation for current.)
> 
> For non-current tasks, the relation is something like
> 
> 	loaded(task, cpu) <->
> 		&task->thread.fpsimd_state ==
> 			per_cpu(fpsimd_last_state.st, cpu) &&
> 		task->thread.fpsimd_state.cpu == cpu.
> 
> There are subtleties about when these equivalences are meaningful
> and how they can be checked safely that I'll gloss over here --
> to get an idea, see cb968afc7898 ("arm64/sve: Avoid dereference of dead
> task_struct in KVM guest entry").
> 
>  * loaded(task, cpu) is made false for all cpus and a given task
>    by fpsimd_flush_task_state(task).
> 
>    This is how we invalidate a stale copy of some task's state when
>    the kernel deliberately changes the state (e.g., exec, sigreturn,
>    PTRACE_SETREGSET).
> 
>  * loaded(task, smp_processor_id()) is made false for all tasks
>    by fpsimd_flush_cpu_state().
> 
>    This is how we avoid using the FPSIMD regs of some CPU that
>    the kernel trashed (e.g., kernel_mode_neon, KVM) as a source
>    of any task's FPSIMD state.
> 
>  * loaded(current, smp_processor_id()) is made true by
>    fpsimd_bind_to_cpu().
> 
>    fpsimd_bind_to_cpu() also implies the effects of
>    fpsimd_flush_task_state(current) and
>    fpsimd_flush_cpu_state(smp_processor_id()) before the new relation is
>    established.  This is not explicit in the code, but falls out from
>    the way the relation is represented.
> 
> 
> ( There is a wrinkle here: fpsimd_flush_task_state(task) should always
> be followed by set_thread_flag(TIF_FOREIGN_FPSTATE) if task == current.
> fpsimd_flush_cpu_state() should similarly set that flag, otherwise the
> garbage left in the SVE bits by KVM's save/restore may spuriously
> appear in the vcpu thread's user regs.  But since that data will be (a)
> zeros or (b) the task's own data; and because TIF_SVE is cleared in
> entry.S:el0_svc as a side-effect of the ioctl(KVM_RUN) syscall, I don't
> think this matters in practice.
> 
> If we extend kvm_fpsimd_flush_cpu_state() to invalidate in the non-SVE
> case too then this becomes significant and we _would_ need to clear
> TIF_FOREIGN_FPSTATE to avoid the guest's FPSIMD regs appearing in the

clear?  Wouldn't we need to set it?

> vcpu user thread. )
> 
> </aside>
> 

Thanks for this, it's helpful.

What's missing for my understanding is when fpsimd_save_state() gets
called, which must be required in some cases of invalidating the
relation, since otherwise there must be a risk of losing state?

> > > The alternative would have been for KVM to save/restore the host SVE
> > > state directly, but this seemed premature and invasive in the absence
> > > of full KVM SVE support.
> > > 
> > > This means that KVM's own save/restore of the host's FPSIMD state
> > > becomes redundant in this case, but since there is no SVE hardware
> > > yet, I favoured correctness over optimal performance here.
> > > 
> > 
> > I agree with the approach, I guess I just can't seem to follow the code
> > correctly...
> 
> Understandable... even trying to remember how it works is giving me a
> headache #P
> 
> > 
> > > 
> > > My point here was that we could modify this hook to always save off the
> > > host FPSIMD state unconditionally before entering the guts of KVM,
> > > instead of only doing it when there is live SVE state.  The benefit of
> > > this is that the host context switch machinery knows if the state has
> > > already been saved and won't do it again.  Thus a kvm userspace -> vcpu
> > > (-> guest exit -> vcpu)* -> guest_exit sequence of arbitrary length
> > > will only save the host FPSIMD (or SVE) state once, and won't restore
> > > it at all (assuming no context switches).
> > > 
> > > Instead, the user thread's FPSIMD state is only reloaded on the final
> > > return to userspace.
> > > 
> > 
> > I think that would invert the logic we have now, so instead of only
> > saving/restoring the FPSIMD state when the guest uses it (as we do now),
> > we would only save/restore the FPSIMD state when the host uses it,
> > regardless of what the guest does.
> 
> I'm not sure that's a complete characterisation of what's going on,
> but I'm struggling to describe my view in simple terms.
> 
> > Ideally, we could have a combination of both, but it's unclear to me if
> > we have good indications that one case is more likely than the other.
> > 
> > My gut feeling though, is that the guest will be likely to often access
> > FPSIMD state for as long as we're in KVM_RUN, and that host userspace
> also often uses FPSIMD (memcpy, etc.), but the rest of the host kernel
> > (kernel threads etc.) is unlikely to use FPSIMD for a system that is
> > primarily running VMs.
> 
> I think my suggestion:
> 
>  * neither adds nor elides any saves or restores of the guest context;
>  * elides some saves and restores of the host context;
>  * moves the host context save with respect to your series in those
>    cases where it does occur; and
>  * adds 1 host context save for each preempt-or-enter-userspace ...
>    preempt-or-enter-userspace interval of a vcpu thread during which
>    the guest does not use FPSIMD.
>   
> The last bullet is the only one that can add cost.  I can imagine
> hitting this during an I/O emulation storm.  I feel that most of the
> rest of the time the change would be a net win, but it's hard to gauge
> the overall impact.
> 

It's certainly possible to have a flow where the guest kernel is not
using FPSIMD and keeps bouncing back to host userspace which does FPSIMD
in memcpy().  This is a pretty likely case for small disk I/O, so I'm
not crazy about this.

> 
> Migrating to using the host context switch machinery as-is for
> managing the guest FPSIMD context would allow all the redundant
> saves/restores would be eliminated.
> 
> It would be a more invasive change though, and I don't think this
> series should attempt it.
> 

I agree that we should attempt to use the host machinery to switch
FPSIMD state for the guest state, as long as we can keep doing that
lazily for the guest state.  Not sure if it belongs in these patches or
not (probably not), but I think it would be helpful if we could write up
a patch to see how that would look.  I don't think any intermediate
optimizations are worth it at this point.

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 223+ messages in thread

* Re: [PATCH v3 18/41] KVM: arm64: Move userspace system registers into separate function
  2018-02-09 18:50     ` Julien Grall
@ 2018-02-14 11:22       ` Christoffer Dall
  -1 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-02-14 11:22 UTC (permalink / raw)
  To: Julien Grall; +Cc: kvmarm, linux-arm-kernel, Marc Zyngier, Shih-Wei Li, kvm

On Fri, Feb 09, 2018 at 06:50:14PM +0000, Julien Grall wrote:
> Hi Christoffer,
> 
> On 01/12/2018 12:07 PM, Christoffer Dall wrote:
> >There's a semantic difference between the EL1 registers that control
> >operation of a kernel running in EL1 and EL1 registers that only control
> >userspace execution in EL0.  Since we can defer saving/restoring the
> >latter, move them into their own function.
> >
> >We also take this chance to rename the function saving/restoring the
> >remaining system register to make it clear this function deals with
> >the EL1 system registers.
> >
> >No functional change.
> >
> >Reviewed-by: Andrew Jones <drjones@redhat.com>
> >Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> >---
> >  arch/arm64/kvm/hyp/sysreg-sr.c | 46 +++++++++++++++++++++++++++++++-----------
> >  1 file changed, 34 insertions(+), 12 deletions(-)
> >
> >diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
> >index 848a46eb33bf..99dd50ce483b 100644
> >--- a/arch/arm64/kvm/hyp/sysreg-sr.c
> >+++ b/arch/arm64/kvm/hyp/sysreg-sr.c
> >@@ -34,18 +34,27 @@ static void __hyp_text __sysreg_do_nothing(struct kvm_cpu_context *ctxt) { }
> >  static void __hyp_text __sysreg_save_common_state(struct kvm_cpu_context *ctxt)
> >  {
> >-	ctxt->sys_regs[ACTLR_EL1]	= read_sysreg(actlr_el1);
> 
> I am a bit confused, the comment on top of the function says the host must
> save ACTLR_EL1 in the VHE case. But AFAICT, after this patch the register
> will not get saved in the host context. Did I miss anything?
> 
You're right, there is indeed a functional change here, introduced in
v2.

I have adjusted the commentary and the patch description.

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 223+ messages in thread

* Re: [PATCH v3 09/41] KVM: arm64: Defer restoring host VFP state to vcpu_put
  2018-02-14 10:15                   ` Christoffer Dall
@ 2018-02-14 14:43                     ` Dave Martin
  -1 siblings, 0 replies; 223+ messages in thread
From: Dave Martin @ 2018-02-14 14:43 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: Marc Zyngier, linux-arm-kernel, kvmarm, Shih-Wei Li, kvm, Ard Biesheuvel

[CC Ard, in case he has a view on how much we care about softirq NEON
performance regressions ... and whether my suggestions make sense]

On Wed, Feb 14, 2018 at 11:15:54AM +0100, Christoffer Dall wrote:
> On Tue, Feb 13, 2018 at 02:08:47PM +0000, Dave Martin wrote:
> > On Tue, Feb 13, 2018 at 09:51:30AM +0100, Christoffer Dall wrote:
> > > On Fri, Feb 09, 2018 at 03:59:30PM +0000, Dave Martin wrote:

[...]

> > It's hard to gauge the impact of this: it seems unlikely to make a
> > massive difference, but will be highly workload-dependent.
> 
> So I thought it might be useful to have some idea of the frequency of
> events on a balanced workload, so I ran an 8-way SMP guest on Ubuntu
> 14.04 running SPECjvm2008, a memcached benchmark, a MySQL workload, and
> some networking benchmarks, and I counted a few events:
> 
>  - Out of all the exits, from the guest to run-loop in EL1 on a non-VHE
>    system, fewer than 1% of them result in an exit to userspace (0.57%).
> 
>  - The VCPU thread was preempted (voluntarily or forced) in the kernel
>    less than 3% of the exits (2.72%).  That's just below 5 preemptions
>    per ioctl(KVM_RUN).
> 
>  - In 29% of the preemptions (vcpu_put), the guest had touched FPSIMD
>    registers and the host context was restored.
> 
>  - We store the host context about 1.38 times per ioctl(KVM_RUN).
> 
> So that tells me that (1) it's worth restoring the guest FPSIMD state
> lazily as opposed to proactively on vcpu_load, and (2) that there's a
> small opportunity for improvement by reducing redundant host vfp state
> saves.

That's really useful.  I guess it confirms that lazy guest FPSIMD
restore is desirable (though I wasn't disputing this) and that the
potential benefit from eliminating redundant host FPSIMD saves is
modest, assuming that this workload is representative.

So we shouldn't over-optimise for the latter if there are side costs
from doing so.

> > The redundancy occurs because of the deferred restore of the FPSIMD
> > registers for host userspace: as a result, the host FPSIMD regs are
> > either discardable (i.e., already saved) or not live at all between
> > a context switch and the next ret_to_user.
> > 
> > This means that if the vcpu run loop is preempted, then when the host
> > switches back to the run loop it is pointless to save or restore the
> > host FPSIMD state.
> > 
> > A typical sequence of events exposing this redundancy would be as
> > follows.  I assume here that there are two cpu-bound tasks A and B
> > competing for a host CPU, where A is a vcpu thread:
> > 
> >  - vcpu A is in the guest running a compute-heavy task
> >  - FPSIMD typically traps to the host before context switch
> >  X kvm saves the host FPSIMD state
> >  - kvm loads the guest FPSIMD state
> >  - vcpu A reenters the guest
> >  - host context switch IRQ preempts A back to the run loop
> >  Y kvm loads the host FPSIMD state via vcpu_put
> > 
> >  - host context switch:
> >  - TIF_FOREIGN_FPSTATE is set -> no save of user FPSIMD state
> >  - switch to B
> >  - B reaches ret_to_user
> >  Y B's user FPSIMD state is loaded: TIF_FOREIGN_FPSTATE now clear
> >  - B enters userspace
> > 
> >  - host context switch:
> >  - B enters kernel
> >  X TIF_FOREIGN_FPSTATE now set -> host saves B's FPSIMD state
> >  - switch to A -> set TIF_FOREIGN_FPSTATE for A
> >  - back to the KVM run loop
> > 
> >  - vcpu A enters guest
> >  - redo from start
> > 
> > Here, the two saves marked X are redundant with respect to each other,
> > and the two restores marked Y are redundant with respect to each other.
> > 
> 
> Right, ok, but if we have
> 
>  - ioctl(KVM_RUN)
>  - mark hardware FPSIMD register state as invalid
>  - load guest FPSIMD state
>  - enter guest
>  - exit guest
>  - save guest FPSIMD state
>  - return to user space
> 
> (I.e. we don't do any preemption in the guest)
> 
> Then we'll lose the host FPSIMD register state, potentially, right?

Yes.

However, (disregarding kernel-mode NEON) no host task's state can be
live in the FPSIMD regs other than current's.  If another context's
state is in the regs, it is either stale or a clean copy and we can
harmlessly invalidate the association with no ill effects.

The subtlety here comes from the SVE syscall ABI, which allows
current's non-FPSIMD SVE bits to be discarded across a syscall: in this
code, current _is_ in a syscall, so the fact that we can lose current's
SVE bits here is fine: TIF_SVE will have been cleared in entry.S on the
way in, and that means that SVE will trap for userspace giving a chance
to zero those regs lazily for userspace when/if they're used again.
Conversely, current's FPSIMD regs are preserved separately by KVM.

> Your original comment on this patch was that we didn't need to restore
> the host FPSIMD state in kvm_vcpu_put_sysregs, which would result in the
> scenario above.  The only way I can see this working is by making sure
> that kvm_fpsimd_flush_cpu_state() also saves the FPSIMD hardware
> register state if the state is live.
> 
> Am I still missing something?

[1] No, you're correct.  If we move the responsibility for context
handling to kvm_fpsimd_flush_cpu_state(), then we do have to put the
host context save there, which means it couldn't then be done lazily
(at least, not without more invasive changes)...
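
As a rough sketch (not a real patch: ignoring SVE, and assuming the hook
is still called with preemption disabled as it is today), it might grow
into something like

	void kvm_fpsimd_flush_cpu_state(void)
	{
		/* Save the host thread's state once, if it is still live: */
		if (current->mm && !test_thread_flag(TIF_FOREIGN_FPSTATE)) {
			fpsimd_save_state(&current->thread.fpsimd_state);
			set_thread_flag(TIF_FOREIGN_FPSTATE);
		}

		/* The regs no longer hold any task's state: */
		__this_cpu_write(fpsimd_last_state.st, NULL);
	}

so that the host context switch machinery subsequently sees
TIF_FOREIGN_FPSTATE set and knows there is nothing left to save for this
thread.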

> > > > This breaks for SVE though: the high bits of the Z-registers will be
> > > > zeroed as a side effect of the FPSIMD save/restore done by KVM.
> > > > This means that if the host has state in those bits then it must
> > > > be saved before entring the guest: that's what the new
> > > > kvm_fpsimd_flush_cpu_state() hook in kvm_arch_vcpu_ioctl_run() is for.
> > > 
> > > Again, I'm confused, because to me it looks like
> > > kvm_fpsimd_flush_cpu_state() boils down to fpsimd_flush_cpu_state()
> > > which just sets a pointer to NULL, but doesn't actually save the state.
> > > 
> > > So, when is the state in the hardware registers saved to memory?
> > 
> > This _is_ quite confusing: in writing this answer I identified a bug
> > and then realised why there is no bug...
> > 
> > kvm_fpsimd_flush_cpu_state() is just an invalidation.  No state is
> > actually saved today because we explicitly don't care about preserving
> > the SVE state, because the syscall ABI throws the SVE regs away as
> > a side effect of any syscall, including ioctl(KVM_RUN); also (currently) KVM
> > ensures that the non-SVE FPSIMD bits _are_ restored by itself.
> > 
> > I think my proposal is that this hook might take on the role of
> > actually saving the state too, if we move that out of the KVM host
> > context save/restore code.
> > 
> > Perhaps we could even replace
> > 
> > 	preempt_disable();
> > 	kvm_fpsimd_flush_cpu_state();
> > 	/* ... */
> > 	preempt_enable();
> > 
> > with
> > 
> > 	kernel_neon_begin();
> > 	/* ... */
> > 	kernel_neon_end();
> 
> I'm not entirely sure where the begin and end points would be in the
> context of KVM?

Hmmm, actually there's a bug in your VHE changes now I look more
closely in this area:

You assume that the only way for the FPSIMD regs to get unexpectedly
dirtied is through a context switch, but actually this is not the case:
a softirq can use kernel-mode NEON any time that softirqs are enabled.

This means that in between kvm_arch_vcpu_load() and _put() (whether via
preempt notification or not), the guest's FPSIMD state in the regs may
be trashed by a softirq.

The simplest fix is to disable softirqs and preemption for that whole
region, but since we can stay in it indefinitely that's obviously not
the right approach.  Putting kernel_neon_begin() in _load() and
kernel_neon_end() in _put() achieves the same without disabling
softirq, but preemption is still disabled throughout, which is bad.
This effectively makes the run ioctl nonpreemptible...

A better fix would be to set the cpu's kernel_neon_busy flag, which
makes softirq code use non-NEON fallback code.

We could expose an interface from fpsimd.c to support that.

It still comes at a cost though: due to the switching from NEON to
fallback code in softirq handlers, we may get a big performance
regression in setups that rely heavily on NEON in softirq for
performance.
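
To be clear, the fallback switching I mean is roughly the pattern the
NEON crypto glue code already follows (purely illustrative):

	if (may_use_simd()) {
		kernel_neon_begin();
		/* NEON-accelerated implementation */
		kernel_neon_end();
	} else {
		/* scalar fallback */
	}

With kernel_neon_busy held set on the vcpu's behalf, may_use_simd()
returns false in softirq context, so such code would take the scalar
path instead.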


Alternatively we could do something like the following, but it's a
rather gross abstraction violation:

diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 2e43f9d..6a1ff3a 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -746,9 +746,24 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		 * the effect of taking the interrupt again, in SVC
 		 * mode this time.
 		 */
+		local_bh_disable();
 		local_irq_enable();
 
 		/*
+		 * If we exited due to one or more pending interrupts, they
+		 * have now been handled.  If such an interrupt pended a
+		 * softirq, we shouldn't prevent that softirq from using
+		 * kernel-mode NEON indefinitely: instead, give FPSIMD back to
+		 * the host to manage as it likes.  We'll grab it again on the
+		 * next FPSIMD trap from the guest (if any).
+		 */
+		if (local_softirq_pending() && FPSIMD untrapped for guest) {
+			/* save vcpu FPSIMD context */
+			/* enable FPSIMD trap for guest */
+		}
+		local_bh_enable();
+
+		/*
 		 * We do local_irq_enable() before calling guest_exit() so
 		 * that if a timer interrupt hits while running the guest we
 		 * account that tick as being spent in the guest.  We enable

[...]

> > ( There is a wrinkle here: fpsimd_flush_task_state(task) should always
> > be followed by set_thread_flag(TIF_FOREIGN_FPSTATE) if task == current.
> > fpsimd_flush_cpu_state() should similarly set that flag, otherwise the
> > garbage left in the SVE bits by KVM's save/restore may spuriously
> > appear in the vcpu thread's user regs.  But since that data will be (a)
> > zeros or (b) the task's own data; and because TIF_SVE is cleared in
> > entry.S:el0_svc as a side-effect of the ioctl(KVM_RUN) syscall, I don't
> > think this matters in practice.
> > 
> > If we extend kvm_fpsimd_flush_cpu_state() to invalidate in the non-SVE
> > case too then this becomes significant and we _would_ need to clear
> > TIF_FOREIGN_FPSTATE to avoid the guest's FPSIMD regs appearing in the
> 
> clear?  Wouldn't we need to set it?

Err, yes.  Just testing.

Again, kernel_neon_begin() does do that, as well as calling
fpsimd_flush_cpu_state(), showing some convergence with what kvm needs
to do here.

> > <aside>

[...]

> > </aside>
> > 
> 
> Thanks for this, it's helpful.
> 
> What's missing for my understanding is when fpsimd_save_state() gets
> called, which must be required in some cases of invalidating the
> relation, since otherwise there must be a risk of losing state?

See [1].

[...]

> > I think my suggestion:

[...]

> >  * adds 1 host context save for each preempt-or-enter-userspace ...
> >    preempt-or-enter-userspace interval of a vcpu thread during which
> >    the guest does not use FPSIMD.
> >   
> > The last bullet is the only one that can add cost.  I can imagine
> > hitting this during an I/O emulation storm.  I feel that most of the
> > rest of the time the change would be a net win, but it's hard to gauge
> > the overall impact.
> 
> It's certainly possible to have a flow where the guest kernel is not
> using FPSIMD and keeps bouncing back to host userspace which does FPSIMD
> in memcpy().  This is a pretty likely case for small disk I/O, so I'm
> not crazy about this.

Sure, understood.

> > 
> > Migrating to using the host context switch machinery as-is for
> > managing the guest FPSIMD context would allow all the redundant
> > saves/restores to be eliminated.
> > 
> > It would be a more invasive change though, and I don't think this
> > series should attempt it.
> > 
> 
> I agree that we should attempt to use the host machinery to switch
> FPSIMD state for the guest state, as long as we can keep doing that
> lazily for the guest state.  Not sure if it belongs in these patches or
> not (probably not), but I think it would be helpful if we could write up
> a patch to see how that would look.  I don't think any intermediate
> optimizations are worth it at this point.

Agreed; I think this is for the future.  If I can find a moment I may
hack on it to see how bad it looks.

But see above for my current understanding on what we need to do for
correctness today without introducing significant performance
regressions for kernel-mode NEON softirq scenarios.

Cheers
---Dave

^ permalink raw reply related	[flat|nested] 223+ messages in thread

* Re: [PATCH v3 09/41] KVM: arm64: Defer restoring host VFP state to vcpu_put
  2018-02-14 14:43                     ` Dave Martin
@ 2018-02-14 17:38                       ` Christoffer Dall
  -1 siblings, 0 replies; 223+ messages in thread
From: Christoffer Dall @ 2018-02-14 17:38 UTC (permalink / raw)
  To: Dave Martin
  Cc: Marc Zyngier, linux-arm-kernel, kvmarm, Shih-Wei Li, kvm, Ard Biesheuvel

On Wed, Feb 14, 2018 at 02:43:42PM +0000, Dave Martin wrote:
> [CC Ard, in case he has a view on how much we care about softirq NEON
> performance regressions ... and whether my suggestions make sense]
> 
> On Wed, Feb 14, 2018 at 11:15:54AM +0100, Christoffer Dall wrote:
> > On Tue, Feb 13, 2018 at 02:08:47PM +0000, Dave Martin wrote:
> > > On Tue, Feb 13, 2018 at 09:51:30AM +0100, Christoffer Dall wrote:
> > > > On Fri, Feb 09, 2018 at 03:59:30PM +0000, Dave Martin wrote:

[...]

> > > 
> > > kvm_fpsimd_flush_cpu_state() is just an invalidation.  No state is
> > > actually saved today because we explicitly don't care about preserving
> > > the SVE state, because the syscall ABI throws the SVE regs away as
> > > a side effect any syscall including ioctl(KVM_RUN); also (currently) KVM
> > > ensures that the non-SVE FPSIMD bits _are_ restored by itself.
> > > 
> > > I think my proposal is that this hook might take on the role of
> > > actually saving the state too, if we move that out of the KVM host
> > > context save/restore code.
> > > 
> > > Perhaps we could even replace
> > > 
> > > 	preempt_disable();
> > > 	kvm_fpsimd_flush_cpu_state();
> > > 	/* ... */
> > > 	preempt_enable();
> > > 
> > > with
> > > 
> > > 	kernel_neon_begin();
> > > 	/* ... */
> > > 	kernel_neon_end();
> > 
> > I'm not entirely sure where the begin and end points would be in the
> > context of KVM?
> 
> Hmmm, actually there's a bug in your VHE changes now I look more
> closely in this area:
> 
> You assume that the only way for the FPSIMD regs to get unexpectedly
> dirtied is through a context switch, but actually this is not the case:
> a softirq can use kernel-mode NEON any time that softirqs are enabled.
> 
> This means that in between kvm_arch_vcpu_load() and _put() (whether via
> preempt notification or not), the guest's FPSIMD state in the regs may
> be trashed by a softirq.

ouch.

> 
> The simplest fix is to disable softirqs and preemption for that whole
> region, but since we can stay in it indefinitely that's obviously not
> the right approach.  Putting kernel_neon_begin() in _load() and
> kernel_neon_end() in _put() achieves the same without disabling
> softirq, but preemption is still disabled throughout, which is bad.
> This effectively makes the run ioctl nonpreemptible...
> 
> A better fix would be to set the cpu's kernel_neon_busy flag, which
> makes softirq code use non-NEON fallback code.
> 
> We could expose an interface from fpsimd.c to support that.
> 
> It still comes at a cost though: due to the switching from NEON to
> fallback code in softirq handlers, we may get a big performance
> regression in setups that rely heavily on NEON in softirq for
> performance.
> 

I wasn't aware that softirqs would use fpsimd.

> 
> Alternatively we could do something like the following, but it's a
> rather gross abstraction violation:
> 
> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> index 2e43f9d..6a1ff3a 100644
> --- a/virt/kvm/arm/arm.c
> +++ b/virt/kvm/arm/arm.c
> @@ -746,9 +746,24 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  		 * the effect of taking the interrupt again, in SVC
>  		 * mode this time.
>  		 */
> +		local_bh_disable();
>  		local_irq_enable();
>  
>  		/*
> +		 * If we exited due to one or mode pending interrupts, they
> +		 * have now been handled.  If such an interrupt pended a
> +		 * softirq, we shouldn't prevent that softirq from using
> +		 * kernel-mode NEON indefinitely: instead, give FPSIMD back to
> +		 * the host to manage as it likes.  We'll grab it again on the
> +		 * next FPSIMD trap from the guest (if any).
> +		 */
> +		if (local_softirq_pending() && FPSIMD untrapped for guest) {
> +			/* save vcpu FPSIMD context */
> +			/* enable FPSIMD trap for guest */
> +		}
> +		local_bh_enable();
> +
> +		/*
>  		 * We do local_irq_enable() before calling guest_exit() so
>  		 * that if a timer interrupt hits while running the guest we
>  		 * account that tick as being spent in the guest.  We enable
> 
> [...]
> 

I can't see this working: what if an IRQ comes in and a softirq
becomes pending immediately after local_bh_enable() above?

And as you say, it's really not pretty.

This is really making me think that I'll drop this part of the
optimization, and that when we do optimize fpsimd handling, we should do
it properly by integrating it with the kernel's own tracking.

What do you think?

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 223+ messages in thread

* Re: [PATCH v3 09/41] KVM: arm64: Defer restoring host VFP state to vcpu_put
  2018-02-14 17:38                       ` Christoffer Dall
@ 2018-02-14 17:43                         ` Ard Biesheuvel
  -1 siblings, 0 replies; 223+ messages in thread
From: Ard Biesheuvel @ 2018-02-14 17:43 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: Dave Martin, Marc Zyngier, linux-arm-kernel, kvmarm, Shih-Wei Li,
	KVM devel mailing list

On 14 February 2018 at 17:38, Christoffer Dall
<christoffer.dall@linaro.org> wrote:
> On Wed, Feb 14, 2018 at 02:43:42PM +0000, Dave Martin wrote:
>> [CC Ard, in case he has a view on how much we care about softirq NEON
>> performance regressions ... and whether my suggestions make sense]
>>
>> On Wed, Feb 14, 2018 at 11:15:54AM +0100, Christoffer Dall wrote:
>> > On Tue, Feb 13, 2018 at 02:08:47PM +0000, Dave Martin wrote:
>> > > On Tue, Feb 13, 2018 at 09:51:30AM +0100, Christoffer Dall wrote:
>> > > > On Fri, Feb 09, 2018 at 03:59:30PM +0000, Dave Martin wrote:
>
> [...]
>
>> > >
>> > > kvm_fpsimd_flush_cpu_state() is just an invalidation.  No state is
>> > > actually saved today because we explicitly don't care about preserving
>> > > the SVE state, because the syscall ABI throws the SVE regs away as
>> > > a side effect any syscall including ioctl(KVM_RUN); also (currently) KVM
>> > > ensures that the non-SVE FPSIMD bits _are_ restored by itself.
>> > >
>> > > I think my proposal is that this hook might take on the role of
>> > > actually saving the state too, if we move that out of the KVM host
>> > > context save/restore code.
>> > >
>> > > Perhaps we could even replace
>> > >
>> > >   preempt_disable();
>> > >   kvm_fpsimd_flush_cpu_state();
>> > >   /* ... */
>> > >   preempt_enable();
>> > >
>> > > with
>> > >
>> > >   kernel_neon_begin();
>> > >   /* ... */
>> > >   kernel_neon_end();
>> >
>> > I'm not entirely sure where the begin and end points would be in the
>> > context of KVM?
>>
>> Hmmm, actually there's a bug in your VHE changes now I look more
>> closely in this area:
>>
>> You assume that the only way for the FPSIMD regs to get unexpectedly
>> dirtied is through a context switch, but actually this is not the case:
>> a softirq can use kernel-mode NEON any time that softirqs are enabled.
>>
>> This means that in between kvm_arch_vcpu_load() and _put() (whether via
>> preempt notification or not), the guest's FPSIMD state in the regs may
>> be trashed by a softirq.
>
> ouch.
>
>>
>> The simplest fix is to disable softirqs and preemption for that whole
>> region, but since we can stay in it indefinitely that's obviously not
>> the right approach.  Putting kernel_neon_begin() in _load() and
>> kernel_neon_end() in _put() achieves the same without disabling
>> softirq, but preemption is still disabled throughout, which is bad.
>> This effectively makes the run ioctl nonpreemptible...
>>
>> A better fix would be to set the cpu's kernel_neon_busy flag, which
>> makes softirq code use non-NEON fallback code.
>>
>> We could expose an interface from fpsimd.c to support that.
>>
>> It still comes at a cost though: due to the switching from NEON to
>> fallback code in softirq handlers, we may get a big performance
>> regression in setups that rely heavily on NEON in softirq for
>> performance.
>>
>
> I wasn't aware that softirqs would use fpsimd.
>

It is not common but it is permitted by the API, and there is mac80211
code and IPsec code that does this.

Performance penalties incurred by switching from accelerated h/w
instruction-based crypto to scalar code can be as high as 20x, so we
should really avoid this if we can.
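
For reference, the shape of that code is roughly as follows (an
illustration only, not a real driver; the crypto_block_* helpers are
hypothetical):

	#include <linux/types.h>
	#include <asm/neon.h>
	#include <asm/simd.h>

	static void do_block(u8 *dst, const u8 *src, int len)
	{
		if (may_use_simd()) {
			/* fast path: NEON / crypto instructions */
			kernel_neon_begin();
			crypto_block_neon(dst, src, len);
			kernel_neon_end();
		} else {
			/* slow path: plain scalar C code */
			crypto_block_scalar(dst, src, len);
		}
	}

If KVM were to keep kernel_neon_busy set while a vcpu thread owns the
FPSIMD regs, may_use_simd() would return false and every such call from
softirq context would take the scalar path -- that is where the 20x
figure above comes from.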

>>
>> Alternatively we could do something like the following, but it's a
>> rather gross abstraction violation:
>>
>> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
>> index 2e43f9d..6a1ff3a 100644
>> --- a/virt/kvm/arm/arm.c
>> +++ b/virt/kvm/arm/arm.c
>> @@ -746,9 +746,24 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>                * the effect of taking the interrupt again, in SVC
>>                * mode this time.
>>                */
>> +             local_bh_disable();
>>               local_irq_enable();
>>
>>               /*
>> +              * If we exited due to one or mode pending interrupts, they
>> +              * have now been handled.  If such an interrupt pended a
>> +              * softirq, we shouldn't prevent that softirq from using
>> +              * kernel-mode NEON indefinitely: instead, give FPSIMD back to
>> +              * the host to manage as it likes.  We'll grab it again on the
>> +              * next FPSIMD trap from the guest (if any).
>> +              */
>> +             if (local_softirq_pending() && FPSIMD untrapped for guest) {
>> +                     /* save vcpu FPSIMD context */
>> +                     /* enable FPSIMD trap for guest */
>> +             }
>> +             local_bh_enable();
>> +
>> +             /*
>>                * We do local_irq_enable() before calling guest_exit() so
>>                * that if a timer interrupt hits while running the guest we
>>                * account that tick as being spent in the guest.  We enable
>>
>> [...]
>>
>
> I can't see this working, what if an IRQ comes in and a softirq gets
> pending immediately after local_bh_enable() above?
>
> And as you say, it's really not pretty.
>
> This is really making me think that I'll drop this part of the
> optimization and when we do optimize fpsimd handling, we do it properly
> by integrating it with the kernel tracking.
>
> What do you think?
>
> Thanks,
> -Christoffer

^ permalink raw reply	[flat|nested] 223+ messages in thread

* Re: [PATCH v3 09/41] KVM: arm64: Defer restoring host VFP state to vcpu_put
  2018-02-14 17:38                       ` Christoffer Dall
@ 2018-02-14 21:08                         ` Marc Zyngier
  -1 siblings, 0 replies; 223+ messages in thread
From: Marc Zyngier @ 2018-02-14 21:08 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: Dave Martin, linux-arm-kernel, kvmarm, Shih-Wei Li, kvm, Ard Biesheuvel

On Wed, 14 Feb 2018 17:38:11 +0000,
Christoffer Dall wrote:
> 
> On Wed, Feb 14, 2018 at 02:43:42PM +0000, Dave Martin wrote:
> > [CC Ard, in case he has a view on how much we care about softirq NEON
> > performance regressions ... and whether my suggestions make sense]
> > 
> > On Wed, Feb 14, 2018 at 11:15:54AM +0100, Christoffer Dall wrote:
> > > On Tue, Feb 13, 2018 at 02:08:47PM +0000, Dave Martin wrote:
> > > > On Tue, Feb 13, 2018 at 09:51:30AM +0100, Christoffer Dall wrote:
> > > > > On Fri, Feb 09, 2018 at 03:59:30PM +0000, Dave Martin wrote:
> 
> [...]
> 
> > > > 
> > > > kvm_fpsimd_flush_cpu_state() is just an invalidation.  No state is
> > > > actually saved today because we explicitly don't care about preserving
> > > > the SVE state, because the syscall ABI throws the SVE regs away as
> > > > a side effect any syscall including ioctl(KVM_RUN); also (currently) KVM
> > > > ensures that the non-SVE FPSIMD bits _are_ restored by itself.
> > > > 
> > > > I think my proposal is that this hook might take on the role of
> > > > actually saving the state too, if we move that out of the KVM host
> > > > context save/restore code.
> > > > 
> > > > Perhaps we could even replace
> > > > 
> > > > 	preempt_disable();
> > > > 	kvm_fpsimd_flush_cpu_state();
> > > > 	/* ... */
> > > > 	preempt_enable();
> > > > 
> > > > with
> > > > 
> > > > 	kernel_neon_begin();
> > > > 	/* ... */
> > > > 	kernel_neon_end();
> > > 
> > > I'm not entirely sure where the begin and end points would be in the
> > > context of KVM?
> > 
> > Hmmm, actually there's a bug in your VHE changes now I look more
> > closely in this area:
> > 
> > You assume that the only way for the FPSIMD regs to get unexpectedly
> > dirtied is through a context switch, but actually this is not the case:
> > a softirq can use kernel-mode NEON any time that softirqs are enabled.
> > 
> > This means that in between kvm_arch_vcpu_load() and _put() (whether via
> > preempt notification or not), the guest's FPSIMD state in the regs may
> > be trashed by a softirq.
> 
> ouch.
> 
> > 
> > The simplest fix is to disable softirqs and preemption for that whole
> > region, but since we can stay in it indefinitely that's obviously not
> > the right approach.  Putting kernel_neon_begin() in _load() and
> > kernel_neon_end() in _put() achieves the same without disabling
> > softirq, but preemption is still disabled throughout, which is bad.
> > This effectively makes the run ioctl nonpreemptible...
> > 
> > A better fix would be to set the cpu's kernel_neon_busy flag, which
> > makes softirq code use non-NEON fallback code.
> > 
> > We could expose an interface from fpsimd.c to support that.
> > 
> > It still comes at a cost though: due to the switching from NEON to
> > fallback code in softirq handlers, we may get a big performance
> > regression in setups that rely heavily on NEON in softirq for
> > performance.
> > 
> 
> I wasn't aware that softirqs would use fpsimd.
> 
> > 
> > Alternatively we could do something like the following, but it's a
> > rather gross abstraction violation:
> > 
> > diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> > index 2e43f9d..6a1ff3a 100644
> > --- a/virt/kvm/arm/arm.c
> > +++ b/virt/kvm/arm/arm.c
> > @@ -746,9 +746,24 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> >  		 * the effect of taking the interrupt again, in SVC
> >  		 * mode this time.
> >  		 */
> > +		local_bh_disable();
> >  		local_irq_enable();
> >  
> >  		/*
> > +		 * If we exited due to one or mode pending interrupts, they
> > +		 * have now been handled.  If such an interrupt pended a
> > +		 * softirq, we shouldn't prevent that softirq from using
> > +		 * kernel-mode NEON indefinitely: instead, give FPSIMD back to
> > +		 * the host to manage as it likes.  We'll grab it again on the
> > +		 * next FPSIMD trap from the guest (if any).
> > +		 */
> > +		if (local_softirq_pending() && FPSIMD untrapped for guest) {
> > +			/* save vcpu FPSIMD context */
> > +			/* enable FPSIMD trap for guest */
> > +		}
> > +		local_bh_enable();
> > +
> > +		/*
> >  		 * We do local_irq_enable() before calling guest_exit() so
> >  		 * that if a timer interrupt hits while running the guest we
> >  		 * account that tick as being spent in the guest.  We enable
> > 
> > [...]
> > 
> 
> I can't see this working, what if an IRQ comes in and a softirq gets
> pending immediately after local_bh_enable() above?
> 
> And as you say, it's really not pretty.
> 
> This is really making me think that I'll drop this part of the
> optimization and when we do optimize fpsimd handling, we do it properly
> by integrating it with the kernel tracking.
> 
> What do you think?

[catching up with the discussion]

I think it makes sense. I'd be worried if we merged something that we
know to be sub-par, and that could introduce unexpected performance
regressions. It looks like we have a slightly more long-term plan to
address this in a way that integrates with the rest of the kernel
infrastructure, so let's take this opportunity.

Thanks,

	M.

-- 
Jazz is not dead, it just smell funny.

^ permalink raw reply	[flat|nested] 223+ messages in thread

* Re: [PATCH v3 09/41] KVM: arm64: Defer restoring host VFP state to vcpu_put
  2018-02-14 17:38                       ` Christoffer Dall
@ 2018-02-15  9:51                         ` Dave Martin
  -1 siblings, 0 replies; 223+ messages in thread
From: Dave Martin @ 2018-02-15  9:51 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: kvm, Ard Biesheuvel, Marc Zyngier, linux-arm-kernel, kvmarm, Shih-Wei Li

On Wed, Feb 14, 2018 at 06:38:11PM +0100, Christoffer Dall wrote:
> On Wed, Feb 14, 2018 at 02:43:42PM +0000, Dave Martin wrote:
> > [CC Ard, in case he has a view on how much we care about softirq NEON
> > performance regressions ... and whether my suggestions make sense]
> > 
> > On Wed, Feb 14, 2018 at 11:15:54AM +0100, Christoffer Dall wrote:
> > > On Tue, Feb 13, 2018 at 02:08:47PM +0000, Dave Martin wrote:
> > > > On Tue, Feb 13, 2018 at 09:51:30AM +0100, Christoffer Dall wrote:
> > > > > On Fri, Feb 09, 2018 at 03:59:30PM +0000, Dave Martin wrote:
> 
> [...]
> 
> > > > 
> > > > kvm_fpsimd_flush_cpu_state() is just an invalidation.  No state is
> > > > actually saved today because we explicitly don't care about preserving
> > > > the SVE state, because the syscall ABI throws the SVE regs away as
> > > > a side effect any syscall including ioctl(KVM_RUN); also (currently) KVM
> > > > ensures that the non-SVE FPSIMD bits _are_ restored by itself.
> > > > 
> > > > I think my proposal is that this hook might take on the role of
> > > > actually saving the state too, if we move that out of the KVM host
> > > > context save/restore code.
> > > > 
> > > > Perhaps we could even replace
> > > > 
> > > > 	preempt_disable();
> > > > 	kvm_fpsimd_flush_cpu_state();
> > > > 	/* ... */
> > > > 	preempt_enable();
> > > > 
> > > > with
> > > > 
> > > > 	kernel_neon_begin();
> > > > 	/* ... */
> > > > 	kernel_neon_end();
> > > 
> > > I'm not entirely sure where the begin and end points would be in the
> > > context of KVM?
> > 
> > Hmmm, actually there's a bug in your VHE changes now I look more
> > closely in this area:
> > 
> > You assume that the only way for the FPSIMD regs to get unexpectedly
> > dirtied is through a context switch, but actually this is not the case:
> > a softirq can use kernel-mode NEON any time that softirqs are enabled.
> > 
> > This means that in between kvm_arch_vcpu_load() and _put() (whether via
> > preempt notification or not), the guest's FPSIMD state in the regs may
> > be trashed by a softirq.
> 
> ouch.
> 
> > 
> > The simplest fix is to disable softirqs and preemption for that whole
> > region, but since we can stay in it indefinitely that's obviously not
> > the right approach.  Putting kernel_neon_begin() in _load() and
> > kernel_neon_end() in _put() achieves the same without disabling
> > softirq, but preemption is still disabled throughout, which is bad.
> > This effectively makes the run ioctl nonpreemptible...
> > 
> > A better fix would be to set the cpu's kernel_neon_busy flag, which
> > makes softirq code use non-NEON fallback code.
> > 
> > We could expose an interface from fpsimd.c to support that.
> > 
> > It still comes at a cost though: due to the switching from NEON to
> > fallback code in softirq handlers, we may get a big performance
> > regression in setups that rely heavily on NEON in softirq for
> > performance.
> > 
> 
> I wasn't aware that softirqs would use fpsimd.
> 
> > 
> > Alternatively we could do something like the following, but it's a
> > rather gross abstraction violation:
> > 
> > diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> > index 2e43f9d..6a1ff3a 100644
> > --- a/virt/kvm/arm/arm.c
> > +++ b/virt/kvm/arm/arm.c
> > @@ -746,9 +746,24 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> >  		 * the effect of taking the interrupt again, in SVC
> >  		 * mode this time.
> >  		 */
> > +		local_bh_disable();
> >  		local_irq_enable();
> >  
> >  		/*
> > +		 * If we exited due to one or mode pending interrupts, they
> > +		 * have now been handled.  If such an interrupt pended a
> > +		 * softirq, we shouldn't prevent that softirq from using
> > +		 * kernel-mode NEON indefinitely: instead, give FPSIMD back to
> > +		 * the host to manage as it likes.  We'll grab it again on the
> > +		 * next FPSIMD trap from the guest (if any).
> > +		 */
> > +		if (local_softirq_pending() && FPSIMD untrapped for guest) {
> > +			/* save vcpu FPSIMD context */
> > +			/* enable FPSIMD trap for guest */
> > +		}
> > +		local_bh_enable();
> > +
> > +		/*
> >  		 * We do local_irq_enable() before calling guest_exit() so
> >  		 * that if a timer interrupt hits while running the guest we
> >  		 * account that tick as being spent in the guest.  We enable
> > 
> > [...]
> > 
> 
> I can't see this working, what if an IRQ comes in and a softirq gets
> pending immediately after local_bh_enable() above?

Sorry, I missed a crucial bit of information here.

For context: here's the remainder of my argument.  This is not a
recommendation...


--8<--

We can inhibit softirqs from trashing the FPSIMD regs by setting the
per-cpu kernel_neon_busy flag: that forces softirq code to use
non-NEON fallback code without actually disabling softirqs.

I'd come up with a local hack:

 * kernel_neon_grab();

	to set the flag, which would happen in vcpu_load().

 * kernel_neon_ungrab();

	to clear the flag, which would happen as above and in
	vcpu_put().

It would be up to the caller to ensure that preemption cannot occur
between those calls (satisfied by use of a preempt notifier here), and
to save the host context when needed.

This would bound the kernel-mode NEON blackout to the time KVM spends
in the host kernel only: the above conditional relinquishing of the
FPSIMD regs ensures that a softirq trigger event occurring during the
(unbounded) guest execution time _does_ get to use NEON.
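
To make that concrete, the KVM side would look roughly like this
(sketch only, building on the hypothetical helpers above):

	/* virt/kvm/arm/arm.c -- sketch, not a real patch */
	void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
	{
		/* ... existing load logic ... */

		/* softirqs must now take their non-NEON fallbacks */
		kernel_neon_grab();
	}

	void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
	{
		kernel_neon_ungrab();

		/* ... existing put logic, incl. host FPSIMD restore ... */
	}

with the conditional hand-back in the run loop (the earlier diff) also
calling kernel_neon_ungrab() once the vcpu's FPSIMD state has been
saved and the FPSIMD trap for the guest re-enabled.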

-->8--

> And as you say, it's really not pretty.

Agreed!

> This is really making me think that I'll drop this part of the
> optimization and when we do optimize fpsimd handling, we do it properly
> by integrating it with the kernel tracking.

Since I will be hacking at this area as part of the SVE KVM support
anyway, I will sooner or later end up working on it -- at that point it
will likely be worth unifying the two mechanisms, at least for the VHE
case (SVE architecturally requires v8.2, so VHE can be assumed in that
case).

It would be interesting to know what the numbers look like without
the FPSIMD optimisation.

Cheers
---Dave

^ permalink raw reply	[flat|nested] 223+ messages in thread

* [PATCH v3 09/41] KVM: arm64: Defer restoring host VFP state to vcpu_put
@ 2018-02-15  9:51                         ` Dave Martin
  0 siblings, 0 replies; 223+ messages in thread
From: Dave Martin @ 2018-02-15  9:51 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Feb 14, 2018 at 06:38:11PM +0100, Christoffer Dall wrote:
> On Wed, Feb 14, 2018 at 02:43:42PM +0000, Dave Martin wrote:
> > [CC Ard, in case he has a view on how much we care about softirq NEON
> > performance regressions ... and whether my suggestions make sense]
> > 
> > On Wed, Feb 14, 2018 at 11:15:54AM +0100, Christoffer Dall wrote:
> > > On Tue, Feb 13, 2018 at 02:08:47PM +0000, Dave Martin wrote:
> > > > On Tue, Feb 13, 2018 at 09:51:30AM +0100, Christoffer Dall wrote:
> > > > > On Fri, Feb 09, 2018 at 03:59:30PM +0000, Dave Martin wrote:
> 
> [...]
> 
> > > > 
> > > > kvm_fpsimd_flush_cpu_state() is just an invalidation.  No state is
> > > > actually saved today because we explicitly don't care about preserving
> > > > the SVE state, because the syscall ABI throws the SVE regs away as
> > > > a side effect any syscall including ioctl(KVM_RUN); also (currently) KVM
> > > > ensures that the non-SVE FPSIMD bits _are_ restored by itself.
> > > > 
> > > > I think my proposal is that this hook might take on the role of
> > > > actually saving the state too, if we move that out of the KVM host
> > > > context save/restore code.
> > > > 
> > > > Perhaps we could even replace
> > > > 
> > > > 	preempt_disable();
> > > > 	kvm_fpsimd_flush_cpu_state();
> > > > 	/* ... */
> > > > 	preempt_enable();
> > > > 
> > > > with
> > > > 
> > > > 	kernel_neon_begin();
> > > > 	/* ... */
> > > > 	kernel_neon_end();
> > > 
> > > I'm not entirely sure where the begin and end points would be in the
> > > context of KVM?
> > 
> > Hmmm, actually there's a bug in your VHE changes now I look more
> > closely in this area:
> > 
> > You assume that the only way for the FPSIMD regs to get unexpectedly
> > dirtied is through a context switch, but actually this is not the case:
> > a softirq can use kernel-mode NEON any time that softirqs are enabled.
> > 
> > This means that in between kvm_arch_vcpu_load() and _put() (whether via
> > preempt notification or not), the guest's FPSIMD state in the regs may
> > be trashed by a softirq.
> 
> ouch.
> 
> > 
> > The simplest fix is to disable softirqs and preemption for that whole
> > region, but since we can stay in it indefinitely that's obviously not
> > the right approach.  Putting kernel_neon_begin() in _load() and
> > kernel_neon_end() in _put() achieves the same without disabling
> > softirq, but preemption is still disabled throughout, which is bad.
> > This effectively makes the run ioctl nonpreemptible...
> > 
> > A better fix would be to set the cpu's kernel_neon_busy flag, which
> > makes softirq code use non-NEON fallback code.
> > 
> > We could expose an interface from fpsimd.c to support that.
> > 
> > It still comes at a cost though: due to the switching from NEON to
> > fallback code in softirq handlers, we may get a big performance
> > regression in setups that rely heavily on NEON in softirq for
> > performance.
> > 
> 
> I wasn't aware that softirqs would use fpsimd.
> 
> > 
> > Alternatively we could do something like the following, but it's a
> > rather gross abstraction violation:
> > 
> > diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> > index 2e43f9d..6a1ff3a 100644
> > --- a/virt/kvm/arm/arm.c
> > +++ b/virt/kvm/arm/arm.c
> > @@ -746,9 +746,24 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> >  		 * the effect of taking the interrupt again, in SVC
> >  		 * mode this time.
> >  		 */
> > +		local_bh_disable();
> >  		local_irq_enable();
> >  
> >  		/*
> > +		 * If we exited due to one or more pending interrupts, they
> > +		 * have now been handled.  If such an interrupt pended a
> > +		 * softirq, we shouldn't prevent that softirq from using
> > +		 * kernel-mode NEON indefinitely: instead, give FPSIMD back to
> > +		 * the host to manage as it likes.  We'll grab it again on the
> > +		 * next FPSIMD trap from the guest (if any).
> > +		 */
> > +		if (local_softirq_pending() && FPSIMD untrapped for guest) {
> > +			/* save vcpu FPSIMD context */
> > +			/* enable FPSIMD trap for guest */
> > +		}
> > +		local_bh_enable();
> > +
> > +		/*
> >  		 * We do local_irq_enable() before calling guest_exit() so
> >  		 * that if a timer interrupt hits while running the guest we
> >  		 * account that tick as being spent in the guest.  We enable
> > 
> > [...]
> > 
> 
> I can't see this working, what if an IRQ comes in and a softirq gets
> pending immediately after local_bh_enable() above?

Sorry, I missed a crucial bit of information here.

For context: here's the remainder of my argument.  This is not a
recommendation...


--8<--

We can inhibit softirqs from trashing the FPSIMD regs by setting the
per-cpu kernel_neon_busy flag: that forces softirq code to use
non-NEON fallback code without actually disabling softirq.

I'd come up with a local hack:

 * kernel_neon_grab();

	to set the flag, which would happen in vcpu_load().

 * kernel_neon_ungrab();

	to clear the flag, which would happen as above and in
	vcpu_put().

It would be up to the caller to ensure that preemption cannot occur
between those calls (satisfied by use of a preempt notifier here), and
to save the host context when needed.

This would bound the kernel-mode NEON blackout to the time KVM spends
in the host kernel only: the above conditional relinquishing of the
FPSIMD regs ensures that a softirq trigger event occurring during the
(unbounded) guest execution time _does_ get to use NEON.

-->8--
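
(Again purely illustrative: a rough sketch of those hypothetical
kernel_neon_grab()/kernel_neon_ungrab() helpers, assuming they would
sit in arch/arm64/kernel/fpsimd.c next to kernel_neon_begin() and
reuse the existing per-cpu kernel_neon_busy flag; none of this is
existing code.)

/*
 * Sketch of the hypothetical helpers: mark kernel-mode NEON as busy so
 * that softirq handlers fall back to non-NEON code paths instead of
 * trashing the guest's FPSIMD registers.  The caller (KVM's
 * vcpu_load()/vcpu_put()) must ensure it cannot be preempted between
 * the two calls.
 */
void kernel_neon_grab(void)
{
	WARN_ON_ONCE(preemptible());
	__this_cpu_write(kernel_neon_busy, true);
}

void kernel_neon_ungrab(void)
{
	WARN_ON_ONCE(preemptible());
	__this_cpu_write(kernel_neon_busy, false);
}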

> And as you say, it's really not pretty.

Agreed!

> This is really making me think that I'll drop this part of the
> optimization and when we do optimize fpsimd handling, we do it properly
> by integrating it with the kernel tracking.

Since I will be hacking at this area as part of the SVE KVM support
anyway, I will sooner or later end up working on it -- at that point it
will likely be worth unifying the two mechanisms, at least for the VHE
case (SVE architecturally requires v8.2, so VHE can be assumed in that
case).

It would be interesting to know what the numbers look like without
the FPSIMD optimisation.

Cheers
---Dave

^ permalink raw reply	[flat|nested] 223+ messages in thread

* Re: [PATCH v3 17/41] KVM: arm64: Remove noop calls to timer save/restore from VHE switch
  2018-02-13 22:31       ` Christoffer Dall
@ 2018-02-19 16:30         ` Julien Grall
  -1 siblings, 0 replies; 223+ messages in thread
From: Julien Grall @ 2018-02-19 16:30 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: Marc Zyngier, Shih-Wei Li, kvmarm, linux-arm-kernel, kvm

Hi Christoffer,

Sorry for the late reply.

On 13/02/18 22:31, Christoffer Dall wrote:
> On Fri, Feb 09, 2018 at 05:53:43PM +0000, Julien Grall wrote:
>> Hi Christoffer,
>>
>> On 01/12/2018 12:07 PM, Christoffer Dall wrote:
>>> The VHE switch function calls __timer_enable_traps and
>>> __timer_disable_traps which don't do anything on VHE systems.
>>> Therefore, simply remove these calls from the VHE switch function and
>>> make the functions non-conditional as they are now only called from the
>>> non-VHE switch path.
>>>
>>> Acked-by: Marc Zyngier <marc.zyngier@arm.com>
>>> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
>>> ---
>>>   arch/arm64/kvm/hyp/switch.c |  2 --
>>>   virt/kvm/arm/hyp/timer-sr.c | 44 ++++++++++++++++++++++----------------------
>>>   2 files changed, 22 insertions(+), 24 deletions(-)
>>>
>>> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
>>> index 9aadef6966bf..6175fcb33ed2 100644
>>> --- a/arch/arm64/kvm/hyp/switch.c
>>> +++ b/arch/arm64/kvm/hyp/switch.c
>>> @@ -354,7 +354,6 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
>>>   	__activate_vm(vcpu->kvm);
>>>   	__vgic_restore_state(vcpu);
>>> -	__timer_enable_traps(vcpu);
>>>   	/*
>>>   	 * We must restore the 32-bit state before the sysregs, thanks
>>> @@ -373,7 +372,6 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
>>>   	__sysreg_save_guest_state(guest_ctxt);
>>>   	__sysreg32_save_state(vcpu);
>>> -	__timer_disable_traps(vcpu);
>>>   	__vgic_save_state(vcpu);
>>>   	__deactivate_traps(vcpu);
>>> diff --git a/virt/kvm/arm/hyp/timer-sr.c b/virt/kvm/arm/hyp/timer-sr.c
>>> index f24404b3c8df..77754a62eb0c 100644
>>> --- a/virt/kvm/arm/hyp/timer-sr.c
>>> +++ b/virt/kvm/arm/hyp/timer-sr.c
>>> @@ -27,34 +27,34 @@ void __hyp_text __kvm_timer_set_cntvoff(u32 cntvoff_low, u32 cntvoff_high)
>>>   	write_sysreg(cntvoff, cntvoff_el2);
>>>   }
>>> +/*
>>> + * Should only be called on non-VHE systems.
>>> + * VHE systems use EL2 timers and configure EL1 timers in kvm_timer_init_vhe().
>>> + */
>>>   void __hyp_text __timer_disable_traps(struct kvm_vcpu *vcpu)
>>
>> Would it be worth to suffix the function with nvhe? So it would be clear
>> that it should not be called for VHE system?
>>
> Actually, I decided against this, because it's also called from the
> 32-bit code and it looks a little strange there, and it's not like we
> have an equivalent _vhe version.

The main goal was to provide a naming that would prevent someone from using
it in the VHE case. This would also have been in line with other patches
where you rename some helpers to nvhe/vhe even in arm32 code.

Anyway, I guess the reviewers will be careful enough to spot that :).

Cheers,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 223+ messages in thread

end of thread, other threads:[~2018-02-19 16:30 UTC | newest]

Thread overview: 223+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-12 12:07 [PATCH v3 00/41] Optimize KVM/ARM for VHE systems Christoffer Dall
2018-01-12 12:07 ` Christoffer Dall
2018-01-12 12:07 ` [PATCH v3 01/41] KVM: arm/arm64: Avoid vcpu_load for other vcpu ioctls than KVM_RUN Christoffer Dall
2018-01-12 12:07   ` Christoffer Dall
2018-02-05 12:32   ` Julien Grall
2018-02-05 12:32     ` Julien Grall
2018-01-12 12:07 ` [PATCH v3 02/41] KVM: arm/arm64: Move vcpu_load call after kvm_vcpu_first_run_init Christoffer Dall
2018-01-12 12:07   ` Christoffer Dall
2018-02-05 14:34   ` Julien Grall
2018-02-05 14:34     ` Julien Grall
2018-01-12 12:07 ` [PATCH v3 03/41] KVM: arm64: Avoid storing the vcpu pointer on the stack Christoffer Dall
2018-01-12 12:07   ` Christoffer Dall
2018-02-05 17:14   ` Julien Grall
2018-02-05 17:14     ` Julien Grall
2018-01-12 12:07 ` [PATCH v3 04/41] KVM: arm64: Rework hyp_panic for VHE and non-VHE Christoffer Dall
2018-01-12 12:07   ` Christoffer Dall
2018-02-05 18:04   ` Julien Grall
2018-02-05 18:04     ` Julien Grall
2018-02-05 18:10     ` Julien Grall
2018-02-05 18:10       ` Julien Grall
2018-02-08 13:24     ` Christoffer Dall
2018-02-08 13:24       ` Christoffer Dall
2018-02-09 10:55       ` Julien Grall
2018-02-09 10:55         ` Julien Grall
2018-01-12 12:07 ` [PATCH v3 05/41] KVM: arm64: Move HCR_INT_OVERRIDE to default HCR_EL2 guest flag Christoffer Dall
2018-01-12 12:07   ` Christoffer Dall
2018-02-09 11:38   ` Julien Grall
2018-02-09 11:38     ` Julien Grall
2018-02-13 21:47     ` Christoffer Dall
2018-02-13 21:47       ` Christoffer Dall
2018-01-12 12:07 ` [PATCH v3 06/41] KVM: arm/arm64: Get rid of vcpu->arch.irq_lines Christoffer Dall
2018-01-12 12:07   ` Christoffer Dall
2018-01-12 12:07 ` [PATCH v3 07/41] KVM: arm/arm64: Add kvm_vcpu_load_sysregs and kvm_vcpu_put_sysregs Christoffer Dall
2018-01-12 12:07   ` Christoffer Dall
2018-01-12 12:07 ` [PATCH v3 08/41] KVM: arm/arm64: Introduce vcpu_el1_is_32bit Christoffer Dall
2018-01-12 12:07   ` Christoffer Dall
2018-01-17 14:44   ` Julien Thierry
2018-01-17 14:44     ` Julien Thierry
2018-01-18 12:57     ` Christoffer Dall
2018-01-18 12:57       ` Christoffer Dall
2018-02-09 12:31   ` Julien Grall
2018-02-09 12:31     ` Julien Grall
2018-01-12 12:07 ` [PATCH v3 09/41] KVM: arm64: Defer restoring host VFP state to vcpu_put Christoffer Dall
2018-01-12 12:07   ` Christoffer Dall
2018-01-22 17:33   ` Dave Martin
2018-01-22 17:33     ` Dave Martin
2018-01-25 19:46     ` Christoffer Dall
2018-01-25 19:46       ` Christoffer Dall
2018-02-07 16:49       ` Dave Martin
2018-02-07 16:49         ` Dave Martin
2018-02-07 17:56         ` Christoffer Dall
2018-02-07 17:56           ` Christoffer Dall
2018-02-09 15:59           ` Dave Martin
2018-02-09 15:59             ` Dave Martin
2018-02-13  8:51             ` Christoffer Dall
2018-02-13  8:51               ` Christoffer Dall
2018-02-13 14:08               ` Dave Martin
2018-02-13 14:08                 ` Dave Martin
2018-02-14 10:15                 ` Christoffer Dall
2018-02-14 10:15                   ` Christoffer Dall
2018-02-14 14:43                   ` Dave Martin
2018-02-14 14:43                     ` Dave Martin
2018-02-14 17:38                     ` Christoffer Dall
2018-02-14 17:38                       ` Christoffer Dall
2018-02-14 17:43                       ` Ard Biesheuvel
2018-02-14 17:43                         ` Ard Biesheuvel
2018-02-14 21:08                       ` Marc Zyngier
2018-02-14 21:08                         ` Marc Zyngier
2018-02-15  9:51                       ` Dave Martin
2018-02-15  9:51                         ` Dave Martin
2018-02-09 15:26   ` Julien Grall
2018-02-09 15:26     ` Julien Grall
2018-02-13  8:52     ` Christoffer Dall
2018-02-13  8:52       ` Christoffer Dall
2018-01-12 12:07 ` [PATCH v3 10/41] KVM: arm64: Move debug dirty flag calculation out of world switch Christoffer Dall
2018-01-12 12:07   ` Christoffer Dall
2018-01-17 15:11   ` Julien Thierry
2018-01-17 15:11     ` Julien Thierry
2018-01-12 12:07 ` [PATCH v3 11/41] KVM: arm64: Slightly improve debug save/restore functions Christoffer Dall
2018-01-12 12:07   ` Christoffer Dall
2018-01-12 12:07 ` [PATCH v3 12/41] KVM: arm64: Improve debug register save/restore flow Christoffer Dall
2018-01-12 12:07   ` Christoffer Dall
2018-01-12 12:07 ` [PATCH v3 13/41] KVM: arm64: Factor out fault info population and gic workarounds Christoffer Dall
2018-01-12 12:07   ` Christoffer Dall
2018-01-17 15:35   ` Julien Thierry
2018-01-12 12:07 ` [PATCH v3 14/41] KVM: arm64: Introduce VHE-specific kvm_vcpu_run Christoffer Dall
2018-01-12 12:07   ` Christoffer Dall
2018-01-24 16:13   ` Dave Martin
2018-01-24 16:13     ` Dave Martin
2018-01-25  8:45     ` Christoffer Dall
2018-01-25  8:45       ` Christoffer Dall
2018-02-09 17:34   ` Julien Grall
2018-02-09 17:34     ` Julien Grall
2018-02-13  8:52     ` Christoffer Dall
2018-02-13  8:52       ` Christoffer Dall
2018-01-12 12:07 ` [PATCH v3 15/41] KVM: arm64: Remove kern_hyp_va() use in VHE switch function Christoffer Dall
2018-01-12 12:07   ` Christoffer Dall
2018-01-24 16:24   ` Dave Martin
2018-01-24 16:24     ` Dave Martin
2018-01-25 19:48     ` Christoffer Dall
2018-01-25 19:48       ` Christoffer Dall
2018-01-12 12:07 ` [PATCH v3 16/41] KVM: arm64: Don't deactivate VM on VHE systems Christoffer Dall
2018-01-12 12:07   ` Christoffer Dall
2018-01-12 12:07 ` [PATCH v3 17/41] KVM: arm64: Remove noop calls to timer save/restore from VHE switch Christoffer Dall
2018-01-12 12:07   ` Christoffer Dall
2018-02-09 17:53   ` Julien Grall
2018-02-09 17:53     ` Julien Grall
2018-02-13  8:53     ` Christoffer Dall
2018-02-13  8:53       ` Christoffer Dall
2018-02-13 22:31     ` Christoffer Dall
2018-02-13 22:31       ` Christoffer Dall
2018-02-19 16:30       ` Julien Grall
2018-02-19 16:30         ` Julien Grall
2018-01-12 12:07 ` [PATCH v3 18/41] KVM: arm64: Move userspace system registers into separate function Christoffer Dall
2018-01-12 12:07   ` Christoffer Dall
2018-02-09 18:50   ` Julien Grall
2018-02-09 18:50     ` Julien Grall
2018-02-14 11:22     ` Christoffer Dall
2018-02-14 11:22       ` Christoffer Dall
2018-01-12 12:07 ` [PATCH v3 19/41] KVM: arm64: Rewrite sysreg alternatives to static keys Christoffer Dall
2018-01-12 12:07   ` Christoffer Dall
2018-01-12 12:07 ` [PATCH v3 20/41] KVM: arm64: Introduce separate VHE/non-VHE sysreg save/restore functions Christoffer Dall
2018-01-12 12:07   ` Christoffer Dall
2018-01-12 12:07 ` [PATCH v3 21/41] KVM: arm/arm64: Remove leftover comment from kvm_vcpu_run_vhe Christoffer Dall
2018-01-12 12:07   ` Christoffer Dall
2018-01-12 12:07 ` [PATCH v3 22/41] KVM: arm64: Unify non-VHE host/guest sysreg save and restore functions Christoffer Dall
2018-01-12 12:07   ` Christoffer Dall
2018-01-12 12:07 ` [PATCH v3 23/41] KVM: arm64: Don't save the host ELR_EL2 and SPSR_EL2 on VHE systems Christoffer Dall
2018-01-12 12:07   ` Christoffer Dall
2018-01-12 12:07 ` [PATCH v3 24/41] KVM: arm64: Change 32-bit handling of VM system registers Christoffer Dall
2018-01-12 12:07   ` Christoffer Dall
2018-01-12 12:07 ` [PATCH v3 25/41] KVM: arm64: Rewrite system register accessors to read/write functions Christoffer Dall
2018-01-12 12:07   ` Christoffer Dall
2018-01-12 12:07 ` [PATCH v3 26/41] KVM: arm64: Introduce framework for accessing deferred sysregs Christoffer Dall
2018-01-12 12:07   ` Christoffer Dall
2018-01-17 17:52   ` Julien Thierry
2018-01-17 17:52     ` Julien Thierry
2018-01-18 13:08     ` Christoffer Dall
2018-01-18 13:08       ` Christoffer Dall
2018-01-18 13:39       ` Julien Thierry
2018-01-18 13:39         ` Julien Thierry
2018-01-23 16:04   ` Dave Martin
2018-01-23 16:04     ` Dave Martin
2018-01-25 19:54     ` Christoffer Dall
2018-01-25 19:54       ` Christoffer Dall
2018-02-09 16:17       ` Dave Martin
2018-02-09 16:17         ` Dave Martin
2018-02-13  8:55         ` Christoffer Dall
2018-02-13  8:55           ` Christoffer Dall
2018-02-13 14:27           ` Dave Martin
2018-02-13 14:27             ` Dave Martin
2018-01-12 12:07 ` [PATCH v3 27/41] KVM: arm/arm64: Prepare to handle deferred save/restore of SPSR_EL1 Christoffer Dall
2018-01-12 12:07   ` Christoffer Dall
2018-01-12 12:07 ` [PATCH v3 28/41] KVM: arm64: Prepare to handle deferred save/restore of ELR_EL1 Christoffer Dall
2018-01-12 12:07   ` Christoffer Dall
2018-01-12 12:07 ` [PATCH v3 29/41] KVM: arm64: Defer saving/restoring 64-bit sysregs to vcpu load/put on VHE Christoffer Dall
2018-01-12 12:07   ` Christoffer Dall
2018-01-12 12:07 ` [PATCH v3 30/41] KVM: arm64: Prepare to handle deferred save/restore of 32-bit registers Christoffer Dall
2018-01-12 12:07   ` Christoffer Dall
2018-01-17 18:22   ` Julien Thierry
2018-01-17 18:22     ` Julien Thierry
2018-01-18 13:12     ` Christoffer Dall
2018-01-18 13:12       ` Christoffer Dall
2018-01-12 12:07 ` [PATCH v3 31/41] KVM: arm64: Defer saving/restoring 32-bit sysregs to vcpu load/put Christoffer Dall
2018-01-12 12:07   ` Christoffer Dall
2018-01-12 12:07 ` [PATCH v3 32/41] KVM: arm64: Move common VHE/non-VHE trap config in separate functions Christoffer Dall
2018-01-12 12:07   ` Christoffer Dall
2018-01-12 12:07 ` [PATCH v3 33/41] KVM: arm64: Configure FPSIMD traps on vcpu load/put Christoffer Dall
2018-01-12 12:07   ` Christoffer Dall
2018-01-18  9:31   ` Julien Thierry
2018-01-18  9:31     ` Julien Thierry
2018-01-31 12:17   ` Tomasz Nowicki
2018-01-31 12:17     ` Tomasz Nowicki
2018-02-05 10:06     ` Christoffer Dall
2018-02-05 10:06       ` Christoffer Dall
2018-01-31 12:24   ` Tomasz Nowicki
2018-01-31 12:24     ` Tomasz Nowicki
2018-01-12 12:07 ` [PATCH v3 34/41] KVM: arm64: Configure c15, PMU, and debug register traps on cpu load/put for VHE Christoffer Dall
2018-01-12 12:07   ` Christoffer Dall
2018-01-12 12:07 ` [PATCH v3 35/41] KVM: arm64: Separate activate_traps and deactive_traps for VHE and non-VHE Christoffer Dall
2018-01-12 12:07   ` Christoffer Dall
2018-01-12 12:07 ` [PATCH v3 36/41] KVM: arm/arm64: Get rid of vgic_elrsr Christoffer Dall
2018-01-12 12:07   ` Christoffer Dall
2018-01-12 12:07 ` [PATCH v3 37/41] KVM: arm/arm64: Handle VGICv2 save/restore from the main VGIC code Christoffer Dall
2018-01-12 12:07   ` Christoffer Dall
2018-01-12 12:07 ` [PATCH v3 38/41] KVM: arm/arm64: Move arm64-only vgic-v2-sr.c file to arm64 Christoffer Dall
2018-01-12 12:07   ` Christoffer Dall
2018-01-12 12:07 ` [PATCH v3 39/41] KVM: arm/arm64: Handle VGICv3 save/restore from the main VGIC code on VHE Christoffer Dall
2018-01-12 12:07   ` Christoffer Dall
2018-01-12 12:07 ` [PATCH v3 40/41] KVM: arm/arm64: Move VGIC APR save/restore to vgic put/load Christoffer Dall
2018-01-12 12:07   ` Christoffer Dall
2018-01-12 12:07 ` [PATCH v3 41/41] KVM: arm/arm64: Avoid VGICv3 save/restore on VHE with no IRQs Christoffer Dall
2018-01-12 12:07   ` Christoffer Dall
2018-02-05 13:29   ` Tomasz Nowicki
2018-02-05 13:29     ` Tomasz Nowicki
2018-02-08 15:48     ` Christoffer Dall
2018-02-08 15:48       ` Christoffer Dall
2018-01-15 14:14 ` [PATCH v3 00/41] Optimize KVM/ARM for VHE systems Yury Norov
2018-01-15 14:14   ` Yury Norov
2018-01-15 15:50   ` Christoffer Dall
2018-01-15 15:50     ` Christoffer Dall
2018-01-17  8:34     ` Yury Norov
2018-01-17  8:34       ` Yury Norov
2018-01-17 10:48       ` Christoffer Dall
2018-01-17 10:48         ` Christoffer Dall
2018-01-18 11:16   ` Christoffer Dall
2018-01-18 11:16     ` Christoffer Dall
2018-01-18 12:18     ` Yury Norov
2018-01-18 12:18       ` Yury Norov
2018-01-18 13:32       ` Christoffer Dall
2018-01-18 13:32         ` Christoffer Dall
2018-01-22 13:40   ` Tomasz Nowicki
2018-01-22 13:40     ` Tomasz Nowicki
2018-02-01 13:57 ` Tomasz Nowicki
2018-02-01 13:57   ` Tomasz Nowicki
2018-02-01 16:15   ` Yury Norov
2018-02-01 16:15     ` Yury Norov
2018-02-02 10:05     ` Tomasz Nowicki
2018-02-02 10:05       ` Tomasz Nowicki
2018-02-02 10:07   ` Tomasz Nowicki
2018-02-02 10:07     ` Tomasz Nowicki
2018-02-08 15:47   ` Christoffer Dall
2018-02-08 15:47     ` Christoffer Dall
