kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 00/11] KVM: arm64: Move PC/ELR/SPSR/PSTATE updatess to EL2
@ 2020-11-02 16:40 Marc Zyngier
  2020-11-02 16:40 ` [PATCH v2 01/11] KVM: arm64: Don't adjust PC on SError during SMC trap Marc Zyngier
                   ` (10 more replies)
  0 siblings, 11 replies; 20+ messages in thread
From: Marc Zyngier @ 2020-11-02 16:40 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: James Morse, Julien Thierry, Suzuki K Poulose, Andrew Scull,
	Will Deacon, Mark Rutland, Quentin Perret, David Brazdil,
	kernel-team

As we progress towards being able to keep the guest state private to
the nVHE hypervisor, this series aims at moving anything that touches
the registers involved into an exception to EL2.

The general idea is that any update to these registers is driven by a
set of flags passed from EL1 to EL2, and EL2 will deal with the
register update itself, removing the need for EL1 to see the guest
state. It also results in a bunch of cleanup, mostly in the 32bit
department (negative diffstat, yay!).

Of course, none of that has any real effect on security yet. It is
only once we start having a private VCPU structure at EL2 that we can
enforce the isolation. Similarly, there is no policy enforcement, and
a malicious EL1 can still inject exceptions at random points. It can
also give bogus ESR values to the guest. Baby steps.

        M.

* From v1 [1]
  - Fix __kvm_skip_instr() unexpected recursion
  - Fix HVC fixup updating the in-memory state instead of the guest's
  - Dropped facilities for IRQ/FIQ/SError exception injection
  - Simplified VHE/nVHE differences in exception injection
  - Moved AArch32 exception injection over to AArch64 sysregs
  - Use compat_lr_* instead of hardcoded registers
  - Schpelling fyxes

[1] https://lore.kernel.org/r/20201026133450.73304-1-maz@kernel.org

Marc Zyngier (11):
  KVM: arm64: Don't adjust PC on SError during SMC trap
  KVM: arm64: Move kvm_vcpu_trap_il_is32bit into kvm_skip_instr32()
  KVM: arm64: Make kvm_skip_instr() and co private to HYP
  KVM: arm64: Move PC rollback on SError to HYP
  KVM: arm64: Move VHE direct sysreg accessors into kvm_host.h
  KVM: arm64: Add basic hooks for injecting exceptions from EL2
  KVM: arm64: Inject AArch64 exceptions from HYP
  KVM: arm64: Inject AArch32 exceptions from HYP
  KVM: arm64: Remove SPSR manipulation primitives
  KVM: arm64: Consolidate exception injection
  KVM: arm64: Get rid of the AArch32 register mapping code

 arch/arm64/include/asm/kvm_emulate.h       |  70 +----
 arch/arm64/include/asm/kvm_host.h          | 118 +++++++-
 arch/arm64/kvm/Makefile                    |   4 +-
 arch/arm64/kvm/aarch32.c                   | 232 ---------------
 arch/arm64/kvm/guest.c                     |  28 +-
 arch/arm64/kvm/handle_exit.c               |  23 +-
 arch/arm64/kvm/hyp/aarch32.c               |   4 +-
 arch/arm64/kvm/hyp/exception.c             | 331 +++++++++++++++++++++
 arch/arm64/kvm/hyp/include/hyp/adjust_pc.h |  62 ++++
 arch/arm64/kvm/hyp/include/hyp/switch.h    |  17 ++
 arch/arm64/kvm/hyp/nvhe/Makefile           |   2 +-
 arch/arm64/kvm/hyp/nvhe/switch.c           |   3 +
 arch/arm64/kvm/hyp/vgic-v2-cpuif-proxy.c   |   2 +
 arch/arm64/kvm/hyp/vgic-v3-sr.c            |   2 +
 arch/arm64/kvm/hyp/vhe/Makefile            |   2 +-
 arch/arm64/kvm/hyp/vhe/switch.c            |   3 +
 arch/arm64/kvm/inject_fault.c              | 189 +++++-------
 arch/arm64/kvm/mmio.c                      |   2 +-
 arch/arm64/kvm/mmu.c                       |   2 +-
 arch/arm64/kvm/regmap.c                    | 224 --------------
 arch/arm64/kvm/sys_regs.c                  |  83 +-----
 21 files changed, 666 insertions(+), 737 deletions(-)
 delete mode 100644 arch/arm64/kvm/aarch32.c
 create mode 100644 arch/arm64/kvm/hyp/exception.c
 create mode 100644 arch/arm64/kvm/hyp/include/hyp/adjust_pc.h
 delete mode 100644 arch/arm64/kvm/regmap.c

-- 
2.28.0


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v2 01/11] KVM: arm64: Don't adjust PC on SError during SMC trap
  2020-11-02 16:40 [PATCH v2 00/11] KVM: arm64: Move PC/ELR/SPSR/PSTATE updatess to EL2 Marc Zyngier
@ 2020-11-02 16:40 ` Marc Zyngier
  2020-11-02 16:40 ` [PATCH v2 02/11] KVM: arm64: Move kvm_vcpu_trap_il_is32bit into kvm_skip_instr32() Marc Zyngier
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: Marc Zyngier @ 2020-11-02 16:40 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: James Morse, Julien Thierry, Suzuki K Poulose, Andrew Scull,
	Will Deacon, Mark Rutland, Quentin Perret, David Brazdil,
	kernel-team

On SMC trap, the prefered return address is set to that of the SMC
instruction itself. It is thus wrong to try and roll it back when
an SError occurs while trapping on SMC. It is still necessary on
HVC though, as HVC doesn't cause a trap, and sets ELR to returning
*after* the HVC.

It also became apparent that there is no 16bit encoding for an AArch32
HVC instruction, meaning that the displacement is always 4 bytes,
no matter what the ISA is. Take this opportunity to simplify it.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/handle_exit.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index 5d690d60ccad..79a720657c47 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -245,15 +245,15 @@ int handle_exit(struct kvm_vcpu *vcpu, int exception_index)
 		u8 esr_ec = ESR_ELx_EC(kvm_vcpu_get_esr(vcpu));
 
 		/*
-		 * HVC/SMC already have an adjusted PC, which we need
-		 * to correct in order to return to after having
-		 * injected the SError.
+		 * HVC already have an adjusted PC, which we need to
+		 * correct in order to return to after having injected
+		 * the SError.
+		 *
+		 * SMC, on the other hand, is *trapped*, meaning its
+		 * preferred return address is the SMC itself.
 		 */
-		if (esr_ec == ESR_ELx_EC_HVC32 || esr_ec == ESR_ELx_EC_HVC64 ||
-		    esr_ec == ESR_ELx_EC_SMC32 || esr_ec == ESR_ELx_EC_SMC64) {
-			u32 adj =  kvm_vcpu_trap_il_is32bit(vcpu) ? 4 : 2;
-			*vcpu_pc(vcpu) -= adj;
-		}
+		if (esr_ec == ESR_ELx_EC_HVC32 || esr_ec == ESR_ELx_EC_HVC64)
+			*vcpu_pc(vcpu) -= 4;
 
 		return 1;
 	}
-- 
2.28.0


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v2 02/11] KVM: arm64: Move kvm_vcpu_trap_il_is32bit into kvm_skip_instr32()
  2020-11-02 16:40 [PATCH v2 00/11] KVM: arm64: Move PC/ELR/SPSR/PSTATE updatess to EL2 Marc Zyngier
  2020-11-02 16:40 ` [PATCH v2 01/11] KVM: arm64: Don't adjust PC on SError during SMC trap Marc Zyngier
@ 2020-11-02 16:40 ` Marc Zyngier
  2020-11-02 16:40 ` [PATCH v2 03/11] KVM: arm64: Make kvm_skip_instr() and co private to HYP Marc Zyngier
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: Marc Zyngier @ 2020-11-02 16:40 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: James Morse, Julien Thierry, Suzuki K Poulose, Andrew Scull,
	Will Deacon, Mark Rutland, Quentin Perret, David Brazdil,
	kernel-team

There is no need to feed the result of kvm_vcpu_trap_il_is32bit()
to kvm_skip_instr(), as only AArch32 has a variable length ISA, and
this helper can equally be called from kvm_skip_instr32(), reducing
the complexity at all the call sites.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_emulate.h | 8 ++++----
 arch/arm64/kvm/handle_exit.c         | 6 +++---
 arch/arm64/kvm/hyp/aarch32.c         | 4 ++--
 arch/arm64/kvm/mmio.c                | 2 +-
 arch/arm64/kvm/mmu.c                 | 2 +-
 arch/arm64/kvm/sys_regs.c            | 2 +-
 6 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 5ef2669ccd6c..0864f425547d 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -26,7 +26,7 @@ unsigned long vcpu_read_spsr32(const struct kvm_vcpu *vcpu);
 void vcpu_write_spsr32(struct kvm_vcpu *vcpu, unsigned long v);
 
 bool kvm_condition_valid32(const struct kvm_vcpu *vcpu);
-void kvm_skip_instr32(struct kvm_vcpu *vcpu, bool is_wide_instr);
+void kvm_skip_instr32(struct kvm_vcpu *vcpu);
 
 void kvm_inject_undefined(struct kvm_vcpu *vcpu);
 void kvm_inject_vabt(struct kvm_vcpu *vcpu);
@@ -472,10 +472,10 @@ static inline unsigned long vcpu_data_host_to_guest(struct kvm_vcpu *vcpu,
 	return data;		/* Leave LE untouched */
 }
 
-static __always_inline void kvm_skip_instr(struct kvm_vcpu *vcpu, bool is_wide_instr)
+static __always_inline void kvm_skip_instr(struct kvm_vcpu *vcpu)
 {
 	if (vcpu_mode_is_32bit(vcpu)) {
-		kvm_skip_instr32(vcpu, is_wide_instr);
+		kvm_skip_instr32(vcpu);
 	} else {
 		*vcpu_pc(vcpu) += 4;
 		*vcpu_cpsr(vcpu) &= ~PSR_BTYPE_MASK;
@@ -494,7 +494,7 @@ static __always_inline void __kvm_skip_instr(struct kvm_vcpu *vcpu)
 	*vcpu_pc(vcpu) = read_sysreg_el2(SYS_ELR);
 	vcpu_gp_regs(vcpu)->pstate = read_sysreg_el2(SYS_SPSR);
 
-	kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
+	kvm_skip_instr(vcpu);
 
 	write_sysreg_el2(vcpu_gp_regs(vcpu)->pstate, SYS_SPSR);
 	write_sysreg_el2(*vcpu_pc(vcpu), SYS_ELR);
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index 79a720657c47..30bf8e22df54 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -61,7 +61,7 @@ static int handle_smc(struct kvm_vcpu *vcpu)
 	 * otherwise return to the same address...
 	 */
 	vcpu_set_reg(vcpu, 0, ~0UL);
-	kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
+	kvm_skip_instr(vcpu);
 	return 1;
 }
 
@@ -100,7 +100,7 @@ static int kvm_handle_wfx(struct kvm_vcpu *vcpu)
 		kvm_clear_request(KVM_REQ_UNHALT, vcpu);
 	}
 
-	kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
+	kvm_skip_instr(vcpu);
 
 	return 1;
 }
@@ -221,7 +221,7 @@ static int handle_trap_exceptions(struct kvm_vcpu *vcpu)
 	 * that fail their condition code check"
 	 */
 	if (!kvm_condition_valid(vcpu)) {
-		kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
+		kvm_skip_instr(vcpu);
 		handled = 1;
 	} else {
 		exit_handle_fn exit_handler;
diff --git a/arch/arm64/kvm/hyp/aarch32.c b/arch/arm64/kvm/hyp/aarch32.c
index ae56d8a4b382..f98cbe2626a1 100644
--- a/arch/arm64/kvm/hyp/aarch32.c
+++ b/arch/arm64/kvm/hyp/aarch32.c
@@ -123,13 +123,13 @@ static void kvm_adjust_itstate(struct kvm_vcpu *vcpu)
  * kvm_skip_instr - skip a trapped instruction and proceed to the next
  * @vcpu: The vcpu pointer
  */
-void kvm_skip_instr32(struct kvm_vcpu *vcpu, bool is_wide_instr)
+void kvm_skip_instr32(struct kvm_vcpu *vcpu)
 {
 	u32 pc = *vcpu_pc(vcpu);
 	bool is_thumb;
 
 	is_thumb = !!(*vcpu_cpsr(vcpu) & PSR_AA32_T_BIT);
-	if (is_thumb && !is_wide_instr)
+	if (is_thumb && !kvm_vcpu_trap_il_is32bit(vcpu))
 		pc += 2;
 	else
 		pc += 4;
diff --git a/arch/arm64/kvm/mmio.c b/arch/arm64/kvm/mmio.c
index 6a2826f1bf5e..7e8eb32ae7d2 100644
--- a/arch/arm64/kvm/mmio.c
+++ b/arch/arm64/kvm/mmio.c
@@ -115,7 +115,7 @@ int kvm_handle_mmio_return(struct kvm_vcpu *vcpu)
 	 * The MMIO instruction is emulated and should not be re-executed
 	 * in the guest.
 	 */
-	kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
+	kvm_skip_instr(vcpu);
 
 	return 0;
 }
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 57972bdb213a..0bec07cf8d06 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1014,7 +1014,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 		 * cautious, and skip the instruction.
 		 */
 		if (kvm_is_error_hva(hva) && kvm_vcpu_dabt_is_cm(vcpu)) {
-			kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
+			kvm_skip_instr(vcpu);
 			ret = 1;
 			goto out_unlock;
 		}
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index fb12d3ef423a..1232a814ca7f 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -2199,7 +2199,7 @@ static void perform_access(struct kvm_vcpu *vcpu,
 
 	/* Skip instruction if instructed so */
 	if (likely(r->access(vcpu, params, r)))
-		kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
+		kvm_skip_instr(vcpu);
 }
 
 /*
-- 
2.28.0


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v2 03/11] KVM: arm64: Make kvm_skip_instr() and co private to HYP
  2020-11-02 16:40 [PATCH v2 00/11] KVM: arm64: Move PC/ELR/SPSR/PSTATE updatess to EL2 Marc Zyngier
  2020-11-02 16:40 ` [PATCH v2 01/11] KVM: arm64: Don't adjust PC on SError during SMC trap Marc Zyngier
  2020-11-02 16:40 ` [PATCH v2 02/11] KVM: arm64: Move kvm_vcpu_trap_il_is32bit into kvm_skip_instr32() Marc Zyngier
@ 2020-11-02 16:40 ` Marc Zyngier
  2021-05-05 14:23   ` Zenghui Yu
  2020-11-02 16:40 ` [PATCH v2 04/11] KVM: arm64: Move PC rollback on SError " Marc Zyngier
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 20+ messages in thread
From: Marc Zyngier @ 2020-11-02 16:40 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: James Morse, Julien Thierry, Suzuki K Poulose, Andrew Scull,
	Will Deacon, Mark Rutland, Quentin Perret, David Brazdil,
	kernel-team

In an effort to remove the vcpu PC manipulations from EL1 on nVHE
systems, move kvm_skip_instr() to be HYP-specific. EL1's intent
to increment PC post emulation is now signalled via a flag in the
vcpu structure.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_emulate.h       | 27 +----------
 arch/arm64/include/asm/kvm_host.h          |  1 +
 arch/arm64/kvm/handle_exit.c               |  6 +--
 arch/arm64/kvm/hyp/include/hyp/adjust_pc.h | 56 ++++++++++++++++++++++
 arch/arm64/kvm/hyp/include/hyp/switch.h    |  2 +
 arch/arm64/kvm/hyp/nvhe/switch.c           |  3 ++
 arch/arm64/kvm/hyp/vgic-v2-cpuif-proxy.c   |  2 +
 arch/arm64/kvm/hyp/vgic-v3-sr.c            |  2 +
 arch/arm64/kvm/hyp/vhe/switch.c            |  3 ++
 arch/arm64/kvm/mmio.c                      |  2 +-
 arch/arm64/kvm/mmu.c                       |  2 +-
 arch/arm64/kvm/sys_regs.c                  |  2 +-
 12 files changed, 77 insertions(+), 31 deletions(-)
 create mode 100644 arch/arm64/kvm/hyp/include/hyp/adjust_pc.h

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 0864f425547d..6d2b5d1aa7b3 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -472,32 +472,9 @@ static inline unsigned long vcpu_data_host_to_guest(struct kvm_vcpu *vcpu,
 	return data;		/* Leave LE untouched */
 }
 
-static __always_inline void kvm_skip_instr(struct kvm_vcpu *vcpu)
+static __always_inline void kvm_incr_pc(struct kvm_vcpu *vcpu)
 {
-	if (vcpu_mode_is_32bit(vcpu)) {
-		kvm_skip_instr32(vcpu);
-	} else {
-		*vcpu_pc(vcpu) += 4;
-		*vcpu_cpsr(vcpu) &= ~PSR_BTYPE_MASK;
-	}
-
-	/* advance the singlestep state machine */
-	*vcpu_cpsr(vcpu) &= ~DBG_SPSR_SS;
-}
-
-/*
- * Skip an instruction which has been emulated at hyp while most guest sysregs
- * are live.
- */
-static __always_inline void __kvm_skip_instr(struct kvm_vcpu *vcpu)
-{
-	*vcpu_pc(vcpu) = read_sysreg_el2(SYS_ELR);
-	vcpu_gp_regs(vcpu)->pstate = read_sysreg_el2(SYS_SPSR);
-
-	kvm_skip_instr(vcpu);
-
-	write_sysreg_el2(vcpu_gp_regs(vcpu)->pstate, SYS_SPSR);
-	write_sysreg_el2(*vcpu_pc(vcpu), SYS_ELR);
+	vcpu->arch.flags |= KVM_ARM64_INCREMENT_PC;
 }
 
 #endif /* __ARM64_KVM_EMULATE_H__ */
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 781d029b8aa8..7991a1c13254 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -407,6 +407,7 @@ struct kvm_vcpu_arch {
 #define KVM_ARM64_GUEST_HAS_SVE		(1 << 5) /* SVE exposed to guest */
 #define KVM_ARM64_VCPU_SVE_FINALIZED	(1 << 6) /* SVE config completed */
 #define KVM_ARM64_GUEST_HAS_PTRAUTH	(1 << 7) /* PTRAUTH exposed to guest */
+#define KVM_ARM64_INCREMENT_PC		(1 << 8) /* Increment PC */
 
 #define vcpu_has_sve(vcpu) (system_supports_sve() && \
 			    ((vcpu)->arch.flags & KVM_ARM64_GUEST_HAS_SVE))
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index 30bf8e22df54..d4e00a864ee6 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -61,7 +61,7 @@ static int handle_smc(struct kvm_vcpu *vcpu)
 	 * otherwise return to the same address...
 	 */
 	vcpu_set_reg(vcpu, 0, ~0UL);
-	kvm_skip_instr(vcpu);
+	kvm_incr_pc(vcpu);
 	return 1;
 }
 
@@ -100,7 +100,7 @@ static int kvm_handle_wfx(struct kvm_vcpu *vcpu)
 		kvm_clear_request(KVM_REQ_UNHALT, vcpu);
 	}
 
-	kvm_skip_instr(vcpu);
+	kvm_incr_pc(vcpu);
 
 	return 1;
 }
@@ -221,7 +221,7 @@ static int handle_trap_exceptions(struct kvm_vcpu *vcpu)
 	 * that fail their condition code check"
 	 */
 	if (!kvm_condition_valid(vcpu)) {
-		kvm_skip_instr(vcpu);
+		kvm_incr_pc(vcpu);
 		handled = 1;
 	} else {
 		exit_handle_fn exit_handler;
diff --git a/arch/arm64/kvm/hyp/include/hyp/adjust_pc.h b/arch/arm64/kvm/hyp/include/hyp/adjust_pc.h
new file mode 100644
index 000000000000..d3043b07e78e
--- /dev/null
+++ b/arch/arm64/kvm/hyp/include/hyp/adjust_pc.h
@@ -0,0 +1,56 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Guest PC manipulation helpers
+ *
+ * Copyright (C) 2012,2013 - ARM Ltd
+ * Copyright (C) 2020 - Google LLC
+ * Author: Marc Zyngier <maz@kernel.org>
+ */
+
+#ifndef __ARM64_KVM_HYP_ADJUST_PC_H__
+#define __ARM64_KVM_HYP_ADJUST_PC_H__
+
+#include <asm/kvm_emulate.h>
+#include <asm/kvm_host.h>
+
+static inline void kvm_skip_instr(struct kvm_vcpu *vcpu)
+{
+	if (vcpu_mode_is_32bit(vcpu)) {
+		kvm_skip_instr32(vcpu);
+	} else {
+		*vcpu_pc(vcpu) += 4;
+		*vcpu_cpsr(vcpu) &= ~PSR_BTYPE_MASK;
+	}
+
+	/* advance the singlestep state machine */
+	*vcpu_cpsr(vcpu) &= ~DBG_SPSR_SS;
+}
+
+/*
+ * Skip an instruction which has been emulated at hyp while most guest sysregs
+ * are live.
+ */
+static inline void __kvm_skip_instr(struct kvm_vcpu *vcpu)
+{
+	*vcpu_pc(vcpu) = read_sysreg_el2(SYS_ELR);
+	vcpu_gp_regs(vcpu)->pstate = read_sysreg_el2(SYS_SPSR);
+
+	kvm_skip_instr(vcpu);
+
+	write_sysreg_el2(vcpu_gp_regs(vcpu)->pstate, SYS_SPSR);
+	write_sysreg_el2(*vcpu_pc(vcpu), SYS_ELR);
+}
+
+/*
+ * Adjust the guest PC on entry, depending on flags provided by EL1
+ * for the purpose of emulation (MMIO, sysreg).
+ */
+static inline void __adjust_pc(struct kvm_vcpu *vcpu)
+{
+	if (vcpu->arch.flags & KVM_ARM64_INCREMENT_PC) {
+		kvm_skip_instr(vcpu);
+		vcpu->arch.flags &= ~KVM_ARM64_INCREMENT_PC;
+	}
+}
+
+#endif
diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h
index 1f875a8f20c4..8b2328f62a07 100644
--- a/arch/arm64/kvm/hyp/include/hyp/switch.h
+++ b/arch/arm64/kvm/hyp/include/hyp/switch.h
@@ -7,6 +7,8 @@
 #ifndef __ARM64_KVM_HYP_SWITCH_H__
 #define __ARM64_KVM_HYP_SWITCH_H__
 
+#include <hyp/adjust_pc.h>
+
 #include <linux/arm-smccc.h>
 #include <linux/kvm_host.h>
 #include <linux/types.h>
diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c
index 8ae8160bc93a..3e50ff35aa4f 100644
--- a/arch/arm64/kvm/hyp/nvhe/switch.c
+++ b/arch/arm64/kvm/hyp/nvhe/switch.c
@@ -4,6 +4,7 @@
  * Author: Marc Zyngier <marc.zyngier@arm.com>
  */
 
+#include <hyp/adjust_pc.h>
 #include <hyp/switch.h>
 #include <hyp/sysreg-sr.h>
 
@@ -189,6 +190,8 @@ int __kvm_vcpu_run(struct kvm_vcpu *vcpu)
 
 	__sysreg_save_state_nvhe(host_ctxt);
 
+	__adjust_pc(vcpu);
+
 	/*
 	 * We must restore the 32-bit state before the sysregs, thanks
 	 * to erratum #852523 (Cortex-A57) or #853709 (Cortex-A72).
diff --git a/arch/arm64/kvm/hyp/vgic-v2-cpuif-proxy.c b/arch/arm64/kvm/hyp/vgic-v2-cpuif-proxy.c
index bd1bab551d48..8f0585640241 100644
--- a/arch/arm64/kvm/hyp/vgic-v2-cpuif-proxy.c
+++ b/arch/arm64/kvm/hyp/vgic-v2-cpuif-proxy.c
@@ -4,6 +4,8 @@
  * Author: Marc Zyngier <marc.zyngier@arm.com>
  */
 
+#include <hyp/adjust_pc.h>
+
 #include <linux/compiler.h>
 #include <linux/irqchip/arm-gic.h>
 #include <linux/kvm_host.h>
diff --git a/arch/arm64/kvm/hyp/vgic-v3-sr.c b/arch/arm64/kvm/hyp/vgic-v3-sr.c
index 452f4cacd674..80406f463c28 100644
--- a/arch/arm64/kvm/hyp/vgic-v3-sr.c
+++ b/arch/arm64/kvm/hyp/vgic-v3-sr.c
@@ -4,6 +4,8 @@
  * Author: Marc Zyngier <marc.zyngier@arm.com>
  */
 
+#include <hyp/adjust_pc.h>
+
 #include <linux/compiler.h>
 #include <linux/irqchip/arm-gic-v3.h>
 #include <linux/kvm_host.h>
diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c
index 62546e20b251..af8e940d0f03 100644
--- a/arch/arm64/kvm/hyp/vhe/switch.c
+++ b/arch/arm64/kvm/hyp/vhe/switch.c
@@ -4,6 +4,7 @@
  * Author: Marc Zyngier <marc.zyngier@arm.com>
  */
 
+#include <hyp/adjust_pc.h>
 #include <hyp/switch.h>
 
 #include <linux/arm-smccc.h>
@@ -133,6 +134,8 @@ static int __kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
 	__load_guest_stage2(vcpu->arch.hw_mmu);
 	__activate_traps(vcpu);
 
+	__adjust_pc(vcpu);
+
 	sysreg_restore_guest_state_vhe(guest_ctxt);
 	__debug_switch_to_guest(vcpu);
 
diff --git a/arch/arm64/kvm/mmio.c b/arch/arm64/kvm/mmio.c
index 7e8eb32ae7d2..3e2d8ba11a02 100644
--- a/arch/arm64/kvm/mmio.c
+++ b/arch/arm64/kvm/mmio.c
@@ -115,7 +115,7 @@ int kvm_handle_mmio_return(struct kvm_vcpu *vcpu)
 	 * The MMIO instruction is emulated and should not be re-executed
 	 * in the guest.
 	 */
-	kvm_skip_instr(vcpu);
+	kvm_incr_pc(vcpu);
 
 	return 0;
 }
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 0bec07cf8d06..2ecd0c5622a6 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1014,7 +1014,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 		 * cautious, and skip the instruction.
 		 */
 		if (kvm_is_error_hva(hva) && kvm_vcpu_dabt_is_cm(vcpu)) {
-			kvm_skip_instr(vcpu);
+			kvm_incr_pc(vcpu);
 			ret = 1;
 			goto out_unlock;
 		}
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 1232a814ca7f..6a6f06205ea7 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -2199,7 +2199,7 @@ static void perform_access(struct kvm_vcpu *vcpu,
 
 	/* Skip instruction if instructed so */
 	if (likely(r->access(vcpu, params, r)))
-		kvm_skip_instr(vcpu);
+		kvm_incr_pc(vcpu);
 }
 
 /*
-- 
2.28.0


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v2 04/11] KVM: arm64: Move PC rollback on SError to HYP
  2020-11-02 16:40 [PATCH v2 00/11] KVM: arm64: Move PC/ELR/SPSR/PSTATE updatess to EL2 Marc Zyngier
                   ` (2 preceding siblings ...)
  2020-11-02 16:40 ` [PATCH v2 03/11] KVM: arm64: Make kvm_skip_instr() and co private to HYP Marc Zyngier
@ 2020-11-02 16:40 ` Marc Zyngier
  2020-11-02 16:40 ` [PATCH v2 05/11] KVM: arm64: Move VHE direct sysreg accessors into kvm_host.h Marc Zyngier
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: Marc Zyngier @ 2020-11-02 16:40 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: James Morse, Julien Thierry, Suzuki K Poulose, Andrew Scull,
	Will Deacon, Mark Rutland, Quentin Perret, David Brazdil,
	kernel-team

Instead of handling the "PC rollback on SError during HVC" at EL1 (which
requires disclosing PC to a potentially untrusted kernel), let's move
this fixup to ... fixup_guest_exit(), which is where we do all fixups.

Isn't that neat?

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/handle_exit.c            | 17 -----------------
 arch/arm64/kvm/hyp/include/hyp/switch.h | 15 +++++++++++++++
 2 files changed, 15 insertions(+), 17 deletions(-)

diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index d4e00a864ee6..f79137ee4274 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -241,23 +241,6 @@ int handle_exit(struct kvm_vcpu *vcpu, int exception_index)
 {
 	struct kvm_run *run = vcpu->run;
 
-	if (ARM_SERROR_PENDING(exception_index)) {
-		u8 esr_ec = ESR_ELx_EC(kvm_vcpu_get_esr(vcpu));
-
-		/*
-		 * HVC already have an adjusted PC, which we need to
-		 * correct in order to return to after having injected
-		 * the SError.
-		 *
-		 * SMC, on the other hand, is *trapped*, meaning its
-		 * preferred return address is the SMC itself.
-		 */
-		if (esr_ec == ESR_ELx_EC_HVC32 || esr_ec == ESR_ELx_EC_HVC64)
-			*vcpu_pc(vcpu) -= 4;
-
-		return 1;
-	}
-
 	exception_index = ARM_EXCEPTION_CODE(exception_index);
 
 	switch (exception_index) {
diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h
index 8b2328f62a07..84473574c2e7 100644
--- a/arch/arm64/kvm/hyp/include/hyp/switch.h
+++ b/arch/arm64/kvm/hyp/include/hyp/switch.h
@@ -411,6 +411,21 @@ static inline bool fixup_guest_exit(struct kvm_vcpu *vcpu, u64 *exit_code)
 	if (ARM_EXCEPTION_CODE(*exit_code) != ARM_EXCEPTION_IRQ)
 		vcpu->arch.fault.esr_el2 = read_sysreg_el2(SYS_ESR);
 
+	if (ARM_SERROR_PENDING(*exit_code)) {
+		u8 esr_ec = kvm_vcpu_trap_get_class(vcpu);
+
+		/*
+		 * HVC already have an adjusted PC, which we need to
+		 * correct in order to return to after having injected
+		 * the SError.
+		 *
+		 * SMC, on the other hand, is *trapped*, meaning its
+		 * preferred return address is the SMC itself.
+		 */
+		if (esr_ec == ESR_ELx_EC_HVC32 || esr_ec == ESR_ELx_EC_HVC64)
+			write_sysreg_el2(read_sysreg_el2(SYS_ELR) - 4, SYS_ELR);
+	}
+
 	/*
 	 * We're using the raw exception code in order to only process
 	 * the trap if no SError is pending. We will come back to the
-- 
2.28.0


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v2 05/11] KVM: arm64: Move VHE direct sysreg accessors into kvm_host.h
  2020-11-02 16:40 [PATCH v2 00/11] KVM: arm64: Move PC/ELR/SPSR/PSTATE updatess to EL2 Marc Zyngier
                   ` (3 preceding siblings ...)
  2020-11-02 16:40 ` [PATCH v2 04/11] KVM: arm64: Move PC rollback on SError " Marc Zyngier
@ 2020-11-02 16:40 ` Marc Zyngier
  2020-11-02 16:40 ` [PATCH v2 06/11] KVM: arm64: Add basic hooks for injecting exceptions from EL2 Marc Zyngier
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: Marc Zyngier @ 2020-11-02 16:40 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: James Morse, Julien Thierry, Suzuki K Poulose, Andrew Scull,
	Will Deacon, Mark Rutland, Quentin Perret, David Brazdil,
	kernel-team

As we are about to need to access system registers from the HYP
code based on their internal encoding, move the direct sysreg
accessors to a common include file, with a VHE-specific guard.

No functionnal change.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_host.h | 91 +++++++++++++++++++++++++++++++
 arch/arm64/kvm/sys_regs.c         | 81 ---------------------------
 2 files changed, 91 insertions(+), 81 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 7991a1c13254..0672b3db6121 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -439,6 +439,97 @@ struct kvm_vcpu_arch {
 u64 vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg);
 void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg);
 
+static inline bool __vcpu_read_sys_reg_from_cpu(int reg, u64 *val)
+{
+	/*
+	 * *** VHE ONLY ***
+	 *
+	 * System registers listed in the switch are not saved on every
+	 * exit from the guest but are only saved on vcpu_put.
+	 *
+	 * Note that MPIDR_EL1 for the guest is set by KVM via VMPIDR_EL2 but
+	 * should never be listed below, because the guest cannot modify its
+	 * own MPIDR_EL1 and MPIDR_EL1 is accessed for VCPU A from VCPU B's
+	 * thread when emulating cross-VCPU communication.
+	 */
+	if (!has_vhe())
+		return false;
+
+	switch (reg) {
+	case CSSELR_EL1:	*val = read_sysreg_s(SYS_CSSELR_EL1);	break;
+	case SCTLR_EL1:		*val = read_sysreg_s(SYS_SCTLR_EL12);	break;
+	case CPACR_EL1:		*val = read_sysreg_s(SYS_CPACR_EL12);	break;
+	case TTBR0_EL1:		*val = read_sysreg_s(SYS_TTBR0_EL12);	break;
+	case TTBR1_EL1:		*val = read_sysreg_s(SYS_TTBR1_EL12);	break;
+	case TCR_EL1:		*val = read_sysreg_s(SYS_TCR_EL12);	break;
+	case ESR_EL1:		*val = read_sysreg_s(SYS_ESR_EL12);	break;
+	case AFSR0_EL1:		*val = read_sysreg_s(SYS_AFSR0_EL12);	break;
+	case AFSR1_EL1:		*val = read_sysreg_s(SYS_AFSR1_EL12);	break;
+	case FAR_EL1:		*val = read_sysreg_s(SYS_FAR_EL12);	break;
+	case MAIR_EL1:		*val = read_sysreg_s(SYS_MAIR_EL12);	break;
+	case VBAR_EL1:		*val = read_sysreg_s(SYS_VBAR_EL12);	break;
+	case CONTEXTIDR_EL1:	*val = read_sysreg_s(SYS_CONTEXTIDR_EL12);break;
+	case TPIDR_EL0:		*val = read_sysreg_s(SYS_TPIDR_EL0);	break;
+	case TPIDRRO_EL0:	*val = read_sysreg_s(SYS_TPIDRRO_EL0);	break;
+	case TPIDR_EL1:		*val = read_sysreg_s(SYS_TPIDR_EL1);	break;
+	case AMAIR_EL1:		*val = read_sysreg_s(SYS_AMAIR_EL12);	break;
+	case CNTKCTL_EL1:	*val = read_sysreg_s(SYS_CNTKCTL_EL12);	break;
+	case ELR_EL1:		*val = read_sysreg_s(SYS_ELR_EL12);	break;
+	case PAR_EL1:		*val = read_sysreg_par();		break;
+	case DACR32_EL2:	*val = read_sysreg_s(SYS_DACR32_EL2);	break;
+	case IFSR32_EL2:	*val = read_sysreg_s(SYS_IFSR32_EL2);	break;
+	case DBGVCR32_EL2:	*val = read_sysreg_s(SYS_DBGVCR32_EL2);	break;
+	default:		return false;
+	}
+
+	return true;
+}
+
+static inline bool __vcpu_write_sys_reg_to_cpu(u64 val, int reg)
+{
+	/*
+	 * *** VHE ONLY ***
+	 *
+	 * System registers listed in the switch are not restored on every
+	 * entry to the guest but are only restored on vcpu_load.
+	 *
+	 * Note that MPIDR_EL1 for the guest is set by KVM via VMPIDR_EL2 but
+	 * should never be listed below, because the MPIDR should only be set
+	 * once, before running the VCPU, and never changed later.
+	 */
+	if (!has_vhe())
+		return false;
+
+	switch (reg) {
+	case CSSELR_EL1:	write_sysreg_s(val, SYS_CSSELR_EL1);	break;
+	case SCTLR_EL1:		write_sysreg_s(val, SYS_SCTLR_EL12);	break;
+	case CPACR_EL1:		write_sysreg_s(val, SYS_CPACR_EL12);	break;
+	case TTBR0_EL1:		write_sysreg_s(val, SYS_TTBR0_EL12);	break;
+	case TTBR1_EL1:		write_sysreg_s(val, SYS_TTBR1_EL12);	break;
+	case TCR_EL1:		write_sysreg_s(val, SYS_TCR_EL12);	break;
+	case ESR_EL1:		write_sysreg_s(val, SYS_ESR_EL12);	break;
+	case AFSR0_EL1:		write_sysreg_s(val, SYS_AFSR0_EL12);	break;
+	case AFSR1_EL1:		write_sysreg_s(val, SYS_AFSR1_EL12);	break;
+	case FAR_EL1:		write_sysreg_s(val, SYS_FAR_EL12);	break;
+	case MAIR_EL1:		write_sysreg_s(val, SYS_MAIR_EL12);	break;
+	case VBAR_EL1:		write_sysreg_s(val, SYS_VBAR_EL12);	break;
+	case CONTEXTIDR_EL1:	write_sysreg_s(val, SYS_CONTEXTIDR_EL12);break;
+	case TPIDR_EL0:		write_sysreg_s(val, SYS_TPIDR_EL0);	break;
+	case TPIDRRO_EL0:	write_sysreg_s(val, SYS_TPIDRRO_EL0);	break;
+	case TPIDR_EL1:		write_sysreg_s(val, SYS_TPIDR_EL1);	break;
+	case AMAIR_EL1:		write_sysreg_s(val, SYS_AMAIR_EL12);	break;
+	case CNTKCTL_EL1:	write_sysreg_s(val, SYS_CNTKCTL_EL12);	break;
+	case ELR_EL1:		write_sysreg_s(val, SYS_ELR_EL12);	break;
+	case PAR_EL1:		write_sysreg_s(val, SYS_PAR_EL1);	break;
+	case DACR32_EL2:	write_sysreg_s(val, SYS_DACR32_EL2);	break;
+	case IFSR32_EL2:	write_sysreg_s(val, SYS_IFSR32_EL2);	break;
+	case DBGVCR32_EL2:	write_sysreg_s(val, SYS_DBGVCR32_EL2);	break;
+	default:		return false;
+	}
+
+	return true;
+}
+
 /*
  * CP14 and CP15 live in the same array, as they are backed by the
  * same system registers.
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 6a6f06205ea7..26c7c25f8a6d 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -64,87 +64,6 @@ static bool write_to_read_only(struct kvm_vcpu *vcpu,
 	return false;
 }
 
-static bool __vcpu_read_sys_reg_from_cpu(int reg, u64 *val)
-{
-	/*
-	 * System registers listed in the switch are not saved on every
-	 * exit from the guest but are only saved on vcpu_put.
-	 *
-	 * Note that MPIDR_EL1 for the guest is set by KVM via VMPIDR_EL2 but
-	 * should never be listed below, because the guest cannot modify its
-	 * own MPIDR_EL1 and MPIDR_EL1 is accessed for VCPU A from VCPU B's
-	 * thread when emulating cross-VCPU communication.
-	 */
-	switch (reg) {
-	case CSSELR_EL1:	*val = read_sysreg_s(SYS_CSSELR_EL1);	break;
-	case SCTLR_EL1:		*val = read_sysreg_s(SYS_SCTLR_EL12);	break;
-	case CPACR_EL1:		*val = read_sysreg_s(SYS_CPACR_EL12);	break;
-	case TTBR0_EL1:		*val = read_sysreg_s(SYS_TTBR0_EL12);	break;
-	case TTBR1_EL1:		*val = read_sysreg_s(SYS_TTBR1_EL12);	break;
-	case TCR_EL1:		*val = read_sysreg_s(SYS_TCR_EL12);	break;
-	case ESR_EL1:		*val = read_sysreg_s(SYS_ESR_EL12);	break;
-	case AFSR0_EL1:		*val = read_sysreg_s(SYS_AFSR0_EL12);	break;
-	case AFSR1_EL1:		*val = read_sysreg_s(SYS_AFSR1_EL12);	break;
-	case FAR_EL1:		*val = read_sysreg_s(SYS_FAR_EL12);	break;
-	case MAIR_EL1:		*val = read_sysreg_s(SYS_MAIR_EL12);	break;
-	case VBAR_EL1:		*val = read_sysreg_s(SYS_VBAR_EL12);	break;
-	case CONTEXTIDR_EL1:	*val = read_sysreg_s(SYS_CONTEXTIDR_EL12);break;
-	case TPIDR_EL0:		*val = read_sysreg_s(SYS_TPIDR_EL0);	break;
-	case TPIDRRO_EL0:	*val = read_sysreg_s(SYS_TPIDRRO_EL0);	break;
-	case TPIDR_EL1:		*val = read_sysreg_s(SYS_TPIDR_EL1);	break;
-	case AMAIR_EL1:		*val = read_sysreg_s(SYS_AMAIR_EL12);	break;
-	case CNTKCTL_EL1:	*val = read_sysreg_s(SYS_CNTKCTL_EL12);	break;
-	case ELR_EL1:		*val = read_sysreg_s(SYS_ELR_EL12);	break;
-	case PAR_EL1:		*val = read_sysreg_par();		break;
-	case DACR32_EL2:	*val = read_sysreg_s(SYS_DACR32_EL2);	break;
-	case IFSR32_EL2:	*val = read_sysreg_s(SYS_IFSR32_EL2);	break;
-	case DBGVCR32_EL2:	*val = read_sysreg_s(SYS_DBGVCR32_EL2);	break;
-	default:		return false;
-	}
-
-	return true;
-}
-
-static bool __vcpu_write_sys_reg_to_cpu(u64 val, int reg)
-{
-	/*
-	 * System registers listed in the switch are not restored on every
-	 * entry to the guest but are only restored on vcpu_load.
-	 *
-	 * Note that MPIDR_EL1 for the guest is set by KVM via VMPIDR_EL2 but
-	 * should never be listed below, because the MPIDR should only be set
-	 * once, before running the VCPU, and never changed later.
-	 */
-	switch (reg) {
-	case CSSELR_EL1:	write_sysreg_s(val, SYS_CSSELR_EL1);	break;
-	case SCTLR_EL1:		write_sysreg_s(val, SYS_SCTLR_EL12);	break;
-	case CPACR_EL1:		write_sysreg_s(val, SYS_CPACR_EL12);	break;
-	case TTBR0_EL1:		write_sysreg_s(val, SYS_TTBR0_EL12);	break;
-	case TTBR1_EL1:		write_sysreg_s(val, SYS_TTBR1_EL12);	break;
-	case TCR_EL1:		write_sysreg_s(val, SYS_TCR_EL12);	break;
-	case ESR_EL1:		write_sysreg_s(val, SYS_ESR_EL12);	break;
-	case AFSR0_EL1:		write_sysreg_s(val, SYS_AFSR0_EL12);	break;
-	case AFSR1_EL1:		write_sysreg_s(val, SYS_AFSR1_EL12);	break;
-	case FAR_EL1:		write_sysreg_s(val, SYS_FAR_EL12);	break;
-	case MAIR_EL1:		write_sysreg_s(val, SYS_MAIR_EL12);	break;
-	case VBAR_EL1:		write_sysreg_s(val, SYS_VBAR_EL12);	break;
-	case CONTEXTIDR_EL1:	write_sysreg_s(val, SYS_CONTEXTIDR_EL12);break;
-	case TPIDR_EL0:		write_sysreg_s(val, SYS_TPIDR_EL0);	break;
-	case TPIDRRO_EL0:	write_sysreg_s(val, SYS_TPIDRRO_EL0);	break;
-	case TPIDR_EL1:		write_sysreg_s(val, SYS_TPIDR_EL1);	break;
-	case AMAIR_EL1:		write_sysreg_s(val, SYS_AMAIR_EL12);	break;
-	case CNTKCTL_EL1:	write_sysreg_s(val, SYS_CNTKCTL_EL12);	break;
-	case ELR_EL1:		write_sysreg_s(val, SYS_ELR_EL12);	break;
-	case PAR_EL1:		write_sysreg_s(val, SYS_PAR_EL1);	break;
-	case DACR32_EL2:	write_sysreg_s(val, SYS_DACR32_EL2);	break;
-	case IFSR32_EL2:	write_sysreg_s(val, SYS_IFSR32_EL2);	break;
-	case DBGVCR32_EL2:	write_sysreg_s(val, SYS_DBGVCR32_EL2);	break;
-	default:		return false;
-	}
-
-	return true;
-}
-
 u64 vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
 {
 	u64 val = 0x8badf00d8badf00d;
-- 
2.28.0


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v2 06/11] KVM: arm64: Add basic hooks for injecting exceptions from EL2
  2020-11-02 16:40 [PATCH v2 00/11] KVM: arm64: Move PC/ELR/SPSR/PSTATE updatess to EL2 Marc Zyngier
                   ` (4 preceding siblings ...)
  2020-11-02 16:40 ` [PATCH v2 05/11] KVM: arm64: Move VHE direct sysreg accessors into kvm_host.h Marc Zyngier
@ 2020-11-02 16:40 ` Marc Zyngier
  2020-11-02 16:40 ` [PATCH v2 07/11] KVM: arm64: Inject AArch64 exceptions from HYP Marc Zyngier
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: Marc Zyngier @ 2020-11-02 16:40 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: James Morse, Julien Thierry, Suzuki K Poulose, Andrew Scull,
	Will Deacon, Mark Rutland, Quentin Perret, David Brazdil,
	kernel-team

Add the basic infrastructure to describe injection of exceptions
into a guest. So far, nothing uses this code path.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_host.h          | 28 ++++++++++++++++++++--
 arch/arm64/kvm/hyp/exception.c             | 17 +++++++++++++
 arch/arm64/kvm/hyp/include/hyp/adjust_pc.h | 10 ++++++--
 arch/arm64/kvm/hyp/nvhe/Makefile           |  2 +-
 arch/arm64/kvm/hyp/vhe/Makefile            |  2 +-
 5 files changed, 53 insertions(+), 6 deletions(-)
 create mode 100644 arch/arm64/kvm/hyp/exception.c

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 0672b3db6121..7a1faf917f3c 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -407,9 +407,33 @@ struct kvm_vcpu_arch {
 #define KVM_ARM64_GUEST_HAS_SVE		(1 << 5) /* SVE exposed to guest */
 #define KVM_ARM64_VCPU_SVE_FINALIZED	(1 << 6) /* SVE config completed */
 #define KVM_ARM64_GUEST_HAS_PTRAUTH	(1 << 7) /* PTRAUTH exposed to guest */
-#define KVM_ARM64_INCREMENT_PC		(1 << 8) /* Increment PC */
+#define KVM_ARM64_PENDING_EXCEPTION	(1 << 8) /* Exception pending */
+#define KVM_ARM64_EXCEPT_MASK		(7 << 9) /* Target EL/MODE */
 
-#define vcpu_has_sve(vcpu) (system_supports_sve() && \
+/*
+ * When KVM_ARM64_PENDING_EXCEPTION is set, KVM_ARM64_EXCEPT_MASK can
+ * take the following values:
+ *
+ * For AArch32 EL1:
+ */
+#define KVM_ARM64_EXCEPT_AA32_UND	(0 << 9)
+#define KVM_ARM64_EXCEPT_AA32_IABT	(1 << 9)
+#define KVM_ARM64_EXCEPT_AA32_DABT	(2 << 9)
+/* For AArch64: */
+#define KVM_ARM64_EXCEPT_AA64_ELx_SYNC	(0 << 9)
+#define KVM_ARM64_EXCEPT_AA64_ELx_IRQ	(1 << 9)
+#define KVM_ARM64_EXCEPT_AA64_ELx_FIQ	(2 << 9)
+#define KVM_ARM64_EXCEPT_AA64_ELx_SERR	(3 << 9)
+#define KVM_ARM64_EXCEPT_AA64_EL1	(0 << 11)
+#define KVM_ARM64_EXCEPT_AA64_EL2	(1 << 11)
+
+/*
+ * Overlaps with KVM_ARM64_EXCEPT_MASK on purpose so that it can't be
+ * set together with an exception...
+ */
+#define KVM_ARM64_INCREMENT_PC		(1 << 9) /* Increment PC */
+
+#define vcpu_has_sve(vcpu) (system_supports_sve() &&			\
 			    ((vcpu)->arch.flags & KVM_ARM64_GUEST_HAS_SVE))
 
 #ifdef CONFIG_ARM64_PTR_AUTH
diff --git a/arch/arm64/kvm/hyp/exception.c b/arch/arm64/kvm/hyp/exception.c
new file mode 100644
index 000000000000..6533a9270850
--- /dev/null
+++ b/arch/arm64/kvm/hyp/exception.c
@@ -0,0 +1,17 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Fault injection for both 32 and 64bit guests.
+ *
+ * Copyright (C) 2012,2013 - ARM Ltd
+ * Author: Marc Zyngier <marc.zyngier@arm.com>
+ *
+ * Based on arch/arm/kvm/emulate.c
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall <c.dall@virtualopensystems.com>
+ */
+
+#include <hyp/adjust_pc.h>
+
+void kvm_inject_exception(struct kvm_vcpu *vcpu)
+{
+}
diff --git a/arch/arm64/kvm/hyp/include/hyp/adjust_pc.h b/arch/arm64/kvm/hyp/include/hyp/adjust_pc.h
index d3043b07e78e..b1f60923a8fe 100644
--- a/arch/arm64/kvm/hyp/include/hyp/adjust_pc.h
+++ b/arch/arm64/kvm/hyp/include/hyp/adjust_pc.h
@@ -13,6 +13,8 @@
 #include <asm/kvm_emulate.h>
 #include <asm/kvm_host.h>
 
+void kvm_inject_exception(struct kvm_vcpu *vcpu);
+
 static inline void kvm_skip_instr(struct kvm_vcpu *vcpu)
 {
 	if (vcpu_mode_is_32bit(vcpu)) {
@@ -43,11 +45,15 @@ static inline void __kvm_skip_instr(struct kvm_vcpu *vcpu)
 
 /*
  * Adjust the guest PC on entry, depending on flags provided by EL1
- * for the purpose of emulation (MMIO, sysreg).
+ * for the purpose of emulation (MMIO, sysreg) or exception injection.
  */
 static inline void __adjust_pc(struct kvm_vcpu *vcpu)
 {
-	if (vcpu->arch.flags & KVM_ARM64_INCREMENT_PC) {
+	if (vcpu->arch.flags & KVM_ARM64_PENDING_EXCEPTION) {
+		kvm_inject_exception(vcpu);
+		vcpu->arch.flags &= ~(KVM_ARM64_PENDING_EXCEPTION |
+				      KVM_ARM64_EXCEPT_MASK);
+	} else 	if (vcpu->arch.flags & KVM_ARM64_INCREMENT_PC) {
 		kvm_skip_instr(vcpu);
 		vcpu->arch.flags &= ~KVM_ARM64_INCREMENT_PC;
 	}
diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
index ddde15fe85f2..77b8c4e06f2f 100644
--- a/arch/arm64/kvm/hyp/nvhe/Makefile
+++ b/arch/arm64/kvm/hyp/nvhe/Makefile
@@ -8,7 +8,7 @@ ccflags-y := -D__KVM_NVHE_HYPERVISOR__
 
 obj-y := timer-sr.o sysreg-sr.o debug-sr.o switch.o tlb.o hyp-init.o host.o hyp-main.o
 obj-y += ../vgic-v3-sr.o ../aarch32.o ../vgic-v2-cpuif-proxy.o ../entry.o \
-	 ../fpsimd.o ../hyp-entry.o
+	 ../fpsimd.o ../hyp-entry.o ../exception.o
 
 ##
 ## Build rules for compiling nVHE hyp code
diff --git a/arch/arm64/kvm/hyp/vhe/Makefile b/arch/arm64/kvm/hyp/vhe/Makefile
index 461e97c375cc..96bec0ecf9dd 100644
--- a/arch/arm64/kvm/hyp/vhe/Makefile
+++ b/arch/arm64/kvm/hyp/vhe/Makefile
@@ -8,4 +8,4 @@ ccflags-y := -D__KVM_VHE_HYPERVISOR__
 
 obj-y := timer-sr.o sysreg-sr.o debug-sr.o switch.o tlb.o
 obj-y += ../vgic-v3-sr.o ../aarch32.o ../vgic-v2-cpuif-proxy.o ../entry.o \
-	 ../fpsimd.o ../hyp-entry.o
+	 ../fpsimd.o ../hyp-entry.o ../exception.o
-- 
2.28.0


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v2 07/11] KVM: arm64: Inject AArch64 exceptions from HYP
  2020-11-02 16:40 [PATCH v2 00/11] KVM: arm64: Move PC/ELR/SPSR/PSTATE updatess to EL2 Marc Zyngier
                   ` (5 preceding siblings ...)
  2020-11-02 16:40 ` [PATCH v2 06/11] KVM: arm64: Add basic hooks for injecting exceptions from EL2 Marc Zyngier
@ 2020-11-02 16:40 ` Marc Zyngier
  2020-11-02 16:40 ` [PATCH v2 08/11] KVM: arm64: Inject AArch32 " Marc Zyngier
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: Marc Zyngier @ 2020-11-02 16:40 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: James Morse, Julien Thierry, Suzuki K Poulose, Andrew Scull,
	Will Deacon, Mark Rutland, Quentin Perret, David Brazdil,
	kernel-team

Move the AArch64 exception injection code from EL1 to HYP, leaving
only the ESR_EL1 updates to EL1. In order to come with the differences
between VHE and nVHE, two set of system register accessors are provided.

SPSR, ELR, PC and PSTATE are now completely handled in the hypervisor.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_emulate.h |  12 +++
 arch/arm64/kvm/hyp/exception.c       | 136 +++++++++++++++++++++++++++
 arch/arm64/kvm/inject_fault.c        | 114 ++--------------------
 3 files changed, 154 insertions(+), 108 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 6d2b5d1aa7b3..736a342dadf7 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -21,6 +21,18 @@
 #include <asm/cputype.h>
 #include <asm/virt.h>
 
+#define CURRENT_EL_SP_EL0_VECTOR	0x0
+#define CURRENT_EL_SP_ELx_VECTOR	0x200
+#define LOWER_EL_AArch64_VECTOR		0x400
+#define LOWER_EL_AArch32_VECTOR		0x600
+
+enum exception_type {
+	except_type_sync	= 0,
+	except_type_irq		= 0x80,
+	except_type_fiq		= 0x100,
+	except_type_serror	= 0x180,
+};
+
 unsigned long *vcpu_reg32(const struct kvm_vcpu *vcpu, u8 reg_num);
 unsigned long vcpu_read_spsr32(const struct kvm_vcpu *vcpu);
 void vcpu_write_spsr32(struct kvm_vcpu *vcpu, unsigned long v);
diff --git a/arch/arm64/kvm/hyp/exception.c b/arch/arm64/kvm/hyp/exception.c
index 6533a9270850..94d6d823424d 100644
--- a/arch/arm64/kvm/hyp/exception.c
+++ b/arch/arm64/kvm/hyp/exception.c
@@ -11,7 +11,143 @@
  */
 
 #include <hyp/adjust_pc.h>
+#include <linux/kvm_host.h>
+#include <asm/kvm_emulate.h>
+
+#if !defined (__KVM_NVHE_HYPERVISOR__) && !defined (__KVM_VHE_HYPERVISOR__)
+#error Hypervisor code only!
+#endif
+
+static inline u64 __vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
+{
+	u64 val;
+
+	if (__vcpu_read_sys_reg_from_cpu(reg, &val))
+		return val;
+
+	return __vcpu_sys_reg(vcpu, reg);
+}
+
+static inline void __vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
+{
+	if (__vcpu_write_sys_reg_to_cpu(val, reg))
+		return;
+
+	 __vcpu_sys_reg(vcpu, reg) = val;
+}
+
+static void __vcpu_write_spsr(struct kvm_vcpu *vcpu, u64 val)
+{
+	write_sysreg_el1(val, SYS_SPSR);
+}
+
+/*
+ * This performs the exception entry at a given EL (@target_mode), stashing PC
+ * and PSTATE into ELR and SPSR respectively, and compute the new PC/PSTATE.
+ * The EL passed to this function *must* be a non-secure, privileged mode with
+ * bit 0 being set (PSTATE.SP == 1).
+ *
+ * When an exception is taken, most PSTATE fields are left unchanged in the
+ * handler. However, some are explicitly overridden (e.g. M[4:0]). Luckily all
+ * of the inherited bits have the same position in the AArch64/AArch32 SPSR_ELx
+ * layouts, so we don't need to shuffle these for exceptions from AArch32 EL0.
+ *
+ * For the SPSR_ELx layout for AArch64, see ARM DDI 0487E.a page C5-429.
+ * For the SPSR_ELx layout for AArch32, see ARM DDI 0487E.a page C5-426.
+ *
+ * Here we manipulate the fields in order of the AArch64 SPSR_ELx layout, from
+ * MSB to LSB.
+ */
+static void enter_exception64(struct kvm_vcpu *vcpu, unsigned long target_mode,
+			      enum exception_type type)
+{
+	unsigned long sctlr, vbar, old, new, mode;
+	u64 exc_offset;
+
+	mode = *vcpu_cpsr(vcpu) & (PSR_MODE_MASK | PSR_MODE32_BIT);
+
+	if      (mode == target_mode)
+		exc_offset = CURRENT_EL_SP_ELx_VECTOR;
+	else if ((mode | PSR_MODE_THREAD_BIT) == target_mode)
+		exc_offset = CURRENT_EL_SP_EL0_VECTOR;
+	else if (!(mode & PSR_MODE32_BIT))
+		exc_offset = LOWER_EL_AArch64_VECTOR;
+	else
+		exc_offset = LOWER_EL_AArch32_VECTOR;
+
+	switch (target_mode) {
+	case PSR_MODE_EL1h:
+		vbar = __vcpu_read_sys_reg(vcpu, VBAR_EL1);
+		sctlr = __vcpu_read_sys_reg(vcpu, SCTLR_EL1);
+		__vcpu_write_sys_reg(vcpu, *vcpu_pc(vcpu), ELR_EL1);
+		break;
+	default:
+		/* Don't do that */
+		BUG();
+	}
+
+	*vcpu_pc(vcpu) = vbar + exc_offset + type;
+
+	old = *vcpu_cpsr(vcpu);
+	new = 0;
+
+	new |= (old & PSR_N_BIT);
+	new |= (old & PSR_Z_BIT);
+	new |= (old & PSR_C_BIT);
+	new |= (old & PSR_V_BIT);
+
+	// TODO: TCO (if/when ARMv8.5-MemTag is exposed to guests)
+
+	new |= (old & PSR_DIT_BIT);
+
+	// PSTATE.UAO is set to zero upon any exception to AArch64
+	// See ARM DDI 0487E.a, page D5-2579.
+
+	// PSTATE.PAN is unchanged unless SCTLR_ELx.SPAN == 0b0
+	// SCTLR_ELx.SPAN is RES1 when ARMv8.1-PAN is not implemented
+	// See ARM DDI 0487E.a, page D5-2578.
+	new |= (old & PSR_PAN_BIT);
+	if (!(sctlr & SCTLR_EL1_SPAN))
+		new |= PSR_PAN_BIT;
+
+	// PSTATE.SS is set to zero upon any exception to AArch64
+	// See ARM DDI 0487E.a, page D2-2452.
+
+	// PSTATE.IL is set to zero upon any exception to AArch64
+	// See ARM DDI 0487E.a, page D1-2306.
+
+	// PSTATE.SSBS is set to SCTLR_ELx.DSSBS upon any exception to AArch64
+	// See ARM DDI 0487E.a, page D13-3258
+	if (sctlr & SCTLR_ELx_DSSBS)
+		new |= PSR_SSBS_BIT;
+
+	// PSTATE.BTYPE is set to zero upon any exception to AArch64
+	// See ARM DDI 0487E.a, pages D1-2293 to D1-2294.
+
+	new |= PSR_D_BIT;
+	new |= PSR_A_BIT;
+	new |= PSR_I_BIT;
+	new |= PSR_F_BIT;
+
+	new |= target_mode;
+
+	*vcpu_cpsr(vcpu) = new;
+	__vcpu_write_spsr(vcpu, old);
+}
 
 void kvm_inject_exception(struct kvm_vcpu *vcpu)
 {
+	switch (vcpu->arch.flags & KVM_ARM64_EXCEPT_MASK) {
+	case (KVM_ARM64_EXCEPT_AA64_ELx_SYNC |
+	      KVM_ARM64_EXCEPT_AA64_EL1):
+		enter_exception64(vcpu, PSR_MODE_EL1h, except_type_sync);
+		break;
+	default:
+		/*
+		 * Only EL1_SYNC makes sense so far, EL2_{SYNC,IRQ}
+		 * will be implemented at some point. Everything
+		 * else gets silently ignored.
+		 */
+		break;
+	}
 }
diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
index 34a96ab244fa..8862431f8e3b 100644
--- a/arch/arm64/kvm/inject_fault.c
+++ b/arch/arm64/kvm/inject_fault.c
@@ -14,119 +14,15 @@
 #include <asm/kvm_emulate.h>
 #include <asm/esr.h>
 
-#define CURRENT_EL_SP_EL0_VECTOR	0x0
-#define CURRENT_EL_SP_ELx_VECTOR	0x200
-#define LOWER_EL_AArch64_VECTOR		0x400
-#define LOWER_EL_AArch32_VECTOR		0x600
-
-enum exception_type {
-	except_type_sync	= 0,
-	except_type_irq		= 0x80,
-	except_type_fiq		= 0x100,
-	except_type_serror	= 0x180,
-};
-
-/*
- * This performs the exception entry at a given EL (@target_mode), stashing PC
- * and PSTATE into ELR and SPSR respectively, and compute the new PC/PSTATE.
- * The EL passed to this function *must* be a non-secure, privileged mode with
- * bit 0 being set (PSTATE.SP == 1).
- *
- * When an exception is taken, most PSTATE fields are left unchanged in the
- * handler. However, some are explicitly overridden (e.g. M[4:0]). Luckily all
- * of the inherited bits have the same position in the AArch64/AArch32 SPSR_ELx
- * layouts, so we don't need to shuffle these for exceptions from AArch32 EL0.
- *
- * For the SPSR_ELx layout for AArch64, see ARM DDI 0487E.a page C5-429.
- * For the SPSR_ELx layout for AArch32, see ARM DDI 0487E.a page C5-426.
- *
- * Here we manipulate the fields in order of the AArch64 SPSR_ELx layout, from
- * MSB to LSB.
- */
-static void enter_exception64(struct kvm_vcpu *vcpu, unsigned long target_mode,
-			      enum exception_type type)
-{
-	unsigned long sctlr, vbar, old, new, mode;
-	u64 exc_offset;
-
-	mode = *vcpu_cpsr(vcpu) & (PSR_MODE_MASK | PSR_MODE32_BIT);
-
-	if      (mode == target_mode)
-		exc_offset = CURRENT_EL_SP_ELx_VECTOR;
-	else if ((mode | PSR_MODE_THREAD_BIT) == target_mode)
-		exc_offset = CURRENT_EL_SP_EL0_VECTOR;
-	else if (!(mode & PSR_MODE32_BIT))
-		exc_offset = LOWER_EL_AArch64_VECTOR;
-	else
-		exc_offset = LOWER_EL_AArch32_VECTOR;
-
-	switch (target_mode) {
-	case PSR_MODE_EL1h:
-		vbar = vcpu_read_sys_reg(vcpu, VBAR_EL1);
-		sctlr = vcpu_read_sys_reg(vcpu, SCTLR_EL1);
-		vcpu_write_sys_reg(vcpu, *vcpu_pc(vcpu), ELR_EL1);
-		break;
-	default:
-		/* Don't do that */
-		BUG();
-	}
-
-	*vcpu_pc(vcpu) = vbar + exc_offset + type;
-
-	old = *vcpu_cpsr(vcpu);
-	new = 0;
-
-	new |= (old & PSR_N_BIT);
-	new |= (old & PSR_Z_BIT);
-	new |= (old & PSR_C_BIT);
-	new |= (old & PSR_V_BIT);
-
-	// TODO: TCO (if/when ARMv8.5-MemTag is exposed to guests)
-
-	new |= (old & PSR_DIT_BIT);
-
-	// PSTATE.UAO is set to zero upon any exception to AArch64
-	// See ARM DDI 0487E.a, page D5-2579.
-
-	// PSTATE.PAN is unchanged unless SCTLR_ELx.SPAN == 0b0
-	// SCTLR_ELx.SPAN is RES1 when ARMv8.1-PAN is not implemented
-	// See ARM DDI 0487E.a, page D5-2578.
-	new |= (old & PSR_PAN_BIT);
-	if (!(sctlr & SCTLR_EL1_SPAN))
-		new |= PSR_PAN_BIT;
-
-	// PSTATE.SS is set to zero upon any exception to AArch64
-	// See ARM DDI 0487E.a, page D2-2452.
-
-	// PSTATE.IL is set to zero upon any exception to AArch64
-	// See ARM DDI 0487E.a, page D1-2306.
-
-	// PSTATE.SSBS is set to SCTLR_ELx.DSSBS upon any exception to AArch64
-	// See ARM DDI 0487E.a, page D13-3258
-	if (sctlr & SCTLR_ELx_DSSBS)
-		new |= PSR_SSBS_BIT;
-
-	// PSTATE.BTYPE is set to zero upon any exception to AArch64
-	// See ARM DDI 0487E.a, pages D1-2293 to D1-2294.
-
-	new |= PSR_D_BIT;
-	new |= PSR_A_BIT;
-	new |= PSR_I_BIT;
-	new |= PSR_F_BIT;
-
-	new |= target_mode;
-
-	*vcpu_cpsr(vcpu) = new;
-	vcpu_write_spsr(vcpu, old);
-}
-
 static void inject_abt64(struct kvm_vcpu *vcpu, bool is_iabt, unsigned long addr)
 {
 	unsigned long cpsr = *vcpu_cpsr(vcpu);
 	bool is_aarch32 = vcpu_mode_is_32bit(vcpu);
 	u32 esr = 0;
 
-	enter_exception64(vcpu, PSR_MODE_EL1h, except_type_sync);
+	vcpu->arch.flags |= (KVM_ARM64_EXCEPT_AA64_EL1		|
+			     KVM_ARM64_EXCEPT_AA64_ELx_SYNC	|
+			     KVM_ARM64_PENDING_EXCEPTION);
 
 	vcpu_write_sys_reg(vcpu, addr, FAR_EL1);
 
@@ -156,7 +52,9 @@ static void inject_undef64(struct kvm_vcpu *vcpu)
 {
 	u32 esr = (ESR_ELx_EC_UNKNOWN << ESR_ELx_EC_SHIFT);
 
-	enter_exception64(vcpu, PSR_MODE_EL1h, except_type_sync);
+	vcpu->arch.flags |= (KVM_ARM64_EXCEPT_AA64_EL1		|
+			     KVM_ARM64_EXCEPT_AA64_ELx_SYNC	|
+			     KVM_ARM64_PENDING_EXCEPTION);
 
 	/*
 	 * Build an unknown exception, depending on the instruction
-- 
2.28.0


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v2 08/11] KVM: arm64: Inject AArch32 exceptions from HYP
  2020-11-02 16:40 [PATCH v2 00/11] KVM: arm64: Move PC/ELR/SPSR/PSTATE updatess to EL2 Marc Zyngier
                   ` (6 preceding siblings ...)
  2020-11-02 16:40 ` [PATCH v2 07/11] KVM: arm64: Inject AArch64 exceptions from HYP Marc Zyngier
@ 2020-11-02 16:40 ` Marc Zyngier
  2020-11-02 16:40 ` [PATCH v2 09/11] KVM: arm64: Remove SPSR manipulation primitives Marc Zyngier
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: Marc Zyngier @ 2020-11-02 16:40 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: James Morse, Julien Thierry, Suzuki K Poulose, Andrew Scull,
	Will Deacon, Mark Rutland, Quentin Perret, David Brazdil,
	kernel-team

Similarly to what has been done for AArch64, move the AArch32 exception
injection to HYP.

In order to not use the regmap selection code at EL2, simplify the code
populating the target mode's LR register by useing the compatibility
aliases for LR_abt and LR_und.

We also introduce new accessors for SPSR_abt and SPSR_und, and
move VBAR/SCTLR to using the AArch64 accessors (the use of the AArch32
names was an ARMv7 leftover).

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/aarch32.c       | 149 +-----------------------
 arch/arm64/kvm/hyp/exception.c | 200 +++++++++++++++++++++++++++++++--
 2 files changed, 195 insertions(+), 154 deletions(-)

diff --git a/arch/arm64/kvm/aarch32.c b/arch/arm64/kvm/aarch32.c
index 40a62a99fbf8..ad453b47c517 100644
--- a/arch/arm64/kvm/aarch32.c
+++ b/arch/arm64/kvm/aarch32.c
@@ -19,20 +19,6 @@
 #define DFSR_FSC_EXTABT_nLPAE	0x08
 #define DFSR_LPAE		BIT(9)
 
-/*
- * Table taken from ARMv8 ARM DDI0487B-B, table G1-10.
- */
-static const u8 return_offsets[8][2] = {
-	[0] = { 0, 0 },		/* Reset, unused */
-	[1] = { 4, 2 },		/* Undefined */
-	[2] = { 0, 0 },		/* SVC, unused */
-	[3] = { 4, 4 },		/* Prefetch abort */
-	[4] = { 8, 8 },		/* Data abort */
-	[5] = { 0, 0 },		/* HVC, unused */
-	[6] = { 4, 4 },		/* IRQ, unused */
-	[7] = { 4, 4 },		/* FIQ, unused */
-};
-
 static bool pre_fault_synchronize(struct kvm_vcpu *vcpu)
 {
 	preempt_disable();
@@ -53,132 +39,10 @@ static void post_fault_synchronize(struct kvm_vcpu *vcpu, bool loaded)
 	}
 }
 
-/*
- * When an exception is taken, most CPSR fields are left unchanged in the
- * handler. However, some are explicitly overridden (e.g. M[4:0]).
- *
- * The SPSR/SPSR_ELx layouts differ, and the below is intended to work with
- * either format. Note: SPSR.J bit doesn't exist in SPSR_ELx, but this bit was
- * obsoleted by the ARMv7 virtualization extensions and is RES0.
- *
- * For the SPSR layout seen from AArch32, see:
- * - ARM DDI 0406C.d, page B1-1148
- * - ARM DDI 0487E.a, page G8-6264
- *
- * For the SPSR_ELx layout for AArch32 seen from AArch64, see:
- * - ARM DDI 0487E.a, page C5-426
- *
- * Here we manipulate the fields in order of the AArch32 SPSR_ELx layout, from
- * MSB to LSB.
- */
-static unsigned long get_except32_cpsr(struct kvm_vcpu *vcpu, u32 mode)
-{
-	u32 sctlr = vcpu_cp15(vcpu, c1_SCTLR);
-	unsigned long old, new;
-
-	old = *vcpu_cpsr(vcpu);
-	new = 0;
-
-	new |= (old & PSR_AA32_N_BIT);
-	new |= (old & PSR_AA32_Z_BIT);
-	new |= (old & PSR_AA32_C_BIT);
-	new |= (old & PSR_AA32_V_BIT);
-	new |= (old & PSR_AA32_Q_BIT);
-
-	// CPSR.IT[7:0] are set to zero upon any exception
-	// See ARM DDI 0487E.a, section G1.12.3
-	// See ARM DDI 0406C.d, section B1.8.3
-
-	new |= (old & PSR_AA32_DIT_BIT);
-
-	// CPSR.SSBS is set to SCTLR.DSSBS upon any exception
-	// See ARM DDI 0487E.a, page G8-6244
-	if (sctlr & BIT(31))
-		new |= PSR_AA32_SSBS_BIT;
-
-	// CPSR.PAN is unchanged unless SCTLR.SPAN == 0b0
-	// SCTLR.SPAN is RES1 when ARMv8.1-PAN is not implemented
-	// See ARM DDI 0487E.a, page G8-6246
-	new |= (old & PSR_AA32_PAN_BIT);
-	if (!(sctlr & BIT(23)))
-		new |= PSR_AA32_PAN_BIT;
-
-	// SS does not exist in AArch32, so ignore
-
-	// CPSR.IL is set to zero upon any exception
-	// See ARM DDI 0487E.a, page G1-5527
-
-	new |= (old & PSR_AA32_GE_MASK);
-
-	// CPSR.IT[7:0] are set to zero upon any exception
-	// See prior comment above
-
-	// CPSR.E is set to SCTLR.EE upon any exception
-	// See ARM DDI 0487E.a, page G8-6245
-	// See ARM DDI 0406C.d, page B4-1701
-	if (sctlr & BIT(25))
-		new |= PSR_AA32_E_BIT;
-
-	// CPSR.A is unchanged upon an exception to Undefined, Supervisor
-	// CPSR.A is set upon an exception to other modes
-	// See ARM DDI 0487E.a, pages G1-5515 to G1-5516
-	// See ARM DDI 0406C.d, page B1-1182
-	new |= (old & PSR_AA32_A_BIT);
-	if (mode != PSR_AA32_MODE_UND && mode != PSR_AA32_MODE_SVC)
-		new |= PSR_AA32_A_BIT;
-
-	// CPSR.I is set upon any exception
-	// See ARM DDI 0487E.a, pages G1-5515 to G1-5516
-	// See ARM DDI 0406C.d, page B1-1182
-	new |= PSR_AA32_I_BIT;
-
-	// CPSR.F is set upon an exception to FIQ
-	// CPSR.F is unchanged upon an exception to other modes
-	// See ARM DDI 0487E.a, pages G1-5515 to G1-5516
-	// See ARM DDI 0406C.d, page B1-1182
-	new |= (old & PSR_AA32_F_BIT);
-	if (mode == PSR_AA32_MODE_FIQ)
-		new |= PSR_AA32_F_BIT;
-
-	// CPSR.T is set to SCTLR.TE upon any exception
-	// See ARM DDI 0487E.a, page G8-5514
-	// See ARM DDI 0406C.d, page B1-1181
-	if (sctlr & BIT(30))
-		new |= PSR_AA32_T_BIT;
-
-	new |= mode;
-
-	return new;
-}
-
-static void prepare_fault32(struct kvm_vcpu *vcpu, u32 mode, u32 vect_offset)
-{
-	unsigned long spsr = *vcpu_cpsr(vcpu);
-	bool is_thumb = (spsr & PSR_AA32_T_BIT);
-	u32 return_offset = return_offsets[vect_offset >> 2][is_thumb];
-	u32 sctlr = vcpu_cp15(vcpu, c1_SCTLR);
-
-	*vcpu_cpsr(vcpu) = get_except32_cpsr(vcpu, mode);
-
-	/* Note: These now point to the banked copies */
-	vcpu_write_spsr(vcpu, host_spsr_to_spsr32(spsr));
-	*vcpu_reg32(vcpu, 14) = *vcpu_pc(vcpu) + return_offset;
-
-	/* Branch to exception vector */
-	if (sctlr & (1 << 13))
-		vect_offset += 0xffff0000;
-	else /* always have security exceptions */
-		vect_offset += vcpu_cp15(vcpu, c12_VBAR);
-
-	*vcpu_pc(vcpu) = vect_offset;
-}
-
 void kvm_inject_undef32(struct kvm_vcpu *vcpu)
 {
-	bool loaded = pre_fault_synchronize(vcpu);
-
-	prepare_fault32(vcpu, PSR_AA32_MODE_UND, 4);
-	post_fault_synchronize(vcpu, loaded);
+	vcpu->arch.flags |= (KVM_ARM64_EXCEPT_AA32_UND |
+			     KVM_ARM64_PENDING_EXCEPTION);
 }
 
 /*
@@ -188,7 +52,6 @@ void kvm_inject_undef32(struct kvm_vcpu *vcpu)
 static void inject_abt32(struct kvm_vcpu *vcpu, bool is_pabt,
 			 unsigned long addr)
 {
-	u32 vect_offset;
 	u32 *far, *fsr;
 	bool is_lpae;
 	bool loaded;
@@ -196,17 +59,17 @@ static void inject_abt32(struct kvm_vcpu *vcpu, bool is_pabt,
 	loaded = pre_fault_synchronize(vcpu);
 
 	if (is_pabt) {
-		vect_offset = 12;
+		vcpu->arch.flags |= (KVM_ARM64_EXCEPT_AA32_IABT |
+				     KVM_ARM64_PENDING_EXCEPTION);
 		far = &vcpu_cp15(vcpu, c6_IFAR);
 		fsr = &vcpu_cp15(vcpu, c5_IFSR);
 	} else { /* !iabt */
-		vect_offset = 16;
+		vcpu->arch.flags |= (KVM_ARM64_EXCEPT_AA32_DABT |
+				     KVM_ARM64_PENDING_EXCEPTION);
 		far = &vcpu_cp15(vcpu, c6_DFAR);
 		fsr = &vcpu_cp15(vcpu, c5_DFSR);
 	}
 
-	prepare_fault32(vcpu, PSR_AA32_MODE_ABT, vect_offset);
-
 	*far = addr;
 
 	/* Give the guest an IMPLEMENTATION DEFINED exception */
diff --git a/arch/arm64/kvm/hyp/exception.c b/arch/arm64/kvm/hyp/exception.c
index 94d6d823424d..73629094f903 100644
--- a/arch/arm64/kvm/hyp/exception.c
+++ b/arch/arm64/kvm/hyp/exception.c
@@ -41,6 +41,22 @@ static void __vcpu_write_spsr(struct kvm_vcpu *vcpu, u64 val)
 	write_sysreg_el1(val, SYS_SPSR);
 }
 
+static void __vcpu_write_spsr_abt(struct kvm_vcpu *vcpu, u64 val)
+{
+	if (has_vhe())
+		write_sysreg(val, spsr_abt);
+	else
+		vcpu->arch.ctxt.spsr_abt = val;
+}
+
+static void __vcpu_write_spsr_und(struct kvm_vcpu *vcpu, u64 val)
+{
+	if (has_vhe())
+		write_sysreg(val, spsr_und);
+	else
+		vcpu->arch.ctxt.spsr_und = val;
+}
+
 /*
  * This performs the exception entry at a given EL (@target_mode), stashing PC
  * and PSTATE into ELR and SPSR respectively, and compute the new PC/PSTATE.
@@ -135,19 +151,181 @@ static void enter_exception64(struct kvm_vcpu *vcpu, unsigned long target_mode,
 	__vcpu_write_spsr(vcpu, old);
 }
 
-void kvm_inject_exception(struct kvm_vcpu *vcpu)
+/*
+ * When an exception is taken, most CPSR fields are left unchanged in the
+ * handler. However, some are explicitly overridden (e.g. M[4:0]).
+ *
+ * The SPSR/SPSR_ELx layouts differ, and the below is intended to work with
+ * either format. Note: SPSR.J bit doesn't exist in SPSR_ELx, but this bit was
+ * obsoleted by the ARMv7 virtualization extensions and is RES0.
+ *
+ * For the SPSR layout seen from AArch32, see:
+ * - ARM DDI 0406C.d, page B1-1148
+ * - ARM DDI 0487E.a, page G8-6264
+ *
+ * For the SPSR_ELx layout for AArch32 seen from AArch64, see:
+ * - ARM DDI 0487E.a, page C5-426
+ *
+ * Here we manipulate the fields in order of the AArch32 SPSR_ELx layout, from
+ * MSB to LSB.
+ */
+static unsigned long get_except32_cpsr(struct kvm_vcpu *vcpu, u32 mode)
+{
+	u32 sctlr = __vcpu_read_sys_reg(vcpu, SCTLR_EL1);
+	unsigned long old, new;
+
+	old = *vcpu_cpsr(vcpu);
+	new = 0;
+
+	new |= (old & PSR_AA32_N_BIT);
+	new |= (old & PSR_AA32_Z_BIT);
+	new |= (old & PSR_AA32_C_BIT);
+	new |= (old & PSR_AA32_V_BIT);
+	new |= (old & PSR_AA32_Q_BIT);
+
+	// CPSR.IT[7:0] are set to zero upon any exception
+	// See ARM DDI 0487E.a, section G1.12.3
+	// See ARM DDI 0406C.d, section B1.8.3
+
+	new |= (old & PSR_AA32_DIT_BIT);
+
+	// CPSR.SSBS is set to SCTLR.DSSBS upon any exception
+	// See ARM DDI 0487E.a, page G8-6244
+	if (sctlr & BIT(31))
+		new |= PSR_AA32_SSBS_BIT;
+
+	// CPSR.PAN is unchanged unless SCTLR.SPAN == 0b0
+	// SCTLR.SPAN is RES1 when ARMv8.1-PAN is not implemented
+	// See ARM DDI 0487E.a, page G8-6246
+	new |= (old & PSR_AA32_PAN_BIT);
+	if (!(sctlr & BIT(23)))
+		new |= PSR_AA32_PAN_BIT;
+
+	// SS does not exist in AArch32, so ignore
+
+	// CPSR.IL is set to zero upon any exception
+	// See ARM DDI 0487E.a, page G1-5527
+
+	new |= (old & PSR_AA32_GE_MASK);
+
+	// CPSR.IT[7:0] are set to zero upon any exception
+	// See prior comment above
+
+	// CPSR.E is set to SCTLR.EE upon any exception
+	// See ARM DDI 0487E.a, page G8-6245
+	// See ARM DDI 0406C.d, page B4-1701
+	if (sctlr & BIT(25))
+		new |= PSR_AA32_E_BIT;
+
+	// CPSR.A is unchanged upon an exception to Undefined, Supervisor
+	// CPSR.A is set upon an exception to other modes
+	// See ARM DDI 0487E.a, pages G1-5515 to G1-5516
+	// See ARM DDI 0406C.d, page B1-1182
+	new |= (old & PSR_AA32_A_BIT);
+	if (mode != PSR_AA32_MODE_UND && mode != PSR_AA32_MODE_SVC)
+		new |= PSR_AA32_A_BIT;
+
+	// CPSR.I is set upon any exception
+	// See ARM DDI 0487E.a, pages G1-5515 to G1-5516
+	// See ARM DDI 0406C.d, page B1-1182
+	new |= PSR_AA32_I_BIT;
+
+	// CPSR.F is set upon an exception to FIQ
+	// CPSR.F is unchanged upon an exception to other modes
+	// See ARM DDI 0487E.a, pages G1-5515 to G1-5516
+	// See ARM DDI 0406C.d, page B1-1182
+	new |= (old & PSR_AA32_F_BIT);
+	if (mode == PSR_AA32_MODE_FIQ)
+		new |= PSR_AA32_F_BIT;
+
+	// CPSR.T is set to SCTLR.TE upon any exception
+	// See ARM DDI 0487E.a, page G8-5514
+	// See ARM DDI 0406C.d, page B1-1181
+	if (sctlr & BIT(30))
+		new |= PSR_AA32_T_BIT;
+
+	new |= mode;
+
+	return new;
+}
+
+/*
+ * Table taken from ARMv8 ARM DDI0487B-B, table G1-10.
+ */
+static const u8 return_offsets[8][2] = {
+	[0] = { 0, 0 },		/* Reset, unused */
+	[1] = { 4, 2 },		/* Undefined */
+	[2] = { 0, 0 },		/* SVC, unused */
+	[3] = { 4, 4 },		/* Prefetch abort */
+	[4] = { 8, 8 },		/* Data abort */
+	[5] = { 0, 0 },		/* HVC, unused */
+	[6] = { 4, 4 },		/* IRQ, unused */
+	[7] = { 4, 4 },		/* FIQ, unused */
+};
+
+static void enter_exception32(struct kvm_vcpu *vcpu, u32 mode, u32 vect_offset)
 {
-	switch (vcpu->arch.flags & KVM_ARM64_EXCEPT_MASK) {
-	case (KVM_ARM64_EXCEPT_AA64_ELx_SYNC |
-	      KVM_ARM64_EXCEPT_AA64_EL1):
-		enter_exception64(vcpu, PSR_MODE_EL1h, except_type_sync);
+	unsigned long spsr = *vcpu_cpsr(vcpu);
+	bool is_thumb = (spsr & PSR_AA32_T_BIT);
+	u32 sctlr = __vcpu_read_sys_reg(vcpu, SCTLR_EL1);
+	u32 return_address;
+
+	*vcpu_cpsr(vcpu) = get_except32_cpsr(vcpu, mode);
+	return_address   = *vcpu_pc(vcpu);
+	return_address  += return_offsets[vect_offset >> 2][is_thumb];
+
+	/* KVM only enters the ABT and UND modes, so only deal with those */
+	switch(mode) {
+	case PSR_AA32_MODE_ABT:
+		__vcpu_write_spsr_abt(vcpu, host_spsr_to_spsr32(spsr));
+		vcpu_gp_regs(vcpu)->compat_lr_abt = return_address;
 		break;
-	default:
-		/*
-		 * Only EL1_SYNC makes sense so far, EL2_{SYNC,IRQ}
-		 * will be implemented at some point. Everything
-		 * else gets silently ignored.
-		 */
+
+	case PSR_AA32_MODE_UND:
+		__vcpu_write_spsr_und(vcpu, host_spsr_to_spsr32(spsr));
+		vcpu_gp_regs(vcpu)->compat_lr_und = return_address;
 		break;
 	}
+
+	/* Branch to exception vector */
+	if (sctlr & (1 << 13))
+		vect_offset += 0xffff0000;
+	else /* always have security exceptions */
+		vect_offset += __vcpu_read_sys_reg(vcpu, VBAR_EL1);
+
+	*vcpu_pc(vcpu) = vect_offset;
+}
+
+void kvm_inject_exception(struct kvm_vcpu *vcpu)
+{
+	if (vcpu_el1_is_32bit(vcpu)) {
+		switch (vcpu->arch.flags & KVM_ARM64_EXCEPT_MASK) {
+		case KVM_ARM64_EXCEPT_AA32_UND:
+			enter_exception32(vcpu, PSR_AA32_MODE_UND, 4);
+			break;
+		case KVM_ARM64_EXCEPT_AA32_IABT:
+			enter_exception32(vcpu, PSR_AA32_MODE_ABT, 12);
+			break;
+		case KVM_ARM64_EXCEPT_AA32_DABT:
+			enter_exception32(vcpu, PSR_AA32_MODE_ABT, 16);
+			break;
+		default:
+			/* Err... */
+			break;
+		}
+	} else {
+		switch (vcpu->arch.flags & KVM_ARM64_EXCEPT_MASK) {
+		case (KVM_ARM64_EXCEPT_AA64_ELx_SYNC |
+		      KVM_ARM64_EXCEPT_AA64_EL1):
+			enter_exception64(vcpu, PSR_MODE_EL1h, except_type_sync);
+			break;
+		default:
+			/*
+			 * Only EL1_SYNC makes sense so far, EL2_{SYNC,IRQ}
+			 * will be implemented at some point. Everything
+			 * else gets silently ignored.
+			 */
+			break;
+		}
+	}
 }
-- 
2.28.0


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v2 09/11] KVM: arm64: Remove SPSR manipulation primitives
  2020-11-02 16:40 [PATCH v2 00/11] KVM: arm64: Move PC/ELR/SPSR/PSTATE updatess to EL2 Marc Zyngier
                   ` (7 preceding siblings ...)
  2020-11-02 16:40 ` [PATCH v2 08/11] KVM: arm64: Inject AArch32 " Marc Zyngier
@ 2020-11-02 16:40 ` Marc Zyngier
  2020-11-02 16:40 ` [PATCH v2 10/11] KVM: arm64: Consolidate exception injection Marc Zyngier
  2020-11-02 16:40 ` [PATCH v2 11/11] KVM: arm64: Get rid of the AArch32 register mapping code Marc Zyngier
  10 siblings, 0 replies; 20+ messages in thread
From: Marc Zyngier @ 2020-11-02 16:40 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: James Morse, Julien Thierry, Suzuki K Poulose, Andrew Scull,
	Will Deacon, Mark Rutland, Quentin Perret, David Brazdil,
	kernel-team

The SPSR setting code is now completely unused, including that dealing
with banked AArch32 SPSRs. Cleanup time.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_emulate.h | 26 --------
 arch/arm64/kvm/regmap.c              | 96 ----------------------------
 2 files changed, 122 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 736a342dadf7..5d957d0e7b69 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -34,8 +34,6 @@ enum exception_type {
 };
 
 unsigned long *vcpu_reg32(const struct kvm_vcpu *vcpu, u8 reg_num);
-unsigned long vcpu_read_spsr32(const struct kvm_vcpu *vcpu);
-void vcpu_write_spsr32(struct kvm_vcpu *vcpu, unsigned long v);
 
 bool kvm_condition_valid32(const struct kvm_vcpu *vcpu);
 void kvm_skip_instr32(struct kvm_vcpu *vcpu);
@@ -180,30 +178,6 @@ static __always_inline void vcpu_set_reg(struct kvm_vcpu *vcpu, u8 reg_num,
 		vcpu_gp_regs(vcpu)->regs[reg_num] = val;
 }
 
-static inline unsigned long vcpu_read_spsr(const struct kvm_vcpu *vcpu)
-{
-	if (vcpu_mode_is_32bit(vcpu))
-		return vcpu_read_spsr32(vcpu);
-
-	if (vcpu->arch.sysregs_loaded_on_cpu)
-		return read_sysreg_el1(SYS_SPSR);
-	else
-		return __vcpu_sys_reg(vcpu, SPSR_EL1);
-}
-
-static inline void vcpu_write_spsr(struct kvm_vcpu *vcpu, unsigned long v)
-{
-	if (vcpu_mode_is_32bit(vcpu)) {
-		vcpu_write_spsr32(vcpu, v);
-		return;
-	}
-
-	if (vcpu->arch.sysregs_loaded_on_cpu)
-		write_sysreg_el1(v, SYS_SPSR);
-	else
-		__vcpu_sys_reg(vcpu, SPSR_EL1) = v;
-}
-
 /*
  * The layout of SPSR for an AArch32 state is different when observed from an
  * AArch64 SPSR_ELx or an AArch32 SPSR_*. This function generates the AArch32
diff --git a/arch/arm64/kvm/regmap.c b/arch/arm64/kvm/regmap.c
index accc1d5fba61..ae7e290bb017 100644
--- a/arch/arm64/kvm/regmap.c
+++ b/arch/arm64/kvm/regmap.c
@@ -126,99 +126,3 @@ unsigned long *vcpu_reg32(const struct kvm_vcpu *vcpu, u8 reg_num)
 
 	return reg_array + vcpu_reg_offsets[mode][reg_num];
 }
-
-/*
- * Return the SPSR for the current mode of the virtual CPU.
- */
-static int vcpu_spsr32_mode(const struct kvm_vcpu *vcpu)
-{
-	unsigned long mode = *vcpu_cpsr(vcpu) & PSR_AA32_MODE_MASK;
-	switch (mode) {
-	case PSR_AA32_MODE_SVC: return KVM_SPSR_SVC;
-	case PSR_AA32_MODE_ABT: return KVM_SPSR_ABT;
-	case PSR_AA32_MODE_UND: return KVM_SPSR_UND;
-	case PSR_AA32_MODE_IRQ: return KVM_SPSR_IRQ;
-	case PSR_AA32_MODE_FIQ: return KVM_SPSR_FIQ;
-	default: BUG();
-	}
-}
-
-unsigned long vcpu_read_spsr32(const struct kvm_vcpu *vcpu)
-{
-	int spsr_idx = vcpu_spsr32_mode(vcpu);
-
-	if (!vcpu->arch.sysregs_loaded_on_cpu) {
-		switch (spsr_idx) {
-		case KVM_SPSR_SVC:
-			return __vcpu_sys_reg(vcpu, SPSR_EL1);
-		case KVM_SPSR_ABT:
-			return vcpu->arch.ctxt.spsr_abt;
-		case KVM_SPSR_UND:
-			return vcpu->arch.ctxt.spsr_und;
-		case KVM_SPSR_IRQ:
-			return vcpu->arch.ctxt.spsr_irq;
-		case KVM_SPSR_FIQ:
-			return vcpu->arch.ctxt.spsr_fiq;
-		}
-	}
-
-	switch (spsr_idx) {
-	case KVM_SPSR_SVC:
-		return read_sysreg_el1(SYS_SPSR);
-	case KVM_SPSR_ABT:
-		return read_sysreg(spsr_abt);
-	case KVM_SPSR_UND:
-		return read_sysreg(spsr_und);
-	case KVM_SPSR_IRQ:
-		return read_sysreg(spsr_irq);
-	case KVM_SPSR_FIQ:
-		return read_sysreg(spsr_fiq);
-	default:
-		BUG();
-	}
-}
-
-void vcpu_write_spsr32(struct kvm_vcpu *vcpu, unsigned long v)
-{
-	int spsr_idx = vcpu_spsr32_mode(vcpu);
-
-	if (!vcpu->arch.sysregs_loaded_on_cpu) {
-		switch (spsr_idx) {
-		case KVM_SPSR_SVC:
-			__vcpu_sys_reg(vcpu, SPSR_EL1) = v;
-			break;
-		case KVM_SPSR_ABT:
-			vcpu->arch.ctxt.spsr_abt = v;
-			break;
-		case KVM_SPSR_UND:
-			vcpu->arch.ctxt.spsr_und = v;
-			break;
-		case KVM_SPSR_IRQ:
-			vcpu->arch.ctxt.spsr_irq = v;
-			break;
-		case KVM_SPSR_FIQ:
-			vcpu->arch.ctxt.spsr_fiq = v;
-			break;
-		}
-
-		return;
-	}
-
-	switch (spsr_idx) {
-	case KVM_SPSR_SVC:
-		write_sysreg_el1(v, SYS_SPSR);
-		break;
-	case KVM_SPSR_ABT:
-		write_sysreg(v, spsr_abt);
-		break;
-	case KVM_SPSR_UND:
-		write_sysreg(v, spsr_und);
-		break;
-	case KVM_SPSR_IRQ:
-		write_sysreg(v, spsr_irq);
-		break;
-	case KVM_SPSR_FIQ:
-		write_sysreg(v, spsr_fiq);
-		break;
-	}
-}
-- 
2.28.0


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v2 10/11] KVM: arm64: Consolidate exception injection
  2020-11-02 16:40 [PATCH v2 00/11] KVM: arm64: Move PC/ELR/SPSR/PSTATE updatess to EL2 Marc Zyngier
                   ` (8 preceding siblings ...)
  2020-11-02 16:40 ` [PATCH v2 09/11] KVM: arm64: Remove SPSR manipulation primitives Marc Zyngier
@ 2020-11-02 16:40 ` Marc Zyngier
  2020-11-02 16:40 ` [PATCH v2 11/11] KVM: arm64: Get rid of the AArch32 register mapping code Marc Zyngier
  10 siblings, 0 replies; 20+ messages in thread
From: Marc Zyngier @ 2020-11-02 16:40 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: James Morse, Julien Thierry, Suzuki K Poulose, Andrew Scull,
	Will Deacon, Mark Rutland, Quentin Perret, David Brazdil,
	kernel-team

Move the AArch32 exception injection code back into the inject_fault.c
file, removing the need for a few non-static functions now that AArch32
host support is a thing of the past.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_emulate.h |  3 -
 arch/arm64/kvm/Makefile              |  2 +-
 arch/arm64/kvm/aarch32.c             | 95 ----------------------------
 arch/arm64/kvm/inject_fault.c        | 75 +++++++++++++++++++++-
 4 files changed, 73 insertions(+), 102 deletions(-)
 delete mode 100644 arch/arm64/kvm/aarch32.c

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 5d957d0e7b69..3105bb73f539 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -42,9 +42,6 @@ void kvm_inject_undefined(struct kvm_vcpu *vcpu);
 void kvm_inject_vabt(struct kvm_vcpu *vcpu);
 void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr);
 void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr);
-void kvm_inject_undef32(struct kvm_vcpu *vcpu);
-void kvm_inject_dabt32(struct kvm_vcpu *vcpu, unsigned long addr);
-void kvm_inject_pabt32(struct kvm_vcpu *vcpu, unsigned long addr);
 
 static __always_inline bool vcpu_el1_is_32bit(struct kvm_vcpu *vcpu)
 {
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index 1504c81fbf5d..9b32a89a25c8 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -16,7 +16,7 @@ kvm-y := $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o \
 	 inject_fault.o regmap.o va_layout.o handle_exit.o \
 	 guest.o debug.o reset.o sys_regs.o \
 	 vgic-sys-reg-v3.o fpsimd.o pmu.o \
-	 aarch32.o arch_timer.o \
+	 arch_timer.o \
 	 vgic/vgic.o vgic/vgic-init.o \
 	 vgic/vgic-irqfd.o vgic/vgic-v2.o \
 	 vgic/vgic-v3.o vgic/vgic-v4.o \
diff --git a/arch/arm64/kvm/aarch32.c b/arch/arm64/kvm/aarch32.c
deleted file mode 100644
index ad453b47c517..000000000000
--- a/arch/arm64/kvm/aarch32.c
+++ /dev/null
@@ -1,95 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-only
-/*
- * (not much of an) Emulation layer for 32bit guests.
- *
- * Copyright (C) 2012,2013 - ARM Ltd
- * Author: Marc Zyngier <marc.zyngier@arm.com>
- *
- * based on arch/arm/kvm/emulate.c
- * Copyright (C) 2012 - Virtual Open Systems and Columbia University
- * Author: Christoffer Dall <c.dall@virtualopensystems.com>
- */
-
-#include <linux/bits.h>
-#include <linux/kvm_host.h>
-#include <asm/kvm_emulate.h>
-#include <asm/kvm_hyp.h>
-
-#define DFSR_FSC_EXTABT_LPAE	0x10
-#define DFSR_FSC_EXTABT_nLPAE	0x08
-#define DFSR_LPAE		BIT(9)
-
-static bool pre_fault_synchronize(struct kvm_vcpu *vcpu)
-{
-	preempt_disable();
-	if (vcpu->arch.sysregs_loaded_on_cpu) {
-		kvm_arch_vcpu_put(vcpu);
-		return true;
-	}
-
-	preempt_enable();
-	return false;
-}
-
-static void post_fault_synchronize(struct kvm_vcpu *vcpu, bool loaded)
-{
-	if (loaded) {
-		kvm_arch_vcpu_load(vcpu, smp_processor_id());
-		preempt_enable();
-	}
-}
-
-void kvm_inject_undef32(struct kvm_vcpu *vcpu)
-{
-	vcpu->arch.flags |= (KVM_ARM64_EXCEPT_AA32_UND |
-			     KVM_ARM64_PENDING_EXCEPTION);
-}
-
-/*
- * Modelled after TakeDataAbortException() and TakePrefetchAbortException
- * pseudocode.
- */
-static void inject_abt32(struct kvm_vcpu *vcpu, bool is_pabt,
-			 unsigned long addr)
-{
-	u32 *far, *fsr;
-	bool is_lpae;
-	bool loaded;
-
-	loaded = pre_fault_synchronize(vcpu);
-
-	if (is_pabt) {
-		vcpu->arch.flags |= (KVM_ARM64_EXCEPT_AA32_IABT |
-				     KVM_ARM64_PENDING_EXCEPTION);
-		far = &vcpu_cp15(vcpu, c6_IFAR);
-		fsr = &vcpu_cp15(vcpu, c5_IFSR);
-	} else { /* !iabt */
-		vcpu->arch.flags |= (KVM_ARM64_EXCEPT_AA32_DABT |
-				     KVM_ARM64_PENDING_EXCEPTION);
-		far = &vcpu_cp15(vcpu, c6_DFAR);
-		fsr = &vcpu_cp15(vcpu, c5_DFSR);
-	}
-
-	*far = addr;
-
-	/* Give the guest an IMPLEMENTATION DEFINED exception */
-	is_lpae = (vcpu_cp15(vcpu, c2_TTBCR) >> 31);
-	if (is_lpae) {
-		*fsr = DFSR_LPAE | DFSR_FSC_EXTABT_LPAE;
-	} else {
-		/* no need to shuffle FS[4] into DFSR[10] as its 0 */
-		*fsr = DFSR_FSC_EXTABT_nLPAE;
-	}
-
-	post_fault_synchronize(vcpu, loaded);
-}
-
-void kvm_inject_dabt32(struct kvm_vcpu *vcpu, unsigned long addr)
-{
-	inject_abt32(vcpu, false, addr);
-}
-
-void kvm_inject_pabt32(struct kvm_vcpu *vcpu, unsigned long addr)
-{
-	inject_abt32(vcpu, true, addr);
-}
diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
index 8862431f8e3b..e2a2e48ca371 100644
--- a/arch/arm64/kvm/inject_fault.c
+++ b/arch/arm64/kvm/inject_fault.c
@@ -66,6 +66,75 @@ static void inject_undef64(struct kvm_vcpu *vcpu)
 	vcpu_write_sys_reg(vcpu, esr, ESR_EL1);
 }
 
+#define DFSR_FSC_EXTABT_LPAE	0x10
+#define DFSR_FSC_EXTABT_nLPAE	0x08
+#define DFSR_LPAE		BIT(9)
+
+static bool pre_fault_synchronize(struct kvm_vcpu *vcpu)
+{
+	preempt_disable();
+	if (vcpu->arch.sysregs_loaded_on_cpu) {
+		kvm_arch_vcpu_put(vcpu);
+		return true;
+	}
+
+	preempt_enable();
+	return false;
+}
+
+static void post_fault_synchronize(struct kvm_vcpu *vcpu, bool loaded)
+{
+	if (loaded) {
+		kvm_arch_vcpu_load(vcpu, smp_processor_id());
+		preempt_enable();
+	}
+}
+
+static void inject_undef32(struct kvm_vcpu *vcpu)
+{
+	vcpu->arch.flags |= (KVM_ARM64_EXCEPT_AA32_UND |
+			     KVM_ARM64_PENDING_EXCEPTION);
+}
+
+/*
+ * Modelled after TakeDataAbortException() and TakePrefetchAbortException
+ * pseudocode.
+ */
+static void inject_abt32(struct kvm_vcpu *vcpu, bool is_pabt,
+			 unsigned long addr)
+{
+	u32 *far, *fsr;
+	bool is_lpae;
+	bool loaded;
+
+	loaded = pre_fault_synchronize(vcpu);
+
+	if (is_pabt) {
+		vcpu->arch.flags |= (KVM_ARM64_EXCEPT_AA32_IABT |
+				     KVM_ARM64_PENDING_EXCEPTION);
+		far = &vcpu_cp15(vcpu, c6_IFAR);
+		fsr = &vcpu_cp15(vcpu, c5_IFSR);
+	} else { /* !iabt */
+		vcpu->arch.flags |= (KVM_ARM64_EXCEPT_AA32_DABT |
+				     KVM_ARM64_PENDING_EXCEPTION);
+		far = &vcpu_cp15(vcpu, c6_DFAR);
+		fsr = &vcpu_cp15(vcpu, c5_DFSR);
+	}
+
+	*far = addr;
+
+	/* Give the guest an IMPLEMENTATION DEFINED exception */
+	is_lpae = (vcpu_cp15(vcpu, c2_TTBCR) >> 31);
+	if (is_lpae) {
+		*fsr = DFSR_LPAE | DFSR_FSC_EXTABT_LPAE;
+	} else {
+		/* no need to shuffle FS[4] into DFSR[10] as its 0 */
+		*fsr = DFSR_FSC_EXTABT_nLPAE;
+	}
+
+	post_fault_synchronize(vcpu, loaded);
+}
+
 /**
  * kvm_inject_dabt - inject a data abort into the guest
  * @vcpu: The VCPU to receive the data abort
@@ -77,7 +146,7 @@ static void inject_undef64(struct kvm_vcpu *vcpu)
 void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr)
 {
 	if (vcpu_el1_is_32bit(vcpu))
-		kvm_inject_dabt32(vcpu, addr);
+		inject_abt32(vcpu, false, addr);
 	else
 		inject_abt64(vcpu, false, addr);
 }
@@ -93,7 +162,7 @@ void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr)
 void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr)
 {
 	if (vcpu_el1_is_32bit(vcpu))
-		kvm_inject_pabt32(vcpu, addr);
+		inject_abt32(vcpu, true, addr);
 	else
 		inject_abt64(vcpu, true, addr);
 }
@@ -108,7 +177,7 @@ void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr)
 void kvm_inject_undefined(struct kvm_vcpu *vcpu)
 {
 	if (vcpu_el1_is_32bit(vcpu))
-		kvm_inject_undef32(vcpu);
+		inject_undef32(vcpu);
 	else
 		inject_undef64(vcpu);
 }
-- 
2.28.0


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v2 11/11] KVM: arm64: Get rid of the AArch32 register mapping code
  2020-11-02 16:40 [PATCH v2 00/11] KVM: arm64: Move PC/ELR/SPSR/PSTATE updatess to EL2 Marc Zyngier
                   ` (9 preceding siblings ...)
  2020-11-02 16:40 ` [PATCH v2 10/11] KVM: arm64: Consolidate exception injection Marc Zyngier
@ 2020-11-02 16:40 ` Marc Zyngier
  10 siblings, 0 replies; 20+ messages in thread
From: Marc Zyngier @ 2020-11-02 16:40 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: James Morse, Julien Thierry, Suzuki K Poulose, Andrew Scull,
	Will Deacon, Mark Rutland, Quentin Perret, David Brazdil,
	kernel-team

The only use of the register mapping code was for the sake of the LR
mapping, which we trivially solved in a previous patch. Get rid of
the whole thing now.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_emulate.h |   2 -
 arch/arm64/kvm/Makefile              |   2 +-
 arch/arm64/kvm/guest.c               |  28 +++++-
 arch/arm64/kvm/regmap.c              | 128 ---------------------------
 4 files changed, 26 insertions(+), 134 deletions(-)
 delete mode 100644 arch/arm64/kvm/regmap.c

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 3105bb73f539..c8f550a53516 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -33,8 +33,6 @@ enum exception_type {
 	except_type_serror	= 0x180,
 };
 
-unsigned long *vcpu_reg32(const struct kvm_vcpu *vcpu, u8 reg_num);
-
 bool kvm_condition_valid32(const struct kvm_vcpu *vcpu);
 void kvm_skip_instr32(struct kvm_vcpu *vcpu);
 
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index 9b32a89a25c8..60fd181df624 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -13,7 +13,7 @@ obj-$(CONFIG_KVM) += hyp/
 kvm-y := $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o \
 	 $(KVM)/vfio.o $(KVM)/irqchip.o \
 	 arm.o mmu.o mmio.o psci.o perf.o hypercalls.o pvtime.o \
-	 inject_fault.o regmap.o va_layout.o handle_exit.o \
+	 inject_fault.o va_layout.o handle_exit.o \
 	 guest.o debug.o reset.o sys_regs.o \
 	 vgic-sys-reg-v3.o fpsimd.o pmu.o \
 	 arch_timer.o \
diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index dfb5218137ca..3f23f7478d2a 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -252,10 +252,32 @@ static int set_core_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
 	memcpy(addr, valp, KVM_REG_SIZE(reg->id));
 
 	if (*vcpu_cpsr(vcpu) & PSR_MODE32_BIT) {
-		int i;
+		int i, nr_reg;
+
+		switch (*vcpu_cpsr(vcpu)) {
+		/*
+		 * Either we are dealing with user mode, and only the
+		 * first 15 registers (+ PC) must be narrowed to 32bit.
+		 * AArch32 r0-r14 conveniently map to AArch64 x0-x14.
+		 */
+		case PSR_AA32_MODE_USR:
+		case PSR_AA32_MODE_SYS:
+			nr_reg = 15;
+			break;
+
+		/*
+		 * Otherwide, this is a priviledged mode, and *all* the
+		 * registers must be narrowed to 32bit.
+		 */
+		default:
+			nr_reg = 31;
+			break;
+		}
+
+		for (i = 0; i < nr_reg; i++)
+			vcpu_set_reg(vcpu, i, (u32)vcpu_get_reg(vcpu, i));
 
-		for (i = 0; i < 16; i++)
-			*vcpu_reg32(vcpu, i) = (u32)*vcpu_reg32(vcpu, i);
+		*vcpu_pc(vcpu) = (u32)*vcpu_pc(vcpu);
 	}
 out:
 	return err;
diff --git a/arch/arm64/kvm/regmap.c b/arch/arm64/kvm/regmap.c
deleted file mode 100644
index ae7e290bb017..000000000000
--- a/arch/arm64/kvm/regmap.c
+++ /dev/null
@@ -1,128 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-only
-/*
- * Copyright (C) 2012,2013 - ARM Ltd
- * Author: Marc Zyngier <marc.zyngier@arm.com>
- *
- * Derived from arch/arm/kvm/emulate.c:
- * Copyright (C) 2012 - Virtual Open Systems and Columbia University
- * Author: Christoffer Dall <c.dall@virtualopensystems.com>
- */
-
-#include <linux/mm.h>
-#include <linux/kvm_host.h>
-#include <asm/kvm_emulate.h>
-#include <asm/ptrace.h>
-
-#define VCPU_NR_MODES 6
-#define REG_OFFSET(_reg) \
-	(offsetof(struct user_pt_regs, _reg) / sizeof(unsigned long))
-
-#define USR_REG_OFFSET(R) REG_OFFSET(compat_usr(R))
-
-static const unsigned long vcpu_reg_offsets[VCPU_NR_MODES][16] = {
-	/* USR Registers */
-	{
-		USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
-		USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
-		USR_REG_OFFSET(6), USR_REG_OFFSET(7), USR_REG_OFFSET(8),
-		USR_REG_OFFSET(9), USR_REG_OFFSET(10), USR_REG_OFFSET(11),
-		USR_REG_OFFSET(12), USR_REG_OFFSET(13),	USR_REG_OFFSET(14),
-		REG_OFFSET(pc)
-	},
-
-	/* FIQ Registers */
-	{
-		USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
-		USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
-		USR_REG_OFFSET(6), USR_REG_OFFSET(7),
-		REG_OFFSET(compat_r8_fiq),  /* r8 */
-		REG_OFFSET(compat_r9_fiq),  /* r9 */
-		REG_OFFSET(compat_r10_fiq), /* r10 */
-		REG_OFFSET(compat_r11_fiq), /* r11 */
-		REG_OFFSET(compat_r12_fiq), /* r12 */
-		REG_OFFSET(compat_sp_fiq),  /* r13 */
-		REG_OFFSET(compat_lr_fiq),  /* r14 */
-		REG_OFFSET(pc)
-	},
-
-	/* IRQ Registers */
-	{
-		USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
-		USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
-		USR_REG_OFFSET(6), USR_REG_OFFSET(7), USR_REG_OFFSET(8),
-		USR_REG_OFFSET(9), USR_REG_OFFSET(10), USR_REG_OFFSET(11),
-		USR_REG_OFFSET(12),
-		REG_OFFSET(compat_sp_irq), /* r13 */
-		REG_OFFSET(compat_lr_irq), /* r14 */
-		REG_OFFSET(pc)
-	},
-
-	/* SVC Registers */
-	{
-		USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
-		USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
-		USR_REG_OFFSET(6), USR_REG_OFFSET(7), USR_REG_OFFSET(8),
-		USR_REG_OFFSET(9), USR_REG_OFFSET(10), USR_REG_OFFSET(11),
-		USR_REG_OFFSET(12),
-		REG_OFFSET(compat_sp_svc), /* r13 */
-		REG_OFFSET(compat_lr_svc), /* r14 */
-		REG_OFFSET(pc)
-	},
-
-	/* ABT Registers */
-	{
-		USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
-		USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
-		USR_REG_OFFSET(6), USR_REG_OFFSET(7), USR_REG_OFFSET(8),
-		USR_REG_OFFSET(9), USR_REG_OFFSET(10), USR_REG_OFFSET(11),
-		USR_REG_OFFSET(12),
-		REG_OFFSET(compat_sp_abt), /* r13 */
-		REG_OFFSET(compat_lr_abt), /* r14 */
-		REG_OFFSET(pc)
-	},
-
-	/* UND Registers */
-	{
-		USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
-		USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
-		USR_REG_OFFSET(6), USR_REG_OFFSET(7), USR_REG_OFFSET(8),
-		USR_REG_OFFSET(9), USR_REG_OFFSET(10), USR_REG_OFFSET(11),
-		USR_REG_OFFSET(12),
-		REG_OFFSET(compat_sp_und), /* r13 */
-		REG_OFFSET(compat_lr_und), /* r14 */
-		REG_OFFSET(pc)
-	},
-};
-
-/*
- * Return a pointer to the register number valid in the current mode of
- * the virtual CPU.
- */
-unsigned long *vcpu_reg32(const struct kvm_vcpu *vcpu, u8 reg_num)
-{
-	unsigned long *reg_array = (unsigned long *)&vcpu->arch.ctxt.regs;
-	unsigned long mode = *vcpu_cpsr(vcpu) & PSR_AA32_MODE_MASK;
-
-	switch (mode) {
-	case PSR_AA32_MODE_USR ... PSR_AA32_MODE_SVC:
-		mode &= ~PSR_MODE32_BIT; /* 0 ... 3 */
-		break;
-
-	case PSR_AA32_MODE_ABT:
-		mode = 4;
-		break;
-
-	case PSR_AA32_MODE_UND:
-		mode = 5;
-		break;
-
-	case PSR_AA32_MODE_SYS:
-		mode = 0;	/* SYS maps to USR */
-		break;
-
-	default:
-		BUG();
-	}
-
-	return reg_array + vcpu_reg_offsets[mode][reg_num];
-}
-- 
2.28.0


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 03/11] KVM: arm64: Make kvm_skip_instr() and co private to HYP
  2020-11-02 16:40 ` [PATCH v2 03/11] KVM: arm64: Make kvm_skip_instr() and co private to HYP Marc Zyngier
@ 2021-05-05 14:23   ` Zenghui Yu
  2021-05-05 16:46     ` Marc Zyngier
  2021-05-06 17:17     ` Marc Zyngier
  0 siblings, 2 replies; 20+ messages in thread
From: Zenghui Yu @ 2021-05-05 14:23 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, kernel-team, Will Deacon,
	James Morse, Julien Thierry, Suzuki K Poulose, Andrew Scull,
	Mark Rutland, Quentin Perret, David Brazdil

Hi Marc,

On 2020/11/3 0:40, Marc Zyngier wrote:
> In an effort to remove the vcpu PC manipulations from EL1 on nVHE
> systems, move kvm_skip_instr() to be HYP-specific. EL1's intent
> to increment PC post emulation is now signalled via a flag in the
> vcpu structure.
> 
> Signed-off-by: Marc Zyngier <maz@kernel.org>

[...]

> @@ -133,6 +134,8 @@ static int __kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
>  	__load_guest_stage2(vcpu->arch.hw_mmu);
>  	__activate_traps(vcpu);
>  
> +	__adjust_pc(vcpu);

If the INCREMENT_PC flag was set (e.g., for WFx emulation) while we're
handling PSCI CPU_ON call targetting this VCPU, the *target_pc* (aka
entry point address, normally provided by the primary VCPU) will be
unexpectedly incremented here. That's pretty bad, I think.

This was noticed with a latest guest kernel, at least with commit
dccc9da22ded ("arm64: Improve parking of stopped CPUs"), which put the
stopped VCPUs in the WFx loop. The guest kernel shouted at me that

	"CPU: CPUs started in inconsistent modes"

*after* rebooting. The problem is that the secondary entry point was
corrupted by KVM as explained above. All of the secondary processors
started from set_cpu_boot_mode_flag(), with w0=0. Oh well...

I write the below diff and guess it will help. But I have to look at all
other places where we adjust PC directly to make a right fix. Please let
me know what do you think.


Thanks,
Zenghui

---->8----
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index 956cdc240148..ed647eb387c3 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -265,7 +265,12 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
  		if (vcpu->arch.reset_state.be)
  			kvm_vcpu_set_be(vcpu);

+		/*
+		 * Don't bother with the KVM_ARM64_INCREMENT_PC flag while
+		 * using this version of __adjust_pc().
+		 */
  		*vcpu_pc(vcpu) = target_pc;
+		vcpu->arch.flags &= ~KVM_ARM64_INCREMENT_PC;
  		vcpu_set_reg(vcpu, 0, vcpu->arch.reset_state.r0);

  		vcpu->arch.reset_state.reset = false;

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 03/11] KVM: arm64: Make kvm_skip_instr() and co private to HYP
  2021-05-05 14:23   ` Zenghui Yu
@ 2021-05-05 16:46     ` Marc Zyngier
  2021-05-06  6:33       ` Marc Zyngier
  2021-05-06 17:17     ` Marc Zyngier
  1 sibling, 1 reply; 20+ messages in thread
From: Marc Zyngier @ 2021-05-05 16:46 UTC (permalink / raw)
  To: Zenghui Yu
  Cc: linux-arm-kernel, kvmarm, kvm, kernel-team, Will Deacon,
	James Morse, Julien Thierry, Suzuki K Poulose, Andrew Scull,
	Mark Rutland, Quentin Perret, David Brazdil

Hi Zenghui,

On Wed, 05 May 2021 15:23:02 +0100,
Zenghui Yu <yuzenghui@huawei.com> wrote:
> 
> Hi Marc,
> 
> On 2020/11/3 0:40, Marc Zyngier wrote:
> > In an effort to remove the vcpu PC manipulations from EL1 on nVHE
> > systems, move kvm_skip_instr() to be HYP-specific. EL1's intent
> > to increment PC post emulation is now signalled via a flag in the
> > vcpu structure.
> > 
> > Signed-off-by: Marc Zyngier <maz@kernel.org>
> 
> [...]
> 
> > @@ -133,6 +134,8 @@ static int __kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
> >  	__load_guest_stage2(vcpu->arch.hw_mmu);
> >  	__activate_traps(vcpu);
> >  +	__adjust_pc(vcpu);
> 
> If the INCREMENT_PC flag was set (e.g., for WFx emulation) while we're
> handling PSCI CPU_ON call targetting this VCPU, the *target_pc* (aka
> entry point address, normally provided by the primary VCPU) will be
> unexpectedly incremented here. That's pretty bad, I think.

How can you online a CPU using PSCI if that CPU is currently spinning
on a WFI? Or is that we have transitioned via userspace to perform the
vcpu reset? I can imagine it happening in that case.

> This was noticed with a latest guest kernel, at least with commit
> dccc9da22ded ("arm64: Improve parking of stopped CPUs"), which put the
> stopped VCPUs in the WFx loop. The guest kernel shouted at me that
> 
> 	"CPU: CPUs started in inconsistent modes"

Ah, the perks of running guests with "quiet"... Well caught.

> *after* rebooting. The problem is that the secondary entry point was
> corrupted by KVM as explained above. All of the secondary processors
> started from set_cpu_boot_mode_flag(), with w0=0. Oh well...
> 
> I write the below diff and guess it will help. But I have to look at all
> other places where we adjust PC directly to make a right fix. Please let
> me know what do you think.
> 
> 
> Thanks,
> Zenghui
> 
> ---->8----
> diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
> index 956cdc240148..ed647eb387c3 100644
> --- a/arch/arm64/kvm/reset.c
> +++ b/arch/arm64/kvm/reset.c
> @@ -265,7 +265,12 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
>  		if (vcpu->arch.reset_state.be)
>  			kvm_vcpu_set_be(vcpu);
> 
> +		/*
> +		 * Don't bother with the KVM_ARM64_INCREMENT_PC flag while
> +		 * using this version of __adjust_pc().
> +		 */
>  		*vcpu_pc(vcpu) = target_pc;
> +		vcpu->arch.flags &= ~KVM_ARM64_INCREMENT_PC;

I think you need to make it a lot stronger: any PC-altering flag will
do the wrong thing here. I'd go and clear all the exception bits:

Thanks,

	M.

diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index 956cdc240148..54913612d602 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -265,6 +265,12 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
 		if (vcpu->arch.reset_state.be)
 			kvm_vcpu_set_be(vcpu);
 
+		/*
+		 * We're reseting the CPU, make sure there is no
+		 * pending exception or other PC-altering event.
+		 */
+		vcpu->arch.flags &= ~(KVM_ARM64_PENDING_EXCEPTION |
+				      KVM_ARM64_EXCEPT_MASK);
 		*vcpu_pc(vcpu) = target_pc;
 		vcpu_set_reg(vcpu, 0, vcpu->arch.reset_state.r0);
 

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 03/11] KVM: arm64: Make kvm_skip_instr() and co private to HYP
  2021-05-05 16:46     ` Marc Zyngier
@ 2021-05-06  6:33       ` Marc Zyngier
  2021-05-06 11:43         ` Zenghui Yu
  0 siblings, 1 reply; 20+ messages in thread
From: Marc Zyngier @ 2021-05-06  6:33 UTC (permalink / raw)
  To: Zenghui Yu
  Cc: linux-arm-kernel, kvmarm, kvm, kernel-team, Will Deacon,
	James Morse, Julien Thierry, Suzuki K Poulose, Andrew Scull,
	Mark Rutland, Quentin Perret, David Brazdil

On Wed, 05 May 2021 17:46:51 +0100,
Marc Zyngier <maz@kernel.org> wrote:
> 
> Hi Zenghui,
> 
> On Wed, 05 May 2021 15:23:02 +0100,
> Zenghui Yu <yuzenghui@huawei.com> wrote:
> > 
> > Hi Marc,
> > 
> > On 2020/11/3 0:40, Marc Zyngier wrote:
> > > In an effort to remove the vcpu PC manipulations from EL1 on nVHE
> > > systems, move kvm_skip_instr() to be HYP-specific. EL1's intent
> > > to increment PC post emulation is now signalled via a flag in the
> > > vcpu structure.
> > > 
> > > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > 
> > [...]
> > 
> > > @@ -133,6 +134,8 @@ static int __kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
> > >  	__load_guest_stage2(vcpu->arch.hw_mmu);
> > >  	__activate_traps(vcpu);
> > >  +	__adjust_pc(vcpu);
> > 
> > If the INCREMENT_PC flag was set (e.g., for WFx emulation) while we're
> > handling PSCI CPU_ON call targetting this VCPU, the *target_pc* (aka
> > entry point address, normally provided by the primary VCPU) will be
> > unexpectedly incremented here. That's pretty bad, I think.
> 
> How can you online a CPU using PSCI if that CPU is currently spinning
> on a WFI? Or is that we have transitioned via userspace to perform the
> vcpu reset? I can imagine it happening in that case.
> 
> > This was noticed with a latest guest kernel, at least with commit
> > dccc9da22ded ("arm64: Improve parking of stopped CPUs"), which put the
> > stopped VCPUs in the WFx loop. The guest kernel shouted at me that
> > 
> > 	"CPU: CPUs started in inconsistent modes"
> 
> Ah, the perks of running guests with "quiet"... Well caught.
> 
> > *after* rebooting. The problem is that the secondary entry point was
> > corrupted by KVM as explained above. All of the secondary processors
> > started from set_cpu_boot_mode_flag(), with w0=0. Oh well...
> > 
> > I write the below diff and guess it will help. But I have to look at all
> > other places where we adjust PC directly to make a right fix. Please let
> > me know what do you think.
> > 
> > 
> > Thanks,
> > Zenghui
> > 
> > ---->8----
> > diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
> > index 956cdc240148..ed647eb387c3 100644
> > --- a/arch/arm64/kvm/reset.c
> > +++ b/arch/arm64/kvm/reset.c
> > @@ -265,7 +265,12 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
> >  		if (vcpu->arch.reset_state.be)
> >  			kvm_vcpu_set_be(vcpu);
> > 
> > +		/*
> > +		 * Don't bother with the KVM_ARM64_INCREMENT_PC flag while
> > +		 * using this version of __adjust_pc().
> > +		 */
> >  		*vcpu_pc(vcpu) = target_pc;
> > +		vcpu->arch.flags &= ~KVM_ARM64_INCREMENT_PC;

Actually, this is far worse than it looks, and this only papers over
one particular symptom. We need to resolve all pending PC updates
*before* returning to userspace, or things like live migration can
observe an inconsistent state.

I'll try and cook something up.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 03/11] KVM: arm64: Make kvm_skip_instr() and co private to HYP
  2021-05-06  6:33       ` Marc Zyngier
@ 2021-05-06 11:43         ` Zenghui Yu
  2021-05-06 14:29           ` Marc Zyngier
  0 siblings, 1 reply; 20+ messages in thread
From: Zenghui Yu @ 2021-05-06 11:43 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, kernel-team, Will Deacon,
	James Morse, Julien Thierry, Suzuki K Poulose, Andrew Scull,
	Mark Rutland, Quentin Perret, David Brazdil

On 2021/5/6 14:33, Marc Zyngier wrote:
> On Wed, 05 May 2021 17:46:51 +0100,
> Marc Zyngier <maz@kernel.org> wrote:
>>
>> Hi Zenghui,
>>
>> On Wed, 05 May 2021 15:23:02 +0100,
>> Zenghui Yu <yuzenghui@huawei.com> wrote:
>>>
>>> Hi Marc,
>>>
>>> On 2020/11/3 0:40, Marc Zyngier wrote:
>>>> In an effort to remove the vcpu PC manipulations from EL1 on nVHE
>>>> systems, move kvm_skip_instr() to be HYP-specific. EL1's intent
>>>> to increment PC post emulation is now signalled via a flag in the
>>>> vcpu structure.
>>>>
>>>> Signed-off-by: Marc Zyngier <maz@kernel.org>
>>>
>>> [...]
>>>
>>>> @@ -133,6 +134,8 @@ static int __kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
>>>>  	__load_guest_stage2(vcpu->arch.hw_mmu);
>>>>  	__activate_traps(vcpu);
>>>> +	__adjust_pc(vcpu);
>>>
>>> If the INCREMENT_PC flag was set (e.g., for WFx emulation) while we're
>>> handling PSCI CPU_ON call targetting this VCPU, the *target_pc* (aka
>>> entry point address, normally provided by the primary VCPU) will be
>>> unexpectedly incremented here. That's pretty bad, I think.
>>
>> How can you online a CPU using PSCI if that CPU is currently spinning
>> on a WFI? Or is that we have transitioned via userspace to perform the
>> vcpu reset? I can imagine it happening in that case.

I hadn't tried to reset VCPU from userspace. That would be a much easier
way to reproduce this problem.

>>> This was noticed with a latest guest kernel, at least with commit
>>> dccc9da22ded ("arm64: Improve parking of stopped CPUs"), which put the
>>> stopped VCPUs in the WFx loop. The guest kernel shouted at me that
>>>
>>> 	"CPU: CPUs started in inconsistent modes"
>>
>> Ah, the perks of running guests with "quiet"... Well caught.
>>
>>> *after* rebooting. The problem is that the secondary entry point was
>>> corrupted by KVM as explained above. All of the secondary processors
>>> started from set_cpu_boot_mode_flag(), with w0=0. Oh well...
>>>
>>> I write the below diff and guess it will help. But I have to look at all
>>> other places where we adjust PC directly to make a right fix. Please let
>>> me know what do you think.
>>>
>>>
>>> Thanks,
>>> Zenghui
>>>
>>> ---->8----
>>> diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
>>> index 956cdc240148..ed647eb387c3 100644
>>> --- a/arch/arm64/kvm/reset.c
>>> +++ b/arch/arm64/kvm/reset.c
>>> @@ -265,7 +265,12 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
>>>  		if (vcpu->arch.reset_state.be)
>>>  			kvm_vcpu_set_be(vcpu);
>>>
>>> +		/*
>>> +		 * Don't bother with the KVM_ARM64_INCREMENT_PC flag while
>>> +		 * using this version of __adjust_pc().
>>> +		 */
>>>  		*vcpu_pc(vcpu) = target_pc;
>>> +		vcpu->arch.flags &= ~KVM_ARM64_INCREMENT_PC;
> 
> Actually, this is far worse than it looks, and this only papers over
> one particular symptom. We need to resolve all pending PC updates
> *before* returning to userspace, or things like live migration can
> observe an inconsistent state.

Ah yeah, agreed.

Apart from the PC manipulation, I noticed that when handling the user
GET_VCPU_EVENTS request:

|	/*
|	 * We never return a pending ext_dabt here because we deliver it to
|	 * the virtual CPU directly when setting the event and it's no longer
|	 * 'pending' at this point.
|	 */

Which isn't true anymore now that we defer the exception injection right
before the VCPU entry. The comment needs to be updated anyway whilst it
isn't clear to me that whether we should expose the in-kernel ext_dabt
to userspace, given that the exception state is already reflected by the
registers and the abort can be taken in the next VCPU entry (if we can
appropriately fix the PC updating problem).

> I'll try and cook something up.

Thanks.


Zenghui

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 03/11] KVM: arm64: Make kvm_skip_instr() and co private to HYP
  2021-05-06 11:43         ` Zenghui Yu
@ 2021-05-06 14:29           ` Marc Zyngier
  2021-05-09 13:07             ` Zenghui Yu
  0 siblings, 1 reply; 20+ messages in thread
From: Marc Zyngier @ 2021-05-06 14:29 UTC (permalink / raw)
  To: Zenghui Yu
  Cc: linux-arm-kernel, kvmarm, kvm, kernel-team, Will Deacon,
	James Morse, Julien Thierry, Suzuki K Poulose, Andrew Scull,
	Mark Rutland, Quentin Perret, David Brazdil

On Thu, 06 May 2021 12:43:26 +0100,
Zenghui Yu <yuzenghui@huawei.com> wrote:
> 
> On 2021/5/6 14:33, Marc Zyngier wrote:
> > On Wed, 05 May 2021 17:46:51 +0100,
> > Marc Zyngier <maz@kernel.org> wrote:
> >> 
> >> Hi Zenghui,
> >> 
> >> On Wed, 05 May 2021 15:23:02 +0100,
> >> Zenghui Yu <yuzenghui@huawei.com> wrote:
> >>> 
> >>> Hi Marc,
> >>> 
> >>> On 2020/11/3 0:40, Marc Zyngier wrote:
> >>>> In an effort to remove the vcpu PC manipulations from EL1 on nVHE
> >>>> systems, move kvm_skip_instr() to be HYP-specific. EL1's intent
> >>>> to increment PC post emulation is now signalled via a flag in the
> >>>> vcpu structure.
> >>>> 
> >>>> Signed-off-by: Marc Zyngier <maz@kernel.org>
> >>> 
> >>> [...]
> >>> 
> >>>> @@ -133,6 +134,8 @@ static int __kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
> >>>>  	__load_guest_stage2(vcpu->arch.hw_mmu);
> >>>>  	__activate_traps(vcpu);
> >>>> +	__adjust_pc(vcpu);
> >>> 
> >>> If the INCREMENT_PC flag was set (e.g., for WFx emulation) while we're
> >>> handling PSCI CPU_ON call targetting this VCPU, the *target_pc* (aka
> >>> entry point address, normally provided by the primary VCPU) will be
> >>> unexpectedly incremented here. That's pretty bad, I think.
> >> 
> >> How can you online a CPU using PSCI if that CPU is currently spinning
> >> on a WFI? Or is that we have transitioned via userspace to perform the
> >> vcpu reset? I can imagine it happening in that case.
> 
> I hadn't tried to reset VCPU from userspace. That would be a much easier
> way to reproduce this problem.

Then I don't understand how you end-up there. If the vcpu was in WFI,
it wasn't off and PSCI_CPU_ON doesn't have any effect.

> > Actually, this is far worse than it looks, and this only papers over
> > one particular symptom. We need to resolve all pending PC updates
> > *before* returning to userspace, or things like live migration can
> > observe an inconsistent state.
> 
> Ah yeah, agreed.
> 
> Apart from the PC manipulation, I noticed that when handling the user
> GET_VCPU_EVENTS request:
> 
> |	/*
> |	 * We never return a pending ext_dabt here because we deliver it to
> |	 * the virtual CPU directly when setting the event and it's no longer
> |	 * 'pending' at this point.
> |	 */
> 
> Which isn't true anymore now that we defer the exception injection right
> before the VCPU entry.

I believe the comment will be valid again once I fix the core issue,
which is that we shouldn't return to userspace with pending PC
adjustments. As long as KVM_GET_VCPU_EVENTS isn't issued on a running
vcpu (which looks pointless to me), this should be just fine.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 03/11] KVM: arm64: Make kvm_skip_instr() and co private to HYP
  2021-05-05 14:23   ` Zenghui Yu
  2021-05-05 16:46     ` Marc Zyngier
@ 2021-05-06 17:17     ` Marc Zyngier
  1 sibling, 0 replies; 20+ messages in thread
From: Marc Zyngier @ 2021-05-06 17:17 UTC (permalink / raw)
  To: Zenghui Yu
  Cc: linux-arm-kernel, kvmarm, kvm, kernel-team, Will Deacon,
	James Morse, Julien Thierry, Suzuki K Poulose, Andrew Scull,
	Mark Rutland, Quentin Perret, David Brazdil

On Wed, 05 May 2021 15:23:02 +0100,
Zenghui Yu <yuzenghui@huawei.com> wrote:
> 
> Hi Marc,
> 
> On 2020/11/3 0:40, Marc Zyngier wrote:
> > In an effort to remove the vcpu PC manipulations from EL1 on nVHE
> > systems, move kvm_skip_instr() to be HYP-specific. EL1's intent
> > to increment PC post emulation is now signalled via a flag in the
> > vcpu structure.
> > 
> > Signed-off-by: Marc Zyngier <maz@kernel.org>
> 
> [...]
> 
> > @@ -133,6 +134,8 @@ static int __kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
> >  	__load_guest_stage2(vcpu->arch.hw_mmu);
> >  	__activate_traps(vcpu);
> >  +	__adjust_pc(vcpu);
> 
> If the INCREMENT_PC flag was set (e.g., for WFx emulation) while we're
> handling PSCI CPU_ON call targetting this VCPU, the *target_pc* (aka
> entry point address, normally provided by the primary VCPU) will be
> unexpectedly incremented here. That's pretty bad, I think.
> 
> This was noticed with a latest guest kernel, at least with commit
> dccc9da22ded ("arm64: Improve parking of stopped CPUs"), which put the
> stopped VCPUs in the WFx loop. The guest kernel shouted at me that
> 
> 	"CPU: CPUs started in inconsistent modes"
> 
> *after* rebooting. The problem is that the secondary entry point was
> corrupted by KVM as explained above. All of the secondary processors
> started from set_cpu_boot_mode_flag(), with w0=0. Oh well...

FWIW, I've pushed out a test branch[1] with two patches that sit on
top of Linus' current tree. I'll rebase and post it as soon as -rc1
appears, but I'd appreciate if you could have a look in the meantime.

Thanks,

	M.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/pc-fixes

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 03/11] KVM: arm64: Make kvm_skip_instr() and co private to HYP
  2021-05-06 14:29           ` Marc Zyngier
@ 2021-05-09 13:07             ` Zenghui Yu
  2021-05-10  7:59               ` Marc Zyngier
  0 siblings, 1 reply; 20+ messages in thread
From: Zenghui Yu @ 2021-05-09 13:07 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, kernel-team, Will Deacon,
	James Morse, Julien Thierry, Suzuki K Poulose, Andrew Scull,
	Mark Rutland, Quentin Perret, David Brazdil

On 2021/5/6 22:29, Marc Zyngier wrote:
> On Thu, 06 May 2021 12:43:26 +0100,
> Zenghui Yu <yuzenghui@huawei.com> wrote:
>>
>> On 2021/5/6 14:33, Marc Zyngier wrote:
>>> On Wed, 05 May 2021 17:46:51 +0100,
>>> Marc Zyngier <maz@kernel.org> wrote:
>>>>
>>>> Hi Zenghui,
>>>>
>>>> On Wed, 05 May 2021 15:23:02 +0100,
>>>> Zenghui Yu <yuzenghui@huawei.com> wrote:
>>>>>
>>>>> Hi Marc,
>>>>>
>>>>> On 2020/11/3 0:40, Marc Zyngier wrote:
>>>>>> In an effort to remove the vcpu PC manipulations from EL1 on nVHE
>>>>>> systems, move kvm_skip_instr() to be HYP-specific. EL1's intent
>>>>>> to increment PC post emulation is now signalled via a flag in the
>>>>>> vcpu structure.
>>>>>>
>>>>>> Signed-off-by: Marc Zyngier <maz@kernel.org>
>>>>>
>>>>> [...]
>>>>>
>>>>>> @@ -133,6 +134,8 @@ static int __kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
>>>>>>  	__load_guest_stage2(vcpu->arch.hw_mmu);
>>>>>>  	__activate_traps(vcpu);
>>>>>> +	__adjust_pc(vcpu);
>>>>>
>>>>> If the INCREMENT_PC flag was set (e.g., for WFx emulation) while we're
>>>>> handling PSCI CPU_ON call targetting this VCPU, the *target_pc* (aka
>>>>> entry point address, normally provided by the primary VCPU) will be
>>>>> unexpectedly incremented here. That's pretty bad, I think.
>>>>
>>>> How can you online a CPU using PSCI if that CPU is currently spinning
>>>> on a WFI? Or is that we have transitioned via userspace to perform the
>>>> vcpu reset? I can imagine it happening in that case.
>>
>> I hadn't tried to reset VCPU from userspace. That would be a much easier
>> way to reproduce this problem.
> 
> Then I don't understand how you end-up there. If the vcpu was in WFI,
> it wasn't off and PSCI_CPU_ON doesn't have any effect.

I'm sorry for the misleading words.

The reported problem (secondary vcpu entry point corruption) was noticed
after a guest reboot. On rebooting, all vcpus will go back to userspace,
either because of a vcpu PSCI_SYSTEM_RESET call (with a
KVM_SYSTEM_EVENT_RESET system event in result), or because of a pending
signal targetting the vcpu thread. Userspace (I used QEMU) will then
perform the vcpu reset using the KVM_ARM_VCPU_INIT ioctl, of course!

WFI is the last instruction executed by the secondary vcpu before
rebooting. Emulating it results in a PC-altering flag.

What I was going to say is that maybe we can reproduce this problem with
a much simpler userspace program (not QEMU, no reboot) -- perform vcpu
reset while the vcpu is concurrently executing WFI, and see if the
result PC is set to 0 (per the KVM_ARM_VCPU_INIT doc). Maybe we can
achieve it with a kvm selftest case but "I hadn't tried", which turned
out to be misleading.

I'll have a look at your branch.


Thanks,
Zenghui

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 03/11] KVM: arm64: Make kvm_skip_instr() and co private to HYP
  2021-05-09 13:07             ` Zenghui Yu
@ 2021-05-10  7:59               ` Marc Zyngier
  0 siblings, 0 replies; 20+ messages in thread
From: Marc Zyngier @ 2021-05-10  7:59 UTC (permalink / raw)
  To: Zenghui Yu
  Cc: linux-arm-kernel, kvmarm, kvm, kernel-team, Will Deacon,
	James Morse, Julien Thierry, Suzuki K Poulose, Andrew Scull,
	Mark Rutland, Quentin Perret, David Brazdil

On Sun, 09 May 2021 14:07:45 +0100,
Zenghui Yu <yuzenghui@huawei.com> wrote:
> 
> On 2021/5/6 22:29, Marc Zyngier wrote:
> > On Thu, 06 May 2021 12:43:26 +0100,
> > Zenghui Yu <yuzenghui@huawei.com> wrote:
> >> 
> >> On 2021/5/6 14:33, Marc Zyngier wrote:
> >>> On Wed, 05 May 2021 17:46:51 +0100,
> >>> Marc Zyngier <maz@kernel.org> wrote:
> >>>> 
> >>>> Hi Zenghui,
> >>>> 
> >>>> On Wed, 05 May 2021 15:23:02 +0100,
> >>>> Zenghui Yu <yuzenghui@huawei.com> wrote:
> >>>>> 
> >>>>> Hi Marc,
> >>>>> 
> >>>>> On 2020/11/3 0:40, Marc Zyngier wrote:
> >>>>>> In an effort to remove the vcpu PC manipulations from EL1 on nVHE
> >>>>>> systems, move kvm_skip_instr() to be HYP-specific. EL1's intent
> >>>>>> to increment PC post emulation is now signalled via a flag in the
> >>>>>> vcpu structure.
> >>>>>> 
> >>>>>> Signed-off-by: Marc Zyngier <maz@kernel.org>
> >>>>> 
> >>>>> [...]
> >>>>> 
> >>>>>> @@ -133,6 +134,8 @@ static int __kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
> >>>>>>  	__load_guest_stage2(vcpu->arch.hw_mmu);
> >>>>>>  	__activate_traps(vcpu);
> >>>>>> +	__adjust_pc(vcpu);
> >>>>> 
> >>>>> If the INCREMENT_PC flag was set (e.g., for WFx emulation) while we're
> >>>>> handling PSCI CPU_ON call targetting this VCPU, the *target_pc* (aka
> >>>>> entry point address, normally provided by the primary VCPU) will be
> >>>>> unexpectedly incremented here. That's pretty bad, I think.
> >>>> 
> >>>> How can you online a CPU using PSCI if that CPU is currently spinning
> >>>> on a WFI? Or is that we have transitioned via userspace to perform the
> >>>> vcpu reset? I can imagine it happening in that case.
> >> 
> >> I hadn't tried to reset VCPU from userspace. That would be a much easier
> >> way to reproduce this problem.
> > 
> > Then I don't understand how you end-up there. If the vcpu was in WFI,
> > it wasn't off and PSCI_CPU_ON doesn't have any effect.
> 
> I'm sorry for the misleading words.
> 
> The reported problem (secondary vcpu entry point corruption) was noticed
> after a guest reboot. On rebooting, all vcpus will go back to userspace,
> either because of a vcpu PSCI_SYSTEM_RESET call (with a
> KVM_SYSTEM_EVENT_RESET system event in result), or because of a pending
> signal targetting the vcpu thread. Userspace (I used QEMU) will then
> perform the vcpu reset using the KVM_ARM_VCPU_INIT ioctl, of course!

OK, that's exactly the scenario I had in mind, and we are in violent
agreement! :-)

> WFI is the last instruction executed by the secondary vcpu before
> rebooting. Emulating it results in a PC-altering flag.
> 
> What I was going to say is that maybe we can reproduce this problem with
> a much simpler userspace program (not QEMU, no reboot) -- perform vcpu
> reset while the vcpu is concurrently executing WFI, and see if the
> result PC is set to 0 (per the KVM_ARM_VCPU_INIT doc). Maybe we can
> achieve it with a kvm selftest case but "I hadn't tried", which turned
> out to be misleading.

No worries. At least I know we have the same understanding of the
problem and we can look at the solution.

> I'll have a look at your branch.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2021-05-10  8:00 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-02 16:40 [PATCH v2 00/11] KVM: arm64: Move PC/ELR/SPSR/PSTATE updatess to EL2 Marc Zyngier
2020-11-02 16:40 ` [PATCH v2 01/11] KVM: arm64: Don't adjust PC on SError during SMC trap Marc Zyngier
2020-11-02 16:40 ` [PATCH v2 02/11] KVM: arm64: Move kvm_vcpu_trap_il_is32bit into kvm_skip_instr32() Marc Zyngier
2020-11-02 16:40 ` [PATCH v2 03/11] KVM: arm64: Make kvm_skip_instr() and co private to HYP Marc Zyngier
2021-05-05 14:23   ` Zenghui Yu
2021-05-05 16:46     ` Marc Zyngier
2021-05-06  6:33       ` Marc Zyngier
2021-05-06 11:43         ` Zenghui Yu
2021-05-06 14:29           ` Marc Zyngier
2021-05-09 13:07             ` Zenghui Yu
2021-05-10  7:59               ` Marc Zyngier
2021-05-06 17:17     ` Marc Zyngier
2020-11-02 16:40 ` [PATCH v2 04/11] KVM: arm64: Move PC rollback on SError " Marc Zyngier
2020-11-02 16:40 ` [PATCH v2 05/11] KVM: arm64: Move VHE direct sysreg accessors into kvm_host.h Marc Zyngier
2020-11-02 16:40 ` [PATCH v2 06/11] KVM: arm64: Add basic hooks for injecting exceptions from EL2 Marc Zyngier
2020-11-02 16:40 ` [PATCH v2 07/11] KVM: arm64: Inject AArch64 exceptions from HYP Marc Zyngier
2020-11-02 16:40 ` [PATCH v2 08/11] KVM: arm64: Inject AArch32 " Marc Zyngier
2020-11-02 16:40 ` [PATCH v2 09/11] KVM: arm64: Remove SPSR manipulation primitives Marc Zyngier
2020-11-02 16:40 ` [PATCH v2 10/11] KVM: arm64: Consolidate exception injection Marc Zyngier
2020-11-02 16:40 ` [PATCH v2 11/11] KVM: arm64: Get rid of the AArch32 register mapping code Marc Zyngier

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).