kvm.vger.kernel.org archive mirror
* [PATCH 0/2] KVM: arm64: Fixup PC updates on exit to userspace
@ 2021-05-10  9:49 Marc Zyngier
  2021-05-10  9:49 ` [PATCH 1/2] KVM: arm64: Move __adjust_pc out of line Marc Zyngier
  2021-05-10  9:49 ` [PATCH 2/2] KVM: arm64: Commit pending PC adjustments before returning to userspace Marc Zyngier
  0 siblings, 2 replies; 14+ messages in thread
From: Marc Zyngier @ 2021-05-10  9:49 UTC (permalink / raw)
  To: kvm, kvmarm, linux-arm-kernel
  Cc: Zenghui Yu, James Morse, Suzuki K Poulose, Alexandru Elisei, kernel-team

We recently moved anything related to PC updates into the guest entry
code to help with the protected KVM effort. However, I missed a
critical point: userspace needs to be able to observe state changes
when the vcpu exits. Otherwise, live migration is a bit broken and
vcpu reset can fail (as reported by Zenghui). Not good.
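
For illustration, the failure mode can be modelled with a toy vcpu that
defers its PC update behind a flag, the way KVM does since the PC rework.
All names below are made up for the sketch and are not the real KVM API:

```c
#include <assert.h>

/* Toy model of the deferred PC update; names are illustrative only. */
#define TOY_INCREMENT_PC	(1UL << 0)

struct toy_vcpu {
	unsigned long pc;
	unsigned long flags;
};

/* Emulation path: queue the PC bump instead of applying it. */
static void toy_skip_instr(struct toy_vcpu *vcpu)
{
	vcpu->flags |= TOY_INCREMENT_PC;
}

/* Only runs on the next guest entry. */
static void toy_adjust_pc(struct toy_vcpu *vcpu)
{
	if (vcpu->flags & TOY_INCREMENT_PC) {
		vcpu->pc += 4;
		vcpu->flags &= ~TOY_INCREMENT_PC;
	}
}
```

If the vcpu thread returns to userspace between toy_skip_instr() and
toy_adjust_pc(), a GET_ONE_REG-style read sees the stale PC, which is
the migration/reset problem described above.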

These two patches aim to fix the above, and carry a Cc: stable tag.

Marc Zyngier (2):
  KVM: arm64: Move __adjust_pc out of line
  KVM: arm64: Commit pending PC adjustments before returning to
    userspace

 arch/arm64/include/asm/kvm_asm.h           |  3 +++
 arch/arm64/kvm/arm.c                       | 10 ++++++++++
 arch/arm64/kvm/hyp/exception.c             | 18 +++++++++++++++++-
 arch/arm64/kvm/hyp/include/hyp/adjust_pc.h | 18 ------------------
 arch/arm64/kvm/hyp/nvhe/hyp-main.c         |  8 ++++++++
 arch/arm64/kvm/hyp/nvhe/switch.c           |  2 +-
 arch/arm64/kvm/hyp/vhe/switch.c            |  2 +-
 7 files changed, 40 insertions(+), 21 deletions(-)

-- 
2.29.2



* [PATCH 1/2] KVM: arm64: Move __adjust_pc out of line
  2021-05-10  9:49 [PATCH 0/2] KVM: arm64: Fixup PC updates on exit to userspace Marc Zyngier
@ 2021-05-10  9:49 ` Marc Zyngier
  2021-05-10 14:55   ` Alexandru Elisei
  2021-05-10  9:49 ` [PATCH 2/2] KVM: arm64: Commit pending PC adjustments before returning to userspace Marc Zyngier
  1 sibling, 1 reply; 14+ messages in thread
From: Marc Zyngier @ 2021-05-10  9:49 UTC (permalink / raw)
  To: kvm, kvmarm, linux-arm-kernel
  Cc: Zenghui Yu, James Morse, Suzuki K Poulose, Alexandru Elisei,
	kernel-team, stable

In order to make it easy to call __adjust_pc() from the EL1 code
(in the case of nVHE), rename it to __kvm_adjust_pc() and move
it out of line.

No expected functional change.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org # 5.11
---
 arch/arm64/include/asm/kvm_asm.h           |  2 ++
 arch/arm64/kvm/hyp/exception.c             | 18 +++++++++++++++++-
 arch/arm64/kvm/hyp/include/hyp/adjust_pc.h | 18 ------------------
 arch/arm64/kvm/hyp/nvhe/switch.c           |  2 +-
 arch/arm64/kvm/hyp/vhe/switch.c            |  2 +-
 5 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index cf8df032b9c3..d5b11037401d 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -201,6 +201,8 @@ extern void __kvm_timer_set_cntvoff(u64 cntvoff);
 
 extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
 
+extern void __kvm_adjust_pc(struct kvm_vcpu *vcpu);
+
 extern u64 __vgic_v3_get_gic_config(void);
 extern u64 __vgic_v3_read_vmcr(void);
 extern void __vgic_v3_write_vmcr(u32 vmcr);
diff --git a/arch/arm64/kvm/hyp/exception.c b/arch/arm64/kvm/hyp/exception.c
index 73629094f903..0812a496725f 100644
--- a/arch/arm64/kvm/hyp/exception.c
+++ b/arch/arm64/kvm/hyp/exception.c
@@ -296,7 +296,7 @@ static void enter_exception32(struct kvm_vcpu *vcpu, u32 mode, u32 vect_offset)
 	*vcpu_pc(vcpu) = vect_offset;
 }
 
-void kvm_inject_exception(struct kvm_vcpu *vcpu)
+static void kvm_inject_exception(struct kvm_vcpu *vcpu)
 {
 	if (vcpu_el1_is_32bit(vcpu)) {
 		switch (vcpu->arch.flags & KVM_ARM64_EXCEPT_MASK) {
@@ -329,3 +329,19 @@ void kvm_inject_exception(struct kvm_vcpu *vcpu)
 		}
 	}
 }
+
+/*
+ * Adjust the guest PC on entry, depending on flags provided by EL1
+ * for the purpose of emulation (MMIO, sysreg) or exception injection.
+ */
+void __kvm_adjust_pc(struct kvm_vcpu *vcpu)
+{
+	if (vcpu->arch.flags & KVM_ARM64_PENDING_EXCEPTION) {
+		kvm_inject_exception(vcpu);
+		vcpu->arch.flags &= ~(KVM_ARM64_PENDING_EXCEPTION |
+				      KVM_ARM64_EXCEPT_MASK);
+	} else 	if (vcpu->arch.flags & KVM_ARM64_INCREMENT_PC) {
+		kvm_skip_instr(vcpu);
+		vcpu->arch.flags &= ~KVM_ARM64_INCREMENT_PC;
+	}
+}
diff --git a/arch/arm64/kvm/hyp/include/hyp/adjust_pc.h b/arch/arm64/kvm/hyp/include/hyp/adjust_pc.h
index 61716359035d..4fdfeabefeb4 100644
--- a/arch/arm64/kvm/hyp/include/hyp/adjust_pc.h
+++ b/arch/arm64/kvm/hyp/include/hyp/adjust_pc.h
@@ -13,8 +13,6 @@
 #include <asm/kvm_emulate.h>
 #include <asm/kvm_host.h>
 
-void kvm_inject_exception(struct kvm_vcpu *vcpu);
-
 static inline void kvm_skip_instr(struct kvm_vcpu *vcpu)
 {
 	if (vcpu_mode_is_32bit(vcpu)) {
@@ -43,22 +41,6 @@ static inline void __kvm_skip_instr(struct kvm_vcpu *vcpu)
 	write_sysreg_el2(*vcpu_pc(vcpu), SYS_ELR);
 }
 
-/*
- * Adjust the guest PC on entry, depending on flags provided by EL1
- * for the purpose of emulation (MMIO, sysreg) or exception injection.
- */
-static inline void __adjust_pc(struct kvm_vcpu *vcpu)
-{
-	if (vcpu->arch.flags & KVM_ARM64_PENDING_EXCEPTION) {
-		kvm_inject_exception(vcpu);
-		vcpu->arch.flags &= ~(KVM_ARM64_PENDING_EXCEPTION |
-				      KVM_ARM64_EXCEPT_MASK);
-	} else 	if (vcpu->arch.flags & KVM_ARM64_INCREMENT_PC) {
-		kvm_skip_instr(vcpu);
-		vcpu->arch.flags &= ~KVM_ARM64_INCREMENT_PC;
-	}
-}
-
 /*
  * Skip an instruction while host sysregs are live.
  * Assumes host is always 64-bit.
diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c
index e9f6ea704d07..b8ac123c3419 100644
--- a/arch/arm64/kvm/hyp/nvhe/switch.c
+++ b/arch/arm64/kvm/hyp/nvhe/switch.c
@@ -201,7 +201,7 @@ int __kvm_vcpu_run(struct kvm_vcpu *vcpu)
 	 */
 	__debug_save_host_buffers_nvhe(vcpu);
 
-	__adjust_pc(vcpu);
+	__kvm_adjust_pc(vcpu);
 
 	/*
 	 * We must restore the 32-bit state before the sysregs, thanks
diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c
index 7b8f7db5c1ed..3eafed0431f5 100644
--- a/arch/arm64/kvm/hyp/vhe/switch.c
+++ b/arch/arm64/kvm/hyp/vhe/switch.c
@@ -132,7 +132,7 @@ static int __kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
 	__load_guest_stage2(vcpu->arch.hw_mmu);
 	__activate_traps(vcpu);
 
-	__adjust_pc(vcpu);
+	__kvm_adjust_pc(vcpu);
 
 	sysreg_restore_guest_state_vhe(guest_ctxt);
 	__debug_switch_to_guest(vcpu);
-- 
2.29.2



* [PATCH 2/2] KVM: arm64: Commit pending PC adjustments before returning to userspace
  2021-05-10  9:49 [PATCH 0/2] KVM: arm64: Fixup PC updates on exit to userspace Marc Zyngier
  2021-05-10  9:49 ` [PATCH 1/2] KVM: arm64: Move __adjust_pc out of line Marc Zyngier
@ 2021-05-10  9:49 ` Marc Zyngier
  2021-05-10 14:55   ` Alexandru Elisei
  2021-05-11  8:03   ` Fuad Tabba
  1 sibling, 2 replies; 14+ messages in thread
From: Marc Zyngier @ 2021-05-10  9:49 UTC (permalink / raw)
  To: kvm, kvmarm, linux-arm-kernel
  Cc: Zenghui Yu, James Morse, Suzuki K Poulose, Alexandru Elisei,
	kernel-team, stable

KVM currently updates PC (and the corresponding exception state)
using a two phase approach: first by setting a set of flags,
then by converting these flags into a state update when the vcpu
is about to enter the guest.

However, this creates a disconnect with userspace if the vcpu thread
returns there with any exception/PC flag set. In this case, the exposed
context is wrong, as userpsace doesn't have access to these flags
(they aren't architectural). It also means that these flags are
preserved across a reset, which isn't expected.

To solve this problem, force an explicit synchronisation of the
exception state on vcpu exit to userspace. As an optimisation
for nVHE systems, only perform this when there is something pending.
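
As a sketch of the approach (toy names, not the real KVM code): the exit
path checks the pending flags and, only when something is queued, forces
the same adjustment that would otherwise happen on the next guest entry:

```c
#include <assert.h>

/* Toy model of the fix; all names are illustrative. */
#define TOY_PENDING_EXCEPTION	(1UL << 0)
#define TOY_INCREMENT_PC	(1UL << 1)

struct toy_vcpu {
	unsigned long pc;
	unsigned long flags;
};

static void toy_adjust_pc(struct toy_vcpu *vcpu)
{
	if (vcpu->flags & TOY_INCREMENT_PC) {
		vcpu->pc += 4;
		vcpu->flags &= ~TOY_INCREMENT_PC;
	}
	/* Exception injection is elided in this sketch. */
}

/* Called on the way back to userspace, mirroring the exit path. */
static void toy_exit_to_userspace(struct toy_vcpu *vcpu)
{
	/* Only pay the (hyp call) cost when something is actually pending. */
	if (vcpu->flags & (TOY_PENDING_EXCEPTION | TOY_INCREMENT_PC))
		toy_adjust_pc(vcpu);
}
```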

Reported-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org # 5.11
---
 arch/arm64/include/asm/kvm_asm.h   |  1 +
 arch/arm64/kvm/arm.c               | 10 ++++++++++
 arch/arm64/kvm/hyp/exception.c     |  4 ++--
 arch/arm64/kvm/hyp/nvhe/hyp-main.c |  8 ++++++++
 4 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index d5b11037401d..5e9b33cbac51 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -63,6 +63,7 @@
 #define __KVM_HOST_SMCCC_FUNC___pkvm_cpu_set_vector		18
 #define __KVM_HOST_SMCCC_FUNC___pkvm_prot_finalize		19
 #define __KVM_HOST_SMCCC_FUNC___pkvm_mark_hyp			20
+#define __KVM_HOST_SMCCC_FUNC___kvm_adjust_pc			21
 
 #ifndef __ASSEMBLY__
 
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 1cb39c0803a4..d62a7041ebd1 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -897,6 +897,16 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 
 	kvm_sigset_deactivate(vcpu);
 
+	/*
+	 * In the unlikely event that we are returning to userspace
+	 * with pending exceptions or PC adjustment, commit these
+	 * adjustments in order to give userspace a consistent view of
+	 * the vcpu state.
+	 */
+	if (unlikely(vcpu->arch.flags & (KVM_ARM64_PENDING_EXCEPTION |
+					 KVM_ARM64_EXCEPT_MASK)))
+		kvm_call_hyp(__kvm_adjust_pc, vcpu);
+
 	vcpu_put(vcpu);
 	return ret;
 }
diff --git a/arch/arm64/kvm/hyp/exception.c b/arch/arm64/kvm/hyp/exception.c
index 0812a496725f..11541b94b328 100644
--- a/arch/arm64/kvm/hyp/exception.c
+++ b/arch/arm64/kvm/hyp/exception.c
@@ -331,8 +331,8 @@ static void kvm_inject_exception(struct kvm_vcpu *vcpu)
 }
 
 /*
- * Adjust the guest PC on entry, depending on flags provided by EL1
- * for the purpose of emulation (MMIO, sysreg) or exception injection.
+ * Adjust the guest PC (and potentially exception state) depending on
+ * flags provided by the emulation code.
  */
 void __kvm_adjust_pc(struct kvm_vcpu *vcpu)
 {
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index f36420a80474..1632f001f4ed 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -28,6 +28,13 @@ static void handle___kvm_vcpu_run(struct kvm_cpu_context *host_ctxt)
 	cpu_reg(host_ctxt, 1) =  __kvm_vcpu_run(kern_hyp_va(vcpu));
 }
 
+static void handle___kvm_adjust_pc(struct kvm_cpu_context *host_ctxt)
+{
+	DECLARE_REG(struct kvm_vcpu *, vcpu, host_ctxt, 1);
+
+	__kvm_adjust_pc(kern_hyp_va(vcpu));
+}
+
 static void handle___kvm_flush_vm_context(struct kvm_cpu_context *host_ctxt)
 {
 	__kvm_flush_vm_context();
@@ -170,6 +177,7 @@ typedef void (*hcall_t)(struct kvm_cpu_context *);
 
 static const hcall_t host_hcall[] = {
 	HANDLE_FUNC(__kvm_vcpu_run),
+	HANDLE_FUNC(__kvm_adjust_pc),
 	HANDLE_FUNC(__kvm_flush_vm_context),
 	HANDLE_FUNC(__kvm_tlb_flush_vmid_ipa),
 	HANDLE_FUNC(__kvm_tlb_flush_vmid),
-- 
2.29.2



* Re: [PATCH 1/2] KVM: arm64: Move __adjust_pc out of line
  2021-05-10  9:49 ` [PATCH 1/2] KVM: arm64: Move __adjust_pc out of line Marc Zyngier
@ 2021-05-10 14:55   ` Alexandru Elisei
  2021-05-10 15:08     ` Marc Zyngier
  0 siblings, 1 reply; 14+ messages in thread
From: Alexandru Elisei @ 2021-05-10 14:55 UTC (permalink / raw)
  To: Marc Zyngier, kvm, kvmarm, linux-arm-kernel
  Cc: Zenghui Yu, James Morse, Suzuki K Poulose, kernel-team, stable

Hi Marc,

On 5/10/21 10:49 AM, Marc Zyngier wrote:
> In order to make it easy to call __adjust_pc() from the EL1 code
> (in the case of nVHE), rename it to __kvm_adjust_pc() and move
> it out of line.
>
> No expected functional change.

It does look to me like they're functionally identical. Minor comments below.

>
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> Cc: stable@vger.kernel.org # 5.11
> ---
>  arch/arm64/include/asm/kvm_asm.h           |  2 ++
>  arch/arm64/kvm/hyp/exception.c             | 18 +++++++++++++++++-
>  arch/arm64/kvm/hyp/include/hyp/adjust_pc.h | 18 ------------------
>  arch/arm64/kvm/hyp/nvhe/switch.c           |  2 +-
>  arch/arm64/kvm/hyp/vhe/switch.c            |  2 +-
>  5 files changed, 21 insertions(+), 21 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> index cf8df032b9c3..d5b11037401d 100644
> --- a/arch/arm64/include/asm/kvm_asm.h
> +++ b/arch/arm64/include/asm/kvm_asm.h
> @@ -201,6 +201,8 @@ extern void __kvm_timer_set_cntvoff(u64 cntvoff);
>  
>  extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
>  
> +extern void __kvm_adjust_pc(struct kvm_vcpu *vcpu);

It looks pretty strange to have the file
arch/arm64/kvm/hyp/include/hyp/adjust_pc.h, but the function __kvm_adjust_pc() in
another header. I guess this was done because arch/arm64/kvm/arm.c will use the
function in the next patch. I was thinking that maybe renaming
adjust_pc.h->skip_instr.h would make more sense, what do you think? I can send a
patch on top of this series with the rename if you prefer.

> +
>  extern u64 __vgic_v3_get_gic_config(void);
>  extern u64 __vgic_v3_read_vmcr(void);
>  extern void __vgic_v3_write_vmcr(u32 vmcr);
> diff --git a/arch/arm64/kvm/hyp/exception.c b/arch/arm64/kvm/hyp/exception.c
> index 73629094f903..0812a496725f 100644
> --- a/arch/arm64/kvm/hyp/exception.c
> +++ b/arch/arm64/kvm/hyp/exception.c
> @@ -296,7 +296,7 @@ static void enter_exception32(struct kvm_vcpu *vcpu, u32 mode, u32 vect_offset)
>  	*vcpu_pc(vcpu) = vect_offset;
>  }
>  
> -void kvm_inject_exception(struct kvm_vcpu *vcpu)
> +static void kvm_inject_exception(struct kvm_vcpu *vcpu)
>  {
>  	if (vcpu_el1_is_32bit(vcpu)) {
>  		switch (vcpu->arch.flags & KVM_ARM64_EXCEPT_MASK) {
> @@ -329,3 +329,19 @@ void kvm_inject_exception(struct kvm_vcpu *vcpu)
>  		}
>  	}
>  }
> +
> +/*
> + * Adjust the guest PC on entry, depending on flags provided by EL1

This is also called by the VHE code running at EL2, but the comment is reworded in
the next patch, so it doesn't really matter, and keeping the diff a straight move
makes it easier to read.

> + * for the purpose of emulation (MMIO, sysreg) or exception injection.
> + */
> +void __kvm_adjust_pc(struct kvm_vcpu *vcpu)
> +{
> +	if (vcpu->arch.flags & KVM_ARM64_PENDING_EXCEPTION) {
> +		kvm_inject_exception(vcpu);
> +		vcpu->arch.flags &= ~(KVM_ARM64_PENDING_EXCEPTION |
> +				      KVM_ARM64_EXCEPT_MASK);
> +	} else 	if (vcpu->arch.flags & KVM_ARM64_INCREMENT_PC) {
> +		kvm_skip_instr(vcpu);
> +		vcpu->arch.flags &= ~KVM_ARM64_INCREMENT_PC;
> +	}
> +}
> diff --git a/arch/arm64/kvm/hyp/include/hyp/adjust_pc.h b/arch/arm64/kvm/hyp/include/hyp/adjust_pc.h
> index 61716359035d..4fdfeabefeb4 100644
> --- a/arch/arm64/kvm/hyp/include/hyp/adjust_pc.h
> +++ b/arch/arm64/kvm/hyp/include/hyp/adjust_pc.h
> @@ -13,8 +13,6 @@
>  #include <asm/kvm_emulate.h>
>  #include <asm/kvm_host.h>
>  
> -void kvm_inject_exception(struct kvm_vcpu *vcpu);
> -
>  static inline void kvm_skip_instr(struct kvm_vcpu *vcpu)
>  {
>  	if (vcpu_mode_is_32bit(vcpu)) {
> @@ -43,22 +41,6 @@ static inline void __kvm_skip_instr(struct kvm_vcpu *vcpu)
>  	write_sysreg_el2(*vcpu_pc(vcpu), SYS_ELR);
>  }
>  
> -/*
> - * Adjust the guest PC on entry, depending on flags provided by EL1
> - * for the purpose of emulation (MMIO, sysreg) or exception injection.
> - */
> -static inline void __adjust_pc(struct kvm_vcpu *vcpu)
> -{
> -	if (vcpu->arch.flags & KVM_ARM64_PENDING_EXCEPTION) {
> -		kvm_inject_exception(vcpu);
> -		vcpu->arch.flags &= ~(KVM_ARM64_PENDING_EXCEPTION |
> -				      KVM_ARM64_EXCEPT_MASK);
> -	} else 	if (vcpu->arch.flags & KVM_ARM64_INCREMENT_PC) {
> -		kvm_skip_instr(vcpu);
> -		vcpu->arch.flags &= ~KVM_ARM64_INCREMENT_PC;
> -	}
> -}
> -
>  /*
>   * Skip an instruction while host sysregs are live.
>   * Assumes host is always 64-bit.
> diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c
> index e9f6ea704d07..b8ac123c3419 100644
> --- a/arch/arm64/kvm/hyp/nvhe/switch.c
> +++ b/arch/arm64/kvm/hyp/nvhe/switch.c
> @@ -201,7 +201,7 @@ int __kvm_vcpu_run(struct kvm_vcpu *vcpu)
>  	 */
>  	__debug_save_host_buffers_nvhe(vcpu);
>  
> -	__adjust_pc(vcpu);
> +	__kvm_adjust_pc(vcpu);
>  
>  	/*
>  	 * We must restore the 32-bit state before the sysregs, thanks
> diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c
> index 7b8f7db5c1ed..3eafed0431f5 100644
> --- a/arch/arm64/kvm/hyp/vhe/switch.c
> +++ b/arch/arm64/kvm/hyp/vhe/switch.c
> @@ -132,7 +132,7 @@ static int __kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
>  	__load_guest_stage2(vcpu->arch.hw_mmu);
>  	__activate_traps(vcpu);
>  
> -	__adjust_pc(vcpu);
> +	__kvm_adjust_pc(vcpu);

With the function now moved to kvm_asm.h, the header include adjust_pc.h is not
needed. Same for the nvhe version of switch.c.

Thanks,

Alex

>  
>  	sysreg_restore_guest_state_vhe(guest_ctxt);
>  	__debug_switch_to_guest(vcpu);


* Re: [PATCH 2/2] KVM: arm64: Commit pending PC adjustments before returning to userspace
  2021-05-10  9:49 ` [PATCH 2/2] KVM: arm64: Commit pending PC adjustments before returning to userspace Marc Zyngier
@ 2021-05-10 14:55   ` Alexandru Elisei
  2021-05-10 15:04     ` Marc Zyngier
  2021-05-11  8:03   ` Fuad Tabba
  1 sibling, 1 reply; 14+ messages in thread
From: Alexandru Elisei @ 2021-05-10 14:55 UTC (permalink / raw)
  To: Marc Zyngier, kvm, kvmarm, linux-arm-kernel
  Cc: Zenghui Yu, James Morse, Suzuki K Poulose, kernel-team, stable

Hi Marc,

On 5/10/21 10:49 AM, Marc Zyngier wrote:
> KVM currently updates PC (and the corresponding exception state)
> using a two phase approach: first by setting a set of flags,
> then by converting these flags into a state update when the vcpu
> is about to enter the guest.
>
> However, this creates a disconnect with userspace if the vcpu thread
> returns there with any exception/PC flag set. In this case, the exposed

The code seems to handle only the KVM_ARM64_PENDING_EXCEPTION flag. Is the "PC
flag" a reference to the KVM_ARM64_INCREMENT_PC flag?

> context is wrong, as userpsace doesn't have access to these flags

s/userpsace/userspace

> (they aren't architectural). It also means that these flags are
> preserved across a reset, which isn't expected.
>
> To solve this problem, force an explicit synchronisation of the
> exception state on vcpu exit to userspace. As an optimisation
> for nVHE systems, only perform this when there is something pending.
>
> Reported-by: Zenghui Yu <yuzenghui@huawei.com>
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> Cc: stable@vger.kernel.org # 5.11
> ---
>  arch/arm64/include/asm/kvm_asm.h   |  1 +
>  arch/arm64/kvm/arm.c               | 10 ++++++++++
>  arch/arm64/kvm/hyp/exception.c     |  4 ++--
>  arch/arm64/kvm/hyp/nvhe/hyp-main.c |  8 ++++++++
>  4 files changed, 21 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> index d5b11037401d..5e9b33cbac51 100644
> --- a/arch/arm64/include/asm/kvm_asm.h
> +++ b/arch/arm64/include/asm/kvm_asm.h
> @@ -63,6 +63,7 @@
>  #define __KVM_HOST_SMCCC_FUNC___pkvm_cpu_set_vector		18
>  #define __KVM_HOST_SMCCC_FUNC___pkvm_prot_finalize		19
>  #define __KVM_HOST_SMCCC_FUNC___pkvm_mark_hyp			20
> +#define __KVM_HOST_SMCCC_FUNC___kvm_adjust_pc			21
>  
>  #ifndef __ASSEMBLY__
>  
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 1cb39c0803a4..d62a7041ebd1 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -897,6 +897,16 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
>  
>  	kvm_sigset_deactivate(vcpu);
>  
> +	/*
> +	 * In the unlikely event that we are returning to userspace
> +	 * with pending exceptions or PC adjustment, commit these

I'm going to assume "PC adjustment" means the KVM_ARM64_INCREMENT_PC flag. Please
correct me if that's not true, but if that's the case, then the flag isn't handled
below.

> +	 * adjustments in order to give userspace a consistent view of
> +	 * the vcpu state.
> +	 */
> +	if (unlikely(vcpu->arch.flags & (KVM_ARM64_PENDING_EXCEPTION |
> +					 KVM_ARM64_EXCEPT_MASK)))

The condition seems to suggest that it is valid to set
KVM_ARM64_EXCEPT_{AA32,AA64}_* without setting KVM_ARM64_PENDING_EXCEPTION, which
looks rather odd to me. Is that a valid use of the KVM_ARM64_EXCEPT_MASK bits? If
it's not (the existing code always sets the exception type with the
KVM_ARM64_PENDING_EXCEPTION), then I was thinking that checking only the
KVM_ARM64_PENDING_EXCEPTION flag would make the intention clearer.

Thanks,

Alex

> +		kvm_call_hyp(__kvm_adjust_pc, vcpu);
> +
>  	vcpu_put(vcpu);
>  	return ret;
>  }
> diff --git a/arch/arm64/kvm/hyp/exception.c b/arch/arm64/kvm/hyp/exception.c
> index 0812a496725f..11541b94b328 100644
> --- a/arch/arm64/kvm/hyp/exception.c
> +++ b/arch/arm64/kvm/hyp/exception.c
> @@ -331,8 +331,8 @@ static void kvm_inject_exception(struct kvm_vcpu *vcpu)
>  }
>  
>  /*
> - * Adjust the guest PC on entry, depending on flags provided by EL1
> - * for the purpose of emulation (MMIO, sysreg) or exception injection.
> + * Adjust the guest PC (and potentially exception state) depending on
> + * flags provided by the emulation code.
>   */
>  void __kvm_adjust_pc(struct kvm_vcpu *vcpu)
>  {
> diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> index f36420a80474..1632f001f4ed 100644
> --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> @@ -28,6 +28,13 @@ static void handle___kvm_vcpu_run(struct kvm_cpu_context *host_ctxt)
>  	cpu_reg(host_ctxt, 1) =  __kvm_vcpu_run(kern_hyp_va(vcpu));
>  }
>  
> +static void handle___kvm_adjust_pc(struct kvm_cpu_context *host_ctxt)
> +{
> +	DECLARE_REG(struct kvm_vcpu *, vcpu, host_ctxt, 1);
> +
> +	__kvm_adjust_pc(kern_hyp_va(vcpu));
> +}
> +
>  static void handle___kvm_flush_vm_context(struct kvm_cpu_context *host_ctxt)
>  {
>  	__kvm_flush_vm_context();
> @@ -170,6 +177,7 @@ typedef void (*hcall_t)(struct kvm_cpu_context *);
>  
>  static const hcall_t host_hcall[] = {
>  	HANDLE_FUNC(__kvm_vcpu_run),
> +	HANDLE_FUNC(__kvm_adjust_pc),
>  	HANDLE_FUNC(__kvm_flush_vm_context),
>  	HANDLE_FUNC(__kvm_tlb_flush_vmid_ipa),
>  	HANDLE_FUNC(__kvm_tlb_flush_vmid),


* Re: [PATCH 2/2] KVM: arm64: Commit pending PC adjustments before returning to userspace
  2021-05-10 14:55   ` Alexandru Elisei
@ 2021-05-10 15:04     ` Marc Zyngier
  2021-05-10 15:14       ` Alexandru Elisei
  0 siblings, 1 reply; 14+ messages in thread
From: Marc Zyngier @ 2021-05-10 15:04 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: kvm, kvmarm, linux-arm-kernel, Zenghui Yu, James Morse,
	Suzuki K Poulose, kernel-team, stable

On Mon, 10 May 2021 15:55:28 +0100,
Alexandru Elisei <alexandru.elisei@arm.com> wrote:
> 
> Hi Marc,
> 
> On 5/10/21 10:49 AM, Marc Zyngier wrote:
> > KVM currently updates PC (and the corresponding exception state)
> > using a two phase approach: first by setting a set of flags,
> > then by converting these flags into a state update when the vcpu
> > is about to enter the guest.
> >
> > However, this creates a disconnect with userspace if the vcpu thread
> > returns there with any exception/PC flag set. In this case, the exposed
> 
> The code seems to handle only the KVM_ARM64_PENDING_EXCEPTION
> flag. Is the "PC flag" a reference to the KVM_ARM64_INCREMENT_PC
> flag?

No, it does handle both exception and PC increment, unless I have
completely bodged something (entirely possible).

> 
> > context is wrong, as userpsace doesn't have access to these flags
> 
> s/userpsace/userspace
> 
> > (they aren't architectural). It also means that these flags are
> > preserved across a reset, which isn't expected.
> >
> > To solve this problem, force an explicit synchronisation of the
> > exception state on vcpu exit to userspace. As an optimisation
> > for nVHE systems, only perform this when there is something pending.
> >
> > Reported-by: Zenghui Yu <yuzenghui@huawei.com>
> > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > Cc: stable@vger.kernel.org # 5.11
> > ---
> >  arch/arm64/include/asm/kvm_asm.h   |  1 +
> >  arch/arm64/kvm/arm.c               | 10 ++++++++++
> >  arch/arm64/kvm/hyp/exception.c     |  4 ++--
> >  arch/arm64/kvm/hyp/nvhe/hyp-main.c |  8 ++++++++
> >  4 files changed, 21 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> > index d5b11037401d..5e9b33cbac51 100644
> > --- a/arch/arm64/include/asm/kvm_asm.h
> > +++ b/arch/arm64/include/asm/kvm_asm.h
> > @@ -63,6 +63,7 @@
> >  #define __KVM_HOST_SMCCC_FUNC___pkvm_cpu_set_vector		18
> >  #define __KVM_HOST_SMCCC_FUNC___pkvm_prot_finalize		19
> >  #define __KVM_HOST_SMCCC_FUNC___pkvm_mark_hyp			20
> > +#define __KVM_HOST_SMCCC_FUNC___kvm_adjust_pc			21
> >  
> >  #ifndef __ASSEMBLY__
> >  
> > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > index 1cb39c0803a4..d62a7041ebd1 100644
> > --- a/arch/arm64/kvm/arm.c
> > +++ b/arch/arm64/kvm/arm.c
> > @@ -897,6 +897,16 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
> >  
> >  	kvm_sigset_deactivate(vcpu);
> >  
> > +	/*
> > +	 * In the unlikely event that we are returning to userspace
> > +	 * with pending exceptions or PC adjustment, commit these
> 
> I'm going to assume "PC adjustment" means the KVM_ARM64_INCREMENT_PC
> flag. Please correct me if that's not true, but if that's the case,
> then the flag isn't handled below.
> 
> > +	 * adjustments in order to give userspace a consistent view of
> > +	 * the vcpu state.
> > +	 */
> > +	if (unlikely(vcpu->arch.flags & (KVM_ARM64_PENDING_EXCEPTION |
> > +					 KVM_ARM64_EXCEPT_MASK)))
> 
> The condition seems to suggest that it is valid to set
> KVM_ARM64_EXCEPT_{AA32,AA64}_* without setting
> KVM_ARM64_PENDING_EXCEPTION, which looks rather odd to me.
> Is that a valid use of the KVM_ARM64_EXCEPT_MASK bits? If it's not
> (the existing code always sets the exception type with the
> KVM_ARM64_PENDING_EXCEPTION), then I was thinking that checking only
> the KVM_ARM64_PENDING_EXCEPTION flag would make the intention
> clearer.

No, you are missing this (subtle) comment in kvm_host.h:

<quote>
/*
 * Overlaps with KVM_ARM64_EXCEPT_MASK on purpose so that it can't be
 * set together with an exception...
 */
#define KVM_ARM64_INCREMENT_PC		(1 << 9) /* Increment PC */
</quote>

So (KVM_ARM64_PENDING_EXCEPTION | KVM_ARM64_EXCEPT_MASK) checks for
*both* an exception and a PC increment.
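
Assuming KVM_ARM64_EXCEPT_MASK spans the three bits starting at bit 9,
as in the kvm_host.h of that era (the KVM_ARM64_INCREMENT_PC value is
from the snippet above; the mask width is an assumption for this
sketch), the overlap can be checked directly:

```c
#include <assert.h>

/*
 * Flag values: KVM_ARM64_INCREMENT_PC matches the kvm_host.h snippet
 * quoted above; the other two are assumed to match the 5.11-era header
 * and are only here to illustrate the overlap.
 */
#define KVM_ARM64_PENDING_EXCEPTION	(1UL << 8)
#define KVM_ARM64_INCREMENT_PC		(1UL << 9)
#define KVM_ARM64_EXCEPT_MASK		(7UL << 9)

/* The condition used on exit to userspace. */
static int needs_pc_sync(unsigned long flags)
{
	return !!(flags & (KVM_ARM64_PENDING_EXCEPTION |
			   KVM_ARM64_EXCEPT_MASK));
}
```

Because KVM_ARM64_INCREMENT_PC lies inside KVM_ARM64_EXCEPT_MASK, the
single test catches a pending exception, a pending PC increment, or
both.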

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.


* Re: [PATCH 1/2] KVM: arm64: Move __adjust_pc out of line
  2021-05-10 14:55   ` Alexandru Elisei
@ 2021-05-10 15:08     ` Marc Zyngier
  0 siblings, 0 replies; 14+ messages in thread
From: Marc Zyngier @ 2021-05-10 15:08 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: kvm, kvmarm, linux-arm-kernel, Zenghui Yu, James Morse,
	Suzuki K Poulose, kernel-team, stable

On Mon, 10 May 2021 15:55:16 +0100,
Alexandru Elisei <alexandru.elisei@arm.com> wrote:
> 
> Hi Marc,
> 
> On 5/10/21 10:49 AM, Marc Zyngier wrote:
> > In order to make it easy to call __adjust_pc() from the EL1 code
> > (in the case of nVHE), rename it to __kvm_adjust_pc() and move
> > it out of line.
> >
> > No expected functional change.
> 
> It does look to me like they're functionally identical. Minor comments below.
> 
> >
> > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > Cc: stable@vger.kernel.org # 5.11
> > ---
> >  arch/arm64/include/asm/kvm_asm.h           |  2 ++
> >  arch/arm64/kvm/hyp/exception.c             | 18 +++++++++++++++++-
> >  arch/arm64/kvm/hyp/include/hyp/adjust_pc.h | 18 ------------------
> >  arch/arm64/kvm/hyp/nvhe/switch.c           |  2 +-
> >  arch/arm64/kvm/hyp/vhe/switch.c            |  2 +-
> >  5 files changed, 21 insertions(+), 21 deletions(-)
> >
> > diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> > index cf8df032b9c3..d5b11037401d 100644
> > --- a/arch/arm64/include/asm/kvm_asm.h
> > +++ b/arch/arm64/include/asm/kvm_asm.h
> > @@ -201,6 +201,8 @@ extern void __kvm_timer_set_cntvoff(u64 cntvoff);
> >  
> >  extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
> >  
> > +extern void __kvm_adjust_pc(struct kvm_vcpu *vcpu);
> 
> It looks pretty strange to have the file
> arch/arm64/kvm/hyp/include/hyp/adjust_pc.h, but the function
> __kvm_adjust_pc() in another header. I guess this was done because
> arch/arm64/kvm/arm.c will use the function in the next patch.

That's mostly it. __kvm_adjust_pc() is very similar to
__kvm_vcpu_run() in its usage, and I want to keep the patch as small
as possible given that this is a candidate for a stable backport.

> I was thinking that maybe renaming adjust_pc.h->skip_instr.h would
> make more sense, what do you think? I can send a patch on top of
> this series with the rename if you prefer.

Yes, that'd probably be a good cleanup now that __adjust_pc() is gone.

> 
> > +
> >  extern u64 __vgic_v3_get_gic_config(void);
> >  extern u64 __vgic_v3_read_vmcr(void);
> >  extern void __vgic_v3_write_vmcr(u32 vmcr);
> > diff --git a/arch/arm64/kvm/hyp/exception.c b/arch/arm64/kvm/hyp/exception.c
> > index 73629094f903..0812a496725f 100644
> > --- a/arch/arm64/kvm/hyp/exception.c
> > +++ b/arch/arm64/kvm/hyp/exception.c
> > @@ -296,7 +296,7 @@ static void enter_exception32(struct kvm_vcpu *vcpu, u32 mode, u32 vect_offset)
> >  	*vcpu_pc(vcpu) = vect_offset;
> >  }
> >  
> > -void kvm_inject_exception(struct kvm_vcpu *vcpu)
> > +static void kvm_inject_exception(struct kvm_vcpu *vcpu)
> >  {
> >  	if (vcpu_el1_is_32bit(vcpu)) {
> >  		switch (vcpu->arch.flags & KVM_ARM64_EXCEPT_MASK) {
> > @@ -329,3 +329,19 @@ void kvm_inject_exception(struct kvm_vcpu *vcpu)
> >  		}
> >  	}
> >  }
> > +
> > +/*
> > + * Adjust the guest PC on entry, depending on flags provided by EL1
> 
> This is also called by the VHE code running at EL2, but the comment is reworded in
> the next patch, so it doesn't really matter, and keeping the diff a straight move
> makes it easier to read.
> 
> > + * for the purpose of emulation (MMIO, sysreg) or exception injection.
> > + */
> > +void __kvm_adjust_pc(struct kvm_vcpu *vcpu)
> > +{
> > +	if (vcpu->arch.flags & KVM_ARM64_PENDING_EXCEPTION) {
> > +		kvm_inject_exception(vcpu);
> > +		vcpu->arch.flags &= ~(KVM_ARM64_PENDING_EXCEPTION |
> > +				      KVM_ARM64_EXCEPT_MASK);
> > +	} else 	if (vcpu->arch.flags & KVM_ARM64_INCREMENT_PC) {
> > +		kvm_skip_instr(vcpu);
> > +		vcpu->arch.flags &= ~KVM_ARM64_INCREMENT_PC;
> > +	}
> > +}
> > diff --git a/arch/arm64/kvm/hyp/include/hyp/adjust_pc.h b/arch/arm64/kvm/hyp/include/hyp/adjust_pc.h
> > index 61716359035d..4fdfeabefeb4 100644
> > --- a/arch/arm64/kvm/hyp/include/hyp/adjust_pc.h
> > +++ b/arch/arm64/kvm/hyp/include/hyp/adjust_pc.h
> > @@ -13,8 +13,6 @@
> >  #include <asm/kvm_emulate.h>
> >  #include <asm/kvm_host.h>
> >  
> > -void kvm_inject_exception(struct kvm_vcpu *vcpu);
> > -
> >  static inline void kvm_skip_instr(struct kvm_vcpu *vcpu)
> >  {
> >  	if (vcpu_mode_is_32bit(vcpu)) {
> > @@ -43,22 +41,6 @@ static inline void __kvm_skip_instr(struct kvm_vcpu *vcpu)
> >  	write_sysreg_el2(*vcpu_pc(vcpu), SYS_ELR);
> >  }
> >  
> > -/*
> > - * Adjust the guest PC on entry, depending on flags provided by EL1
> > - * for the purpose of emulation (MMIO, sysreg) or exception injection.
> > - */
> > -static inline void __adjust_pc(struct kvm_vcpu *vcpu)
> > -{
> > -	if (vcpu->arch.flags & KVM_ARM64_PENDING_EXCEPTION) {
> > -		kvm_inject_exception(vcpu);
> > -		vcpu->arch.flags &= ~(KVM_ARM64_PENDING_EXCEPTION |
> > -				      KVM_ARM64_EXCEPT_MASK);
> > -	} else 	if (vcpu->arch.flags & KVM_ARM64_INCREMENT_PC) {
> > -		kvm_skip_instr(vcpu);
> > -		vcpu->arch.flags &= ~KVM_ARM64_INCREMENT_PC;
> > -	}
> > -}
> > -
> >  /*
> >   * Skip an instruction while host sysregs are live.
> >   * Assumes host is always 64-bit.
> > diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c
> > index e9f6ea704d07..b8ac123c3419 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/switch.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/switch.c
> > @@ -201,7 +201,7 @@ int __kvm_vcpu_run(struct kvm_vcpu *vcpu)
> >  	 */
> >  	__debug_save_host_buffers_nvhe(vcpu);
> >  
> > -	__adjust_pc(vcpu);
> > +	__kvm_adjust_pc(vcpu);
> >  
> >  	/*
> >  	 * We must restore the 32-bit state before the sysregs, thanks
> > diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c
> > index 7b8f7db5c1ed..3eafed0431f5 100644
> > --- a/arch/arm64/kvm/hyp/vhe/switch.c
> > +++ b/arch/arm64/kvm/hyp/vhe/switch.c
> > @@ -132,7 +132,7 @@ static int __kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
> >  	__load_guest_stage2(vcpu->arch.hw_mmu);
> >  	__activate_traps(vcpu);
> >  
> > -	__adjust_pc(vcpu);
> > +	__kvm_adjust_pc(vcpu);
> 
> With the function now moved to kvm_asm.h, the header include
> adjust_pc.h is not needed. Same for the nvhe version of switch.c.

Well spotted. I'll clean that up.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] KVM: arm64: Commit pending PC adjustemnts before returning to userspace
  2021-05-10 15:04     ` Marc Zyngier
@ 2021-05-10 15:14       ` Alexandru Elisei
  2021-05-11  7:44         ` Marc Zyngier
  0 siblings, 1 reply; 14+ messages in thread
From: Alexandru Elisei @ 2021-05-10 15:14 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvm, kvmarm, linux-arm-kernel, Zenghui Yu, James Morse,
	Suzuki K Poulose, kernel-team, stable

Hi Marc,

On 5/10/21 4:04 PM, Marc Zyngier wrote:
> On Mon, 10 May 2021 15:55:28 +0100,
> Alexandru Elisei <alexandru.elisei@arm.com> wrote:
>> Hi Marc,
>>
>> On 5/10/21 10:49 AM, Marc Zyngier wrote:
>>> KVM currently updates PC (and the corresponding exception state)
>>> using a two phase approach: first by setting a set of flags,
>>> then by converting these flags into a state update when the vcpu
>>> is about to enter the guest.
>>>
>>> However, this creates a disconnect with userspace if the vcpu thread
>>> returns there with any exception/PC flag set. In this case, the exposed
>> The code seems to handle only the KVM_ARM64_PENDING_EXCEPTION
>> flag. Is the "PC flag" a reference to the KVM_ARM64_INCREMENT_PC
>> flag?
> No, it does handle both exception and PC increment, unless I have
> completely bodged something (entirely possible).

The message is correct, my bad.

>
>>> context is wrong, as userpsace doesn't have access to these flags
>> s/userpsace/userspace
>>
>>> (they aren't architectural). It also means that these flags are
>>> preserved across a reset, which isn't expected.
>>>
>>> To solve this problem, force an explicit synchronisation of the
>>> exception state on vcpu exit to userspace. As an optimisation
>>> for nVHE systems, only perform this when there is something pending.
>>>
>>> Reported-by: Zenghui Yu <yuzenghui@huawei.com>
>>> Signed-off-by: Marc Zyngier <maz@kernel.org>
>>> Cc: stable@vger.kernel.org # 5.11
>>> ---
>>>  arch/arm64/include/asm/kvm_asm.h   |  1 +
>>>  arch/arm64/kvm/arm.c               | 10 ++++++++++
>>>  arch/arm64/kvm/hyp/exception.c     |  4 ++--
>>>  arch/arm64/kvm/hyp/nvhe/hyp-main.c |  8 ++++++++
>>>  4 files changed, 21 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
>>> index d5b11037401d..5e9b33cbac51 100644
>>> --- a/arch/arm64/include/asm/kvm_asm.h
>>> +++ b/arch/arm64/include/asm/kvm_asm.h
>>> @@ -63,6 +63,7 @@
>>>  #define __KVM_HOST_SMCCC_FUNC___pkvm_cpu_set_vector		18
>>>  #define __KVM_HOST_SMCCC_FUNC___pkvm_prot_finalize		19
>>>  #define __KVM_HOST_SMCCC_FUNC___pkvm_mark_hyp			20
>>> +#define __KVM_HOST_SMCCC_FUNC___kvm_adjust_pc			21
>>>  
>>>  #ifndef __ASSEMBLY__
>>>  
>>> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
>>> index 1cb39c0803a4..d62a7041ebd1 100644
>>> --- a/arch/arm64/kvm/arm.c
>>> +++ b/arch/arm64/kvm/arm.c
>>> @@ -897,6 +897,16 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
>>>  
>>>  	kvm_sigset_deactivate(vcpu);
>>>  
>>> +	/*
>>> +	 * In the unlikely event that we are returning to userspace
>>> +	 * with pending exceptions or PC adjustment, commit these
>> I'm going to assume "PC adjustment" means the KVM_ARM64_INCREMENT_PC
>> flag. Please correct me if that's not true, but if that's the case,
>> then the flag isn't handled below.
>>
>>> +	 * adjustments in order to give userspace a consistent view of
>>> +	 * the vcpu state.
>>> +	 */
>>> +	if (unlikely(vcpu->arch.flags & (KVM_ARM64_PENDING_EXCEPTION |
>>> +					 KVM_ARM64_EXCEPT_MASK)))
>> The condition seems to suggest that it is valid to set
>> KVM_ARM64_EXCEPT_{AA32,AA64}_* without setting
>> KVM_ARM64_PENDING_EXCEPTION, which looks rather odd to me.
>> Is that a valid use of the KVM_ARM64_EXCEPT_MASK bits? If it's not
>> (the existing code always sets the exception type with the
>> KVM_ARM64_PENDING_EXCEPTION), that I was thinking that checking only
>> the KVM_ARM64_PENDING_EXCEPTION flag would make the intention
>> clearer.
> No, you are missing this (subtle) comment in kvm_host.h:
>
> <quote>
> /*
>  * Overlaps with KVM_ARM64_EXCEPT_MASK on purpose so that it can't be
>  * set together with an exception...
>  */
> #define KVM_ARM64_INCREMENT_PC		(1 << 9) /* Increment PC */
> </quote>
>
> So (KVM_ARM64_PENDING_EXCEPTION | KVM_ARM64_EXCEPT_MASK) checks for
> *both* an exception and a PC increment.

Then how about explicitly checking for the KVM_ARM64_PENDING_EXCEPTION and
KVM_ARM64_INCREMENT_PC flags, like it's done in __kvm_adjust_pc? That would
certainly make the code easier to understand, as it's not immediately obvious that
the EXCEPT mask includes the INCREMENT_PC flag.

Thanks,

Alex



* Re: [PATCH 2/2] KVM: arm64: Commit pending PC adjustemnts before returning to userspace
  2021-05-10 15:14       ` Alexandru Elisei
@ 2021-05-11  7:44         ` Marc Zyngier
  2021-05-11 16:50           ` Alexandru Elisei
  0 siblings, 1 reply; 14+ messages in thread
From: Marc Zyngier @ 2021-05-11  7:44 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: kvm, kvmarm, linux-arm-kernel, Zenghui Yu, James Morse,
	Suzuki K Poulose, kernel-team, stable

On Mon, 10 May 2021 16:14:37 +0100,
Alexandru Elisei <alexandru.elisei@arm.com> wrote:
> 
> Hi Marc,
> 
> On 5/10/21 4:04 PM, Marc Zyngier wrote:
> > On Mon, 10 May 2021 15:55:28 +0100,
> > Alexandru Elisei <alexandru.elisei@arm.com> wrote:
> >> Hi Marc,
> >>
> >> On 5/10/21 10:49 AM, Marc Zyngier wrote:
> >>> KVM currently updates PC (and the corresponding exception state)
> >>> using a two phase approach: first by setting a set of flags,
> >>> then by converting these flags into a state update when the vcpu
> >>> is about to enter the guest.
> >>>
> >>> However, this creates a disconnect with userspace if the vcpu thread
> >>> returns there with any exception/PC flag set. In this case, the exposed
> >> The code seems to handle only the KVM_ARM64_PENDING_EXCEPTION
> >> flag. Is the "PC flag" a reference to the KVM_ARM64_INCREMENT_PC
> >> flag?
> > No, it does handle both exception and PC increment, unless I have
> > completely bodged something (entirely possible).
> 
> The message is correct, my bad.
> 
> >
> >>> context is wrong, as userpsace doesn't have access to these flags
> >> s/userpsace/userspace
> >>
> >>> (they aren't architectural). It also means that these flags are
> >>> preserved across a reset, which isn't expected.
> >>>
> >>> To solve this problem, force an explicit synchronisation of the
> >>> exception state on vcpu exit to userspace. As an optimisation
> >>> for nVHE systems, only perform this when there is something pending.
> >>>
> >>> Reported-by: Zenghui Yu <yuzenghui@huawei.com>
> >>> Signed-off-by: Marc Zyngier <maz@kernel.org>
> >>> Cc: stable@vger.kernel.org # 5.11
> >>> ---
> >>>  arch/arm64/include/asm/kvm_asm.h   |  1 +
> >>>  arch/arm64/kvm/arm.c               | 10 ++++++++++
> >>>  arch/arm64/kvm/hyp/exception.c     |  4 ++--
> >>>  arch/arm64/kvm/hyp/nvhe/hyp-main.c |  8 ++++++++
> >>>  4 files changed, 21 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> >>> index d5b11037401d..5e9b33cbac51 100644
> >>> --- a/arch/arm64/include/asm/kvm_asm.h
> >>> +++ b/arch/arm64/include/asm/kvm_asm.h
> >>> @@ -63,6 +63,7 @@
> >>>  #define __KVM_HOST_SMCCC_FUNC___pkvm_cpu_set_vector		18
> >>>  #define __KVM_HOST_SMCCC_FUNC___pkvm_prot_finalize		19
> >>>  #define __KVM_HOST_SMCCC_FUNC___pkvm_mark_hyp			20
> >>> +#define __KVM_HOST_SMCCC_FUNC___kvm_adjust_pc			21
> >>>  
> >>>  #ifndef __ASSEMBLY__
> >>>  
> >>> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> >>> index 1cb39c0803a4..d62a7041ebd1 100644
> >>> --- a/arch/arm64/kvm/arm.c
> >>> +++ b/arch/arm64/kvm/arm.c
> >>> @@ -897,6 +897,16 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
> >>>  
> >>>  	kvm_sigset_deactivate(vcpu);
> >>>  
> >>> +	/*
> >>> +	 * In the unlikely event that we are returning to userspace
> >>> +	 * with pending exceptions or PC adjustment, commit these
> >> I'm going to assume "PC adjustment" means the KVM_ARM64_INCREMENT_PC
> >> flag. Please correct me if that's not true, but if that's the case,
> >> then the flag isn't handled below.
> >>
> >>> +	 * adjustments in order to give userspace a consistent view of
> >>> +	 * the vcpu state.
> >>> +	 */
> >>> +	if (unlikely(vcpu->arch.flags & (KVM_ARM64_PENDING_EXCEPTION |
> >>> +					 KVM_ARM64_EXCEPT_MASK)))
> >> The condition seems to suggest that it is valid to set
> >> KVM_ARM64_EXCEPT_{AA32,AA64}_* without setting
> >> KVM_ARM64_PENDING_EXCEPTION, which looks rather odd to me.
> >> Is that a valid use of the KVM_ARM64_EXCEPT_MASK bits? If it's not
> >> (the existing code always sets the exception type with the
> >> KVM_ARM64_PENDING_EXCEPTION), that I was thinking that checking only
> >> the KVM_ARM64_PENDING_EXCEPTION flag would make the intention
> >> clearer.
> > No, you are missing this (subtle) comment in kvm_host.h:
> >
> > <quote>
> > /*
> >  * Overlaps with KVM_ARM64_EXCEPT_MASK on purpose so that it can't be
> >  * set together with an exception...
> >  */
> > #define KVM_ARM64_INCREMENT_PC		(1 << 9) /* Increment PC */
> > </quote>
> >
> > So (KVM_ARM64_PENDING_EXCEPTION | KVM_ARM64_EXCEPT_MASK) checks for
> > *both* an exception and a PC increment.
> 
> Then how about explicitly checking for the
> KVM_ARM64_PENDING_EXCEPTION and KVM_ARM64_INCREMENT_PC flags, like
> it's done in __kvm_adjust_pc? That would certainly make the code
> easier to understand, as it's not immediately obvious that the
> EXCEPT mask includes the INCREMENT_PC flag.

Fair enough. I'll fix that in v2.

Another thing I wondered about: we now rely on __kvm_adjust_pc() to be
preemption safe. That's always the case in nVHE (we're at EL2), but
VHE can be preempted at any point. The code we call is preemption
safe, but it takes some effort to be convinced of it.

Do you have a good suggestion on how to express this requirement?  I
could throw a preempt_disable()/enable() at the call for the sake of
being in the same context between VHE and nVHE, but that's not
strictly necessary for now.
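Concretely, I mean something like this on top of patch 2 (untested sketch; the exact call-site shape is assumed from the diffstat):

```
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ kvm_arch_vcpu_ioctl_run():
 	if (unlikely(vcpu->arch.flags & (KVM_ARM64_PENDING_EXCEPTION |
-					 KVM_ARM64_EXCEPT_MASK)))
+					 KVM_ARM64_EXCEPT_MASK))) {
+		preempt_disable();
 		kvm_call_hyp(__kvm_adjust_pc, vcpu);
+		preempt_enable();
+	}
```

That would make the VHE path run in the same no-preemption context as nVHE, at the cost of a disable/enable pair that isn't needed for correctness today.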

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.


* Re: [PATCH 2/2] KVM: arm64: Commit pending PC adjustemnts before returning to userspace
  2021-05-10  9:49 ` [PATCH 2/2] KVM: arm64: Commit pending PC adjustemnts before returning to userspace Marc Zyngier
  2021-05-10 14:55   ` Alexandru Elisei
@ 2021-05-11  8:03   ` Fuad Tabba
  2021-05-11  8:14     ` Marc Zyngier
  1 sibling, 1 reply; 14+ messages in thread
From: Fuad Tabba @ 2021-05-11  8:03 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvm, open list:KERNEL VIRTUAL MACHINE FOR ARM64 (KVM/arm64),
	moderated list:ARM64 PORT (AARCH64 ARCHITECTURE),
	Zenghui Yu, James Morse, Suzuki K Poulose, Alexandru Elisei,
	kernel-team, stable

Hi Marc,

> KVM: arm64: Commit pending PC adjustemnts before returning to userspace

s/adjustments/adjustments

On Mon, May 10, 2021 at 10:49 AM Marc Zyngier <maz@kernel.org> wrote:
>
> KVM currently updates PC (and the corresponding exception state)
> using a two phase approach: first by setting a set of flags,
> then by converting these flags into a state update when the vcpu
> is about to enter the guest.
>
> However, this creates a disconnect with userspace if the vcpu thread
> returns there with any exception/PC flag set. In this case, the exposed
> context is wrong, as userpsace doesn't have access to these flags
> (they aren't architectural). It also means that these flags are
> preserved across a reset, which isn't expected.
>
> To solve this problem, force an explicit synchronisation of the
> exception state on vcpu exit to userspace. As an optimisation
> for nVHE systems, only perform this when there is something pending.

I've tested this with a few nvhe and vhe tests that exercise both
__kvm_adjust_pc call paths (__kvm_vcpu_run and
kvm_arch_vcpu_ioctl_run), and the tests ran as expected.  I'll do the
same for v2 when you send it out.

Cheers,
/fuad


* Re: [PATCH 2/2] KVM: arm64: Commit pending PC adjustemnts before returning to userspace
  2021-05-11  8:03   ` Fuad Tabba
@ 2021-05-11  8:14     ` Marc Zyngier
  2021-05-11  8:22       ` Fuad Tabba
  0 siblings, 1 reply; 14+ messages in thread
From: Marc Zyngier @ 2021-05-11  8:14 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: kvm, open list:KERNEL VIRTUAL MACHINE FOR ARM64 (KVM/arm64),
	moderated list:ARM64 PORT (AARCH64 ARCHITECTURE),
	Zenghui Yu, James Morse, Suzuki K Poulose, Alexandru Elisei,
	kernel-team, stable

Hi Fuad,

On Tue, 11 May 2021 09:03:40 +0100,
Fuad Tabba <tabba@google.com> wrote:
> 
> Hi Marc,
> 
> > KVM: arm64: Commit pending PC adjustemnts before returning to userspace
> 
> s/adjustments/adjustments

Looks like Gmail refuses to let you mimic my spelling mistakes! :D

> 
> On Mon, May 10, 2021 at 10:49 AM Marc Zyngier <maz@kernel.org> wrote:
> >
> > KVM currently updates PC (and the corresponding exception state)
> > using a two phase approach: first by setting a set of flags,
> > then by converting these flags into a state update when the vcpu
> > is about to enter the guest.
> >
> > However, this creates a disconnect with userspace if the vcpu thread
> > returns there with any exception/PC flag set. In this case, the exposed
> > context is wrong, as userpsace doesn't have access to these flags
> > (they aren't architectural). It also means that these flags are
> > preserved across a reset, which isn't expected.
> >
> > To solve this problem, force an explicit synchronisation of the
> > exception state on vcpu exit to userspace. As an optimisation
> > for nVHE systems, only perform this when there is something pending.
> 
> I've tested this with a few nvhe and vhe tests that exercise both
> __kvm_adjust_pc call paths (__kvm_vcpu_run and
> kvm_arch_vcpu_ioctl_run), and the tests ran as expected.  I'll do the
> same for v2 when you send it out.

Ah, that's interesting. Do you have tests that actually fail when
hitting this bug? Given that this is pretty subtle, it'd be good to
have a way to make sure it doesn't crop up again.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.


* Re: [PATCH 2/2] KVM: arm64: Commit pending PC adjustemnts before returning to userspace
  2021-05-11  8:14     ` Marc Zyngier
@ 2021-05-11  8:22       ` Fuad Tabba
  2021-05-11  8:45         ` Marc Zyngier
  0 siblings, 1 reply; 14+ messages in thread
From: Fuad Tabba @ 2021-05-11  8:22 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvm, open list:KERNEL VIRTUAL MACHINE FOR ARM64 (KVM/arm64),
	moderated list:ARM64 PORT (AARCH64 ARCHITECTURE),
	Zenghui Yu, James Morse, Suzuki K Poulose, Alexandru Elisei,
	kernel-team, stable

Hi Marc,


On Tue, May 11, 2021 at 9:14 AM Marc Zyngier <maz@kernel.org> wrote:
>
> Hi Fuad,
>
> On Tue, 11 May 2021 09:03:40 +0100,
> Fuad Tabba <tabba@google.com> wrote:
> >
> > Hi Marc,
> >
> > > KVM: arm64: Commit pending PC adjustemnts before returning to userspace
> >
> > s/adjustments/adjustments
>
> Looks like Gmail refuses to let you mimic my spelling mistakes! :D
>
> >
> > On Mon, May 10, 2021 at 10:49 AM Marc Zyngier <maz@kernel.org> wrote:
> > >
> > > KVM currently updates PC (and the corresponding exception state)
> > > using a two phase approach: first by setting a set of flags,
> > > then by converting these flags into a state update when the vcpu
> > > is about to enter the guest.
> > >
> > > However, this creates a disconnect with userspace if the vcpu thread
> > > returns there with any exception/PC flag set. In this case, the exposed
> > > context is wrong, as userpsace doesn't have access to these flags
> > > (they aren't architectural). It also means that these flags are
> > > preserved across a reset, which isn't expected.
> > >
> > > To solve this problem, force an explicit synchronisation of the
> > > exception state on vcpu exit to userspace. As an optimisation
> > > for nVHE systems, only perform this when there is something pending.
> >
> > I've tested this with a few nvhe and vhe tests that exercise both
> > __kvm_adjust_pc call paths (__kvm_vcpu_run and
> > kvm_arch_vcpu_ioctl_run), and the tests ran as expected.  I'll do the
> > same for v2 when you send it out.
>
> Ah, that's interesting. Do you have tests that actually fail when
> hitting this bug? Given that this is pretty subtle, it'd be good to
> have a way to make sure it doesn't crop up again.

Nothing that fails, just code that generates exceptions or emulates
instructions at various points. That said, I think it should be
straightforward to write a selftest for this. I'll give it a go.

/fuad

> Thanks,
>
>         M.
>
> --
> Without deviation from the norm, progress is not possible.


* Re: [PATCH 2/2] KVM: arm64: Commit pending PC adjustemnts before returning to userspace
  2021-05-11  8:22       ` Fuad Tabba
@ 2021-05-11  8:45         ` Marc Zyngier
  0 siblings, 0 replies; 14+ messages in thread
From: Marc Zyngier @ 2021-05-11  8:45 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: kvm, open list:KERNEL VIRTUAL MACHINE FOR ARM64 (KVM/arm64),
	moderated list:ARM64 PORT (AARCH64 ARCHITECTURE),
	Zenghui Yu, James Morse, Suzuki K Poulose, Alexandru Elisei,
	kernel-team, stable

On Tue, 11 May 2021 09:22:37 +0100,
Fuad Tabba <tabba@google.com> wrote:
> 
> Hi Marc,
> 
> 
> On Tue, May 11, 2021 at 9:14 AM Marc Zyngier <maz@kernel.org> wrote:
> >
> > Hi Fuad,
> >
> > On Tue, 11 May 2021 09:03:40 +0100,
> > Fuad Tabba <tabba@google.com> wrote:
> > >
> > > Hi Marc,
> > >
> > > > KVM: arm64: Commit pending PC adjustemnts before returning to userspace
> > >
> > > s/adjustments/adjustments
> >
> > Looks like Gmail refuses to let you mimic my spelling mistakes! :D
> >
> > >
> > > On Mon, May 10, 2021 at 10:49 AM Marc Zyngier <maz@kernel.org> wrote:
> > > >
> > > > KVM currently updates PC (and the corresponding exception state)
> > > > using a two phase approach: first by setting a set of flags,
> > > > then by converting these flags into a state update when the vcpu
> > > > is about to enter the guest.
> > > >
> > > > However, this creates a disconnect with userspace if the vcpu thread
> > > > returns there with any exception/PC flag set. In this case, the exposed
> > > > context is wrong, as userpsace doesn't have access to these flags
> > > > (they aren't architectural). It also means that these flags are
> > > > preserved across a reset, which isn't expected.
> > > >
> > > > To solve this problem, force an explicit synchronisation of the
> > > > exception state on vcpu exit to userspace. As an optimisation
> > > > for nVHE systems, only perform this when there is something pending.
> > >
> > > I've tested this with a few nvhe and vhe tests that exercise both
> > > __kvm_adjust_pc call paths (__kvm_vcpu_run and
> > > kvm_arch_vcpu_ioctl_run), and the tests ran as expected.  I'll do the
> > > same for v2 when you send it out.
> >
> > Ah, that's interesting. Do you have tests that actually fail when
> > hitting this bug? Given that this is pretty subtle, it'd be good to
> > have a way to make sure it doesn't crop up again.
> 
> Nothing that fails, just code that generates exceptions or emulates
> instructions at various points. That said, I think it should be
> straightforward to write a selftest for this. I'll give it a go.

PC adjustment is easy-ish: have a vcpu to hit WFI with no interrupt
pending, send the thread a signal to make it exit to userspace, update
the PC to another address, and check that the instruction at that
address is actually executed.
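In pseudocode, that test would look roughly like this (an outline only; the labels and helpers are placeholders, not actual selftest API names):

```
guest:
    wfi                     /* no interrupt pending -> vcpu blocks,
                               with a PC increment left pending from
                               the WFI trap */
    ...
target:
    first_insn              /* must not be skipped */
    report_success

host:
    signal the vcpu thread  /* forces an exit to userspace while the
                               INCREMENT_PC flag may still be set */
    set PC = target         /* KVM_SET_ONE_REG */
    run the vcpu again

    /* buggy kernel: the stale INCREMENT_PC is applied on top of the
       new PC and first_insn is skipped; fixed kernel: execution
       starts exactly at target and report_success is reached */
```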

Exception injection is a lot more difficult: you need to force a vcpu
exit to userspace right after having caused an exception to be
injected by KVM. I can't think of an easy way to do that other than
repeatedly executing an instruction that generates an exception while
signalling the thread to force the exit. Ugly.

	M.

-- 
Without deviation from the norm, progress is not possible.


* Re: [PATCH 2/2] KVM: arm64: Commit pending PC adjustemnts before returning to userspace
  2021-05-11  7:44         ` Marc Zyngier
@ 2021-05-11 16:50           ` Alexandru Elisei
  0 siblings, 0 replies; 14+ messages in thread
From: Alexandru Elisei @ 2021-05-11 16:50 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvm, kvmarm, linux-arm-kernel, Zenghui Yu, James Morse,
	Suzuki K Poulose, kernel-team, stable

Hi Marc,

On 5/11/21 8:44 AM, Marc Zyngier wrote:
> On Mon, 10 May 2021 16:14:37 +0100,
> Alexandru Elisei <alexandru.elisei@arm.com> wrote:
>> Hi Marc,
>>
>> On 5/10/21 4:04 PM, Marc Zyngier wrote:
>>> On Mon, 10 May 2021 15:55:28 +0100,
>>> Alexandru Elisei <alexandru.elisei@arm.com> wrote:
>>>> Hi Marc,
>>>>
>>>> On 5/10/21 10:49 AM, Marc Zyngier wrote:
>>>>> KVM currently updates PC (and the corresponding exception state)
>>>>> using a two phase approach: first by setting a set of flags,
>>>>> then by converting these flags into a state update when the vcpu
>>>>> is about to enter the guest.
>>>>>
>>>>> However, this creates a disconnect with userspace if the vcpu thread
>>>>> returns there with any exception/PC flag set. In this case, the exposed
>>>> The code seems to handle only the KVM_ARM64_PENDING_EXCEPTION
>>>> flag. Is the "PC flag" a reference to the KVM_ARM64_INCREMENT_PC
>>>> flag?
>>> No, it does handle both exception and PC increment, unless I have
>>> completely bodged something (entirely possible).
>> The message is correct, my bad.
>>
>>>>> context is wrong, as userpsace doesn't have access to these flags
>>>> s/userpsace/userspace
>>>>
>>>>> (they aren't architectural). It also means that these flags are
>>>>> preserved across a reset, which isn't expected.
>>>>>
>>>>> To solve this problem, force an explicit synchronisation of the
>>>>> exception state on vcpu exit to userspace. As an optimisation
>>>>> for nVHE systems, only perform this when there is something pending.
>>>>>
>>>>> Reported-by: Zenghui Yu <yuzenghui@huawei.com>
>>>>> Signed-off-by: Marc Zyngier <maz@kernel.org>
>>>>> Cc: stable@vger.kernel.org # 5.11
>>>>> ---
>>>>>  arch/arm64/include/asm/kvm_asm.h   |  1 +
>>>>>  arch/arm64/kvm/arm.c               | 10 ++++++++++
>>>>>  arch/arm64/kvm/hyp/exception.c     |  4 ++--
>>>>>  arch/arm64/kvm/hyp/nvhe/hyp-main.c |  8 ++++++++
>>>>>  4 files changed, 21 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
>>>>> index d5b11037401d..5e9b33cbac51 100644
>>>>> --- a/arch/arm64/include/asm/kvm_asm.h
>>>>> +++ b/arch/arm64/include/asm/kvm_asm.h
>>>>> @@ -63,6 +63,7 @@
>>>>>  #define __KVM_HOST_SMCCC_FUNC___pkvm_cpu_set_vector		18
>>>>>  #define __KVM_HOST_SMCCC_FUNC___pkvm_prot_finalize		19
>>>>>  #define __KVM_HOST_SMCCC_FUNC___pkvm_mark_hyp			20
>>>>> +#define __KVM_HOST_SMCCC_FUNC___kvm_adjust_pc			21
>>>>>  
>>>>>  #ifndef __ASSEMBLY__
>>>>>  
>>>>> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
>>>>> index 1cb39c0803a4..d62a7041ebd1 100644
>>>>> --- a/arch/arm64/kvm/arm.c
>>>>> +++ b/arch/arm64/kvm/arm.c
>>>>> @@ -897,6 +897,16 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
>>>>>  
>>>>>  	kvm_sigset_deactivate(vcpu);
>>>>>  
>>>>> +	/*
>>>>> +	 * In the unlikely event that we are returning to userspace
>>>>> +	 * with pending exceptions or PC adjustment, commit these
>>>> I'm going to assume "PC adjustment" means the KVM_ARM64_INCREMENT_PC
>>>> flag. Please correct me if that's not true, but if that's the case,
>>>> then the flag isn't handled below.
>>>>
>>>>> +	 * adjustments in order to give userspace a consistent view of
>>>>> +	 * the vcpu state.
>>>>> +	 */
>>>>> +	if (unlikely(vcpu->arch.flags & (KVM_ARM64_PENDING_EXCEPTION |
>>>>> +					 KVM_ARM64_EXCEPT_MASK)))
>>>> The condition seems to suggest that it is valid to set
>>>> KVM_ARM64_EXCEPT_{AA32,AA64}_* without setting
>>>> KVM_ARM64_PENDING_EXCEPTION, which looks rather odd to me.
>>>> Is that a valid use of the KVM_ARM64_EXCEPT_MASK bits? If it's not
>>>> (the existing code always sets the exception type with the
>>>> KVM_ARM64_PENDING_EXCEPTION), that I was thinking that checking only
>>>> the KVM_ARM64_PENDING_EXCEPTION flag would make the intention
>>>> clearer.
>>> No, you are missing this (subtle) comment in kvm_host.h:
>>>
>>> <quote>
>>> /*
>>>  * Overlaps with KVM_ARM64_EXCEPT_MASK on purpose so that it can't be
>>>  * set together with an exception...
>>>  */
>>> #define KVM_ARM64_INCREMENT_PC		(1 << 9) /* Increment PC */
>>> </quote>
>>>
>>> So (KVM_ARM64_PENDING_EXCEPTION | KVM_ARM64_EXCEPT_MASK) checks for
>>> *both* an exception and a PC increment.
>> Then how about explicitly checking for the
>> KVM_ARM64_PENDING_EXCEPTION and KVM_ARM64_INCREMENT_PC flags, like
>> it's done in __kvm_adjust_pc? That would certainly make the code
>> easier to understand, as it's not immediately obvious that the
>> EXCEPT mask includes the INCREMENT_PC flag.
> Fair enough. I'll fix that in v2.
>
> Another thing I wondered about: we now rely on __kvm_adjust_pc() to be
> preemption safe. That's always the case in nVHE (we're at EL2), but
> VHE can be preempted at any point. The code we call is preemption
> safe, but it takes some effort to be convinced of it.
>
> Do you have a good suggestion on how to express this requirement?  I
> could throw a preempt_disable()/enable() at the call for the sake of
> being in the same context between VHE and nVHE, but that's not
> strictly necessary for now.

I've had another look at it, and it does look to me that __kvm_adjust_pc() is safe
to call with preemption enabled on VHE systems. I was a bit worried when I saw
that it touches some EL1 registers directly, but then I realized that
kvm_arch_vcpu_put/load() will make sure that the values of those registers are
correct.

I do agree that it's not immediately obvious that __kvm_adjust_pc() is called with
preemption enabled. I'm a bit worried that someone will change something and make
the function unsafe from preemption. Since the patch already touches the comment
for __kvm_adjust_pc(), maybe mention that it can be called from contexts with
preemption enabled and why it's safe to do it? Something along the lines of "The
function is called with preemption enabled, and it's safe to do so because even
though it touches the hardware EL1 registers in the VHE case, those registers are
saved and restored by a vcpu_put/load pair. In the NVHE case, the code is running
at EL2 and it cannot be preempted by the host". Or something else that you fancy.

Thanks,

Alex



Thread overview: 14+ messages
2021-05-10  9:49 [PATCH 0/2] KVM: arm64: Fixup PC updates on exit to userspace Marc Zyngier
2021-05-10  9:49 ` [PATCH 1/2] KVM: arm64: Move __adjust_pc out of line Marc Zyngier
2021-05-10 14:55   ` Alexandru Elisei
2021-05-10 15:08     ` Marc Zyngier
2021-05-10  9:49 ` [PATCH 2/2] KVM: arm64: Commit pending PC adjustemnts before returning to userspace Marc Zyngier
2021-05-10 14:55   ` Alexandru Elisei
2021-05-10 15:04     ` Marc Zyngier
2021-05-10 15:14       ` Alexandru Elisei
2021-05-11  7:44         ` Marc Zyngier
2021-05-11 16:50           ` Alexandru Elisei
2021-05-11  8:03   ` Fuad Tabba
2021-05-11  8:14     ` Marc Zyngier
2021-05-11  8:22       ` Fuad Tabba
2021-05-11  8:45         ` Marc Zyngier
