linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] KVM: nVMX: Bug fix for consuming stale vmcs02.GUEST_CR3
@ 2019-09-26 21:43 Sean Christopherson
  2019-09-26 21:43 ` [PATCH 1/2] KVM: nVMX: Always write vmcs02.GUEST_CR3 during nested VM-Enter Sean Christopherson
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Sean Christopherson @ 2019-09-26 21:43 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Reto Buerki

Reto Buerki reported a failure in a nested VMM when running with HLT
interception disabled in L1.  When putting L2 into HLT, KVM never actually
enters L2 and instead cancels the nested run and pretends that VM-Enter to
L2 completed and then exited on HLT (which KVM intercepted).  Because KVM
never actually runs L2, KVM skips the pending MMU update for L2 and so
leaves a stale value in vmcs02.GUEST_CR3.  If the next wake event for L2
triggers a nested VM-Exit, KVM will refresh vmcs12->guest_cr3 from
vmcs02.GUEST_CR3 and consume the stale value.

Fix the issue by unconditionally writing vmcs02.GUEST_CR3 during nested
VM-Enter instead of deferring the update to vmx_set_cr3(), and skip the
update of GUEST_CR3 in vmx_set_cr3() when running L2.  I.e. make the
nested code fully responsible for vmcs02.GUEST_CR3.

I really wanted to go with a different fix of handling this as a one-off
case in the HLT flow (in nested_vmx_run()), and then following that up
with a cleanup of VMX's CR3 handling, e.g. to do proper dirty tracking
instead of having the nested code do manual VMREADs and VMWRITEs.  I even
went so far as to hide vcpu->arch.cr3 (put CR3 in vcpu->arch.regs), but
things went south when I started working through the dirty tracking logic.

Because EPT can be enabled *without* unrestricted guest, enabling EPT
doesn't always mean GUEST_CR3 really is the guest CR3 (unlike SVM's NPT).
And because the unrestricted guest handling of GUEST_CR3 is dependent on
whether the guest has paging enabled, VMX can't even do a clean handoff
based on unrestricted guest.  In a nutshell, dynamically handling the
transitions of GUEST_CR3 ownership in VMX is a nightmare, so fixing this
purely within the context of nested VMX turned out to be the cleanest fix.

Sean Christopherson (2):
  KVM: nVMX: Always write vmcs02.GUEST_CR3 during nested VM-Enter
  KVM: VMX: Skip GUEST_CR3 VMREAD+VMWRITE if the VMCS is up-to-date

 arch/x86/kvm/vmx/nested.c |  8 ++++++++
 arch/x86/kvm/vmx/vmx.c    | 15 ++++++++++-----
 2 files changed, 18 insertions(+), 5 deletions(-)

-- 
2.22.0


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/2] KVM: nVMX: Always write vmcs02.GUEST_CR3 during nested VM-Enter
  2019-09-26 21:43 [PATCH 0/2] KVM: nVMX: Bug fix for consuming stale vmcs02.GUEST_CR3 Sean Christopherson
@ 2019-09-26 21:43 ` Sean Christopherson
  2019-09-26 23:39   ` Jim Mattson
  2019-09-27  0:06   ` Liran Alon
  2019-09-26 21:43 ` [PATCH 2/2] KVM: VMX: Skip GUEST_CR3 VMREAD+VMWRITE if the VMCS is up-to-date Sean Christopherson
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 13+ messages in thread
From: Sean Christopherson @ 2019-09-26 21:43 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Reto Buerki

Write the desired L2 CR3 into vmcs02.GUEST_CR3 during nested VM-Enter
isntead of deferring the VMWRITE until vmx_set_cr3().  If the VMWRITE
is deferred, then KVM can consume a stale vmcs02.GUEST_CR3 when it
refreshes vmcs12->guest_cr3 during nested_vmx_vmexit() if the emulated
VM-Exit occurs without actually entering L2, e.g. if the nested run
is squashed because L2 is being put into HLT.

In an ideal world where EPT *requires* unrestricted guest (and vice
versa), VMX could handle CR3 similar to how it handles RSP and RIP,
e.g. mark CR3 dirty and conditionally load it at vmx_vcpu_run().  But
the unrestricted guest silliness complicates the dirty tracking logic
to the point that explicitly handling vmcs02.GUEST_CR3 during nested
VM-Enter is a simpler overall implementation.

Cc: stable@vger.kernel.org
Reported-by: Reto Buerki <reet@codelabs.ch>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/vmx/nested.c | 8 ++++++++
 arch/x86/kvm/vmx/vmx.c    | 9 ++++++---
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 41abc62c9a8a..971a24134081 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2418,6 +2418,14 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
 				entry_failure_code))
 		return -EINVAL;
 
+	/*
+	 * Immediately write vmcs02.GUEST_CR3.  It will be propagated to vmcs12
+	 * on nested VM-Exit, which can occur without actually running L2, e.g.
+	 * if L2 is entering HLT state, and thus without hitting vmx_set_cr3().
+	 */
+	if (enable_ept)
+		vmcs_writel(GUEST_CR3, vmcs12->guest_cr3);
+
 	/* Late preparation of GUEST_PDPTRs now that EFER and CRs are set. */
 	if (load_guest_pdptrs_vmcs12 && nested_cpu_has_ept(vmcs12) &&
 	    is_pae_paging(vcpu)) {
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index d4575ffb3cec..b530950a9c2b 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2985,6 +2985,7 @@ void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
 {
 	struct kvm *kvm = vcpu->kvm;
 	unsigned long guest_cr3;
+	bool skip_cr3 = false;
 	u64 eptp;
 
 	guest_cr3 = cr3;
@@ -3000,15 +3001,17 @@ void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
 			spin_unlock(&to_kvm_vmx(kvm)->ept_pointer_lock);
 		}
 
-		if (enable_unrestricted_guest || is_paging(vcpu) ||
-		    is_guest_mode(vcpu))
+		if (is_guest_mode(vcpu))
+			skip_cr3 = true;
+		else if (enable_unrestricted_guest || is_paging(vcpu))
 			guest_cr3 = kvm_read_cr3(vcpu);
 		else
 			guest_cr3 = to_kvm_vmx(kvm)->ept_identity_map_addr;
 		ept_load_pdptrs(vcpu);
 	}
 
-	vmcs_writel(GUEST_CR3, guest_cr3);
+	if (!skip_cr3)
+		vmcs_writel(GUEST_CR3, guest_cr3);
 }
 
 int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
-- 
2.22.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/2] KVM: VMX: Skip GUEST_CR3 VMREAD+VMWRITE if the VMCS is up-to-date
  2019-09-26 21:43 [PATCH 0/2] KVM: nVMX: Bug fix for consuming stale vmcs02.GUEST_CR3 Sean Christopherson
  2019-09-26 21:43 ` [PATCH 1/2] KVM: nVMX: Always write vmcs02.GUEST_CR3 during nested VM-Enter Sean Christopherson
@ 2019-09-26 21:43 ` Sean Christopherson
  2019-09-27 12:11   ` Vitaly Kuznetsov
  2019-09-27  7:45 ` [PATCH 0/2] KVM: nVMX: Bug fix for consuming stale vmcs02.GUEST_CR3 Reto Buerki
  2019-09-27 12:12 ` Vitaly Kuznetsov
  3 siblings, 1 reply; 13+ messages in thread
From: Sean Christopherson @ 2019-09-26 21:43 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Reto Buerki

Skip the VMWRITE to update GUEST_CR3 if CR3 is not available, i.e. has
not been read from the VMCS since the last VM-Enter.  If vcpu->arch.cr3
is stale, kvm_read_cr3(vcpu) will refresh vcpu->arch.cr3 from the VMCS,
meaning KVM will do a VMREAD and then VMWRITE the value it just pulled
from the VMCS.

Note, this is a purely theoretical change, no instances of skipping
the VMREAD+VMWRITE have been observed with this change.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/vmx/vmx.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index b530950a9c2b..6de09f60edf3 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -3003,10 +3003,12 @@ void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
 
 		if (is_guest_mode(vcpu))
 			skip_cr3 = true;
-		else if (enable_unrestricted_guest || is_paging(vcpu))
-			guest_cr3 = kvm_read_cr3(vcpu);
-		else
+		else if (!enable_unrestricted_guest && !is_paging(vcpu))
 			guest_cr3 = to_kvm_vmx(kvm)->ept_identity_map_addr;
+		else if (test_bit(VCPU_EXREG_CR3, (ulong *)&vcpu->arch.regs_avail))
+			guest_cr3 = vcpu->arch.cr3;
+		else
+			skip_cr3 = true; /* vmcs01.GUEST_CR3 is up-to-date. */
 		ept_load_pdptrs(vcpu);
 	}
 
-- 
2.22.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] KVM: nVMX: Always write vmcs02.GUEST_CR3 during nested VM-Enter
  2019-09-26 21:43 ` [PATCH 1/2] KVM: nVMX: Always write vmcs02.GUEST_CR3 during nested VM-Enter Sean Christopherson
@ 2019-09-26 23:39   ` Jim Mattson
  2019-09-27 14:22     ` Sean Christopherson
  2019-09-27  0:06   ` Liran Alon
  1 sibling, 1 reply; 13+ messages in thread
From: Jim Mattson @ 2019-09-26 23:39 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Radim Krčmář,
	Vitaly Kuznetsov, Wanpeng Li, Joerg Roedel, kvm list, LKML,
	Reto Buerki

On Thu, Sep 26, 2019 at 2:43 PM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> Write the desired L2 CR3 into vmcs02.GUEST_CR3 during nested VM-Enter
> isntead of deferring the VMWRITE until vmx_set_cr3().  If the VMWRITE
> is deferred, then KVM can consume a stale vmcs02.GUEST_CR3 when it
> refreshes vmcs12->guest_cr3 during nested_vmx_vmexit() if the emulated
> VM-Exit occurs without actually entering L2, e.g. if the nested run
> is squashed because L2 is being put into HLT.
>
> In an ideal world where EPT *requires* unrestricted guest (and vice
> versa), VMX could handle CR3 similar to how it handles RSP and RIP,
> e.g. mark CR3 dirty and conditionally load it at vmx_vcpu_run().  But
> the unrestricted guest silliness complicates the dirty tracking logic
> to the point that explicitly handling vmcs02.GUEST_CR3 during nested
> VM-Enter is a simpler overall implementation.
>
> Cc: stable@vger.kernel.org
> Reported-by: Reto Buerki <reet@codelabs.ch>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> ---
>  arch/x86/kvm/vmx/nested.c | 8 ++++++++
>  arch/x86/kvm/vmx/vmx.c    | 9 ++++++---
>  2 files changed, 14 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 41abc62c9a8a..971a24134081 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -2418,6 +2418,14 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
>                                 entry_failure_code))
>                 return -EINVAL;
>
> +       /*
> +        * Immediately write vmcs02.GUEST_CR3.  It will be propagated to vmcs12
> +        * on nested VM-Exit, which can occur without actually running L2, e.g.
> +        * if L2 is entering HLT state, and thus without hitting vmx_set_cr3().
> +        */
> +       if (enable_ept)
> +               vmcs_writel(GUEST_CR3, vmcs12->guest_cr3);
> +

This part of the change I can definitely confirm, since we have a
similar change in our fetid pile of commits that have yet to be
upstreamed. Thanks for taking care of this one!

>         /* Late preparation of GUEST_PDPTRs now that EFER and CRs are set. */
>         if (load_guest_pdptrs_vmcs12 && nested_cpu_has_ept(vmcs12) &&
>             is_pae_paging(vcpu)) {
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index d4575ffb3cec..b530950a9c2b 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -2985,6 +2985,7 @@ void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
>  {
>         struct kvm *kvm = vcpu->kvm;
>         unsigned long guest_cr3;
> +       bool skip_cr3 = false;
>         u64 eptp;
>
>         guest_cr3 = cr3;
> @@ -3000,15 +3001,17 @@ void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
>                         spin_unlock(&to_kvm_vmx(kvm)->ept_pointer_lock);
>                 }
>
> -               if (enable_unrestricted_guest || is_paging(vcpu) ||
> -                   is_guest_mode(vcpu))
> +               if (is_guest_mode(vcpu))
> +                       skip_cr3 = true;
> +               else if (enable_unrestricted_guest || is_paging(vcpu))
>                         guest_cr3 = kvm_read_cr3(vcpu);
>                 else
>                         guest_cr3 = to_kvm_vmx(kvm)->ept_identity_map_addr;
>                 ept_load_pdptrs(vcpu);
>         }
>
> -       vmcs_writel(GUEST_CR3, guest_cr3);
> +       if (!skip_cr3)
> +               vmcs_writel(GUEST_CR3, guest_cr3);

Is this part of the change necessary, or is it just an optimization to
save a redundant VMWRITE?

>  }
>
>  int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
> --
> 2.22.0
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] KVM: nVMX: Always write vmcs02.GUEST_CR3 during nested VM-Enter
  2019-09-26 21:43 ` [PATCH 1/2] KVM: nVMX: Always write vmcs02.GUEST_CR3 during nested VM-Enter Sean Christopherson
  2019-09-26 23:39   ` Jim Mattson
@ 2019-09-27  0:06   ` Liran Alon
  2019-09-27 14:27     ` Sean Christopherson
  1 sibling, 1 reply; 13+ messages in thread
From: Liran Alon @ 2019-09-27  0:06 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Radim Krčmář,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Reto Buerki



> On 27 Sep 2019, at 0:43, Sean Christopherson <sean.j.christopherson@intel.com> wrote:
> 
> Write the desired L2 CR3 into vmcs02.GUEST_CR3 during nested VM-Enter
> isntead of deferring the VMWRITE until vmx_set_cr3().  If the VMWRITE
> is deferred, then KVM can consume a stale vmcs02.GUEST_CR3 when it
> refreshes vmcs12->guest_cr3 during nested_vmx_vmexit() if the emulated
> VM-Exit occurs without actually entering L2, e.g. if the nested run
> is squashed because L2 is being put into HLT.

I would rephrase to “If an emulated VMEntry is squashed because L1 sets vmcs12->guest_activity_state to HLT”.
I think it’s a bit more explicit.

> 
> In an ideal world where EPT *requires* unrestricted guest (and vice
> versa), VMX could handle CR3 similar to how it handles RSP and RIP,
> e.g. mark CR3 dirty and conditionally load it at vmx_vcpu_run().  But
> the unrestricted guest silliness complicates the dirty tracking logic
> to the point that explicitly handling vmcs02.GUEST_CR3 during nested
> VM-Enter is a simpler overall implementation.
> 
> Cc: stable@vger.kernel.org
> Reported-by: Reto Buerki <reet@codelabs.ch>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> ---
> arch/x86/kvm/vmx/nested.c | 8 ++++++++
> arch/x86/kvm/vmx/vmx.c    | 9 ++++++---
> 2 files changed, 14 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 41abc62c9a8a..971a24134081 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -2418,6 +2418,14 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
> 				entry_failure_code))
> 		return -EINVAL;
> 
> +	/*
> +	 * Immediately write vmcs02.GUEST_CR3.  It will be propagated to vmcs12
> +	 * on nested VM-Exit, which can occur without actually running L2, e.g.
> +	 * if L2 is entering HLT state, and thus without hitting vmx_set_cr3().
> +	 */

If I understand correctly, it’s not exactly if L2 is entering HLT state in general.
(E.g. issue doesn’t occur if L2 runs HLT directly which is not configured to be intercepted by vmcs12).
It’s specifically when L1 enters L2 with a HLT guest-activity-state. I suggest rephrasing comment.

> +	if (enable_ept)
> +		vmcs_writel(GUEST_CR3, vmcs12->guest_cr3);
> +
> 	/* Late preparation of GUEST_PDPTRs now that EFER and CRs are set. */
> 	if (load_guest_pdptrs_vmcs12 && nested_cpu_has_ept(vmcs12) &&
> 	    is_pae_paging(vcpu)) {
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index d4575ffb3cec..b530950a9c2b 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -2985,6 +2985,7 @@ void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
> {
> 	struct kvm *kvm = vcpu->kvm;
> 	unsigned long guest_cr3;
> +	bool skip_cr3 = false;
> 	u64 eptp;
> 
> 	guest_cr3 = cr3;
> @@ -3000,15 +3001,17 @@ void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
> 			spin_unlock(&to_kvm_vmx(kvm)->ept_pointer_lock);
> 		}
> 
> -		if (enable_unrestricted_guest || is_paging(vcpu) ||
> -		    is_guest_mode(vcpu))
> +		if (is_guest_mode(vcpu))
> +			skip_cr3 = true;
> +		else if (enable_unrestricted_guest || is_paging(vcpu))
> 			guest_cr3 = kvm_read_cr3(vcpu);
> 		else
> 			guest_cr3 = to_kvm_vmx(kvm)->ept_identity_map_addr;
> 		ept_load_pdptrs(vcpu);
> 	}
> 
> -	vmcs_writel(GUEST_CR3, guest_cr3);
> +	if (!skip_cr3)

Nit: It’s a matter of taste, but I prefer positive conditions. i.e. “bool write_guest_cr3”.

Anyway, code seems valid to me. Nice catch.
Reviewed-by: Liran Alon <liran.alon@oracle.com>

-Liran

> +		vmcs_writel(GUEST_CR3, guest_cr3);
> }
> 
> int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
> -- 
> 2.22.0
> 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/2] KVM: nVMX: Bug fix for consuming stale vmcs02.GUEST_CR3
  2019-09-26 21:43 [PATCH 0/2] KVM: nVMX: Bug fix for consuming stale vmcs02.GUEST_CR3 Sean Christopherson
  2019-09-26 21:43 ` [PATCH 1/2] KVM: nVMX: Always write vmcs02.GUEST_CR3 during nested VM-Enter Sean Christopherson
  2019-09-26 21:43 ` [PATCH 2/2] KVM: VMX: Skip GUEST_CR3 VMREAD+VMWRITE if the VMCS is up-to-date Sean Christopherson
@ 2019-09-27  7:45 ` Reto Buerki
  2019-09-27 12:12 ` Vitaly Kuznetsov
  3 siblings, 0 replies; 13+ messages in thread
From: Reto Buerki @ 2019-09-27  7:45 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Radim Krčmář
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel

On 9/26/19 11:43 PM, Sean Christopherson wrote:
> Reto Buerki reported a failure in a nested VMM when running with HLT
> interception disabled in L1.  When putting L2 into HLT, KVM never actually
> enters L2 and instead cancels the nested run and pretends that VM-Enter to
> L2 completed and then exited on HLT (which KVM intercepted).  Because KVM
> never actually runs L2, KVM skips the pending MMU update for L2 and so
> leaves a stale value in vmcs02.GUEST_CR3.  If the next wake event for L2
> triggers a nested VM-Exit, KVM will refresh vmcs12->guest_cr3 from
> vmcs02.GUEST_CR3 and consume the stale value.
> 
> Fix the issue by unconditionally writing vmcs02.GUEST_CR3 during nested
> VM-Enter instead of deferring the update to vmx_set_cr3(), and skip the
> update of GUEST_CR3 in vmx_set_cr3() when running L2.  I.e. make the
> nested code fully responsible for vmcs02.GUEST_CR3.
> 
> I really wanted to go with a different fix of handling this as a one-off
> case in the HLT flow (in nested_vmx_run()), and then following that up
> with a cleanup of VMX's CR3 handling, e.g. to do proper dirty tracking
> instead of having the nested code do manual VMREADs and VMWRITEs.  I even
> went so far as to hide vcpu->arch.cr3 (put CR3 in vcpu->arch.regs), but
> things went south when I started working through the dirty tracking logic.
> 
> Because EPT can be enabled *without* unrestricted guest, enabling EPT
> doesn't always mean GUEST_CR3 really is the guest CR3 (unlike SVM's NPT).
> And because the unrestricted guest handling of GUEST_CR3 is dependent on
> whether the guest has paging enabled, VMX can't even do a clean handoff
> based on unrestricted guest.  In a nutshell, dynamically handling the
> transitions of GUEST_CR3 ownership in VMX is a nightmare, so fixing this
> purely within the context of nested VMX turned out to be the cleanest fix.
> 
> Sean Christopherson (2):
>   KVM: nVMX: Always write vmcs02.GUEST_CR3 during nested VM-Enter
>   KVM: VMX: Skip GUEST_CR3 VMREAD+VMWRITE if the VMCS is up-to-date
> 
>  arch/x86/kvm/vmx/nested.c |  8 ++++++++
>  arch/x86/kvm/vmx/vmx.c    | 15 ++++++++++-----
>  2 files changed, 18 insertions(+), 5 deletions(-)

Tested-by: Reto Buerki <reet@codelabs.ch>

Thanks!

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] KVM: VMX: Skip GUEST_CR3 VMREAD+VMWRITE if the VMCS is up-to-date
  2019-09-26 21:43 ` [PATCH 2/2] KVM: VMX: Skip GUEST_CR3 VMREAD+VMWRITE if the VMCS is up-to-date Sean Christopherson
@ 2019-09-27 12:11   ` Vitaly Kuznetsov
  2019-09-27 14:24     ` Sean Christopherson
  0 siblings, 1 reply; 13+ messages in thread
From: Vitaly Kuznetsov @ 2019-09-27 12:11 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Radim Krčmář
  Cc: Wanpeng Li, Jim Mattson, Joerg Roedel, kvm, linux-kernel, Reto Buerki

Sean Christopherson <sean.j.christopherson@intel.com> writes:

> Skip the VMWRITE to update GUEST_CR3 if CR3 is not available, i.e. has
> not been read from the VMCS since the last VM-Enter.  If vcpu->arch.cr3
> is stale, kvm_read_cr3(vcpu) will refresh vcpu->arch.cr3 from the VMCS,
> meaning KVM will do a VMREAD and then VMWRITE the value it just pulled
> from the VMCS.
>
> Note, this is a purely theoretical change, no instances of skipping
> the VMREAD+VMWRITE have been observed with this change.
>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> ---
>  arch/x86/kvm/vmx/vmx.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index b530950a9c2b..6de09f60edf3 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -3003,10 +3003,12 @@ void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
>  
>  		if (is_guest_mode(vcpu))
>  			skip_cr3 = true;
> -		else if (enable_unrestricted_guest || is_paging(vcpu))
> -			guest_cr3 = kvm_read_cr3(vcpu);
> -		else
> +		else if (!enable_unrestricted_guest && !is_paging(vcpu))
>  			guest_cr3 = to_kvm_vmx(kvm)->ept_identity_map_addr;
> +		else if (test_bit(VCPU_EXREG_CR3, (ulong *)&vcpu->arch.regs_avail))

Nit: with 'test_bit(,(ulong *)&vcpu->arch.regs_avail)' spreading more and
more I'd suggest creating an inline in kvm_cache_regs.h
(e.g. kvm_vcpu_reg_avail()).

> +			guest_cr3 = vcpu->arch.cr3;
> +		else
> +			skip_cr3 = true; /* vmcs01.GUEST_CR3 is up-to-date. */
>  		ept_load_pdptrs(vcpu);
>  	}

-- 
Vitaly

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/2] KVM: nVMX: Bug fix for consuming stale vmcs02.GUEST_CR3
  2019-09-26 21:43 [PATCH 0/2] KVM: nVMX: Bug fix for consuming stale vmcs02.GUEST_CR3 Sean Christopherson
                   ` (2 preceding siblings ...)
  2019-09-27  7:45 ` [PATCH 0/2] KVM: nVMX: Bug fix for consuming stale vmcs02.GUEST_CR3 Reto Buerki
@ 2019-09-27 12:12 ` Vitaly Kuznetsov
  3 siblings, 0 replies; 13+ messages in thread
From: Vitaly Kuznetsov @ 2019-09-27 12:12 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Radim Krčmář
  Cc: Wanpeng Li, Jim Mattson, Joerg Roedel, kvm, linux-kernel, Reto Buerki

Sean Christopherson <sean.j.christopherson@intel.com> writes:

> Reto Buerki reported a failure in a nested VMM when running with HLT
> interception disabled in L1.  When putting L2 into HLT, KVM never actually
> enters L2 and instead cancels the nested run and pretends that VM-Enter to
> L2 completed and then exited on HLT (which KVM intercepted).  Because KVM
> never actually runs L2, KVM skips the pending MMU update for L2 and so
> leaves a stale value in vmcs02.GUEST_CR3.  If the next wake event for L2
> triggers a nested VM-Exit, KVM will refresh vmcs12->guest_cr3 from
> vmcs02.GUEST_CR3 and consume the stale value.
>
> Fix the issue by unconditionally writing vmcs02.GUEST_CR3 during nested
> VM-Enter instead of deferring the update to vmx_set_cr3(), and skip the
> update of GUEST_CR3 in vmx_set_cr3() when running L2.  I.e. make the
> nested code fully responsible for vmcs02.GUEST_CR3.
>
> I really wanted to go with a different fix of handling this as a one-off
> case in the HLT flow (in nested_vmx_run()), and then following that up
> with a cleanup of VMX's CR3 handling, e.g. to do proper dirty tracking
> instead of having the nested code do manual VMREADs and VMWRITEs.  I even
> went so far as to hide vcpu->arch.cr3 (put CR3 in vcpu->arch.regs), but
> things went south when I started working through the dirty tracking logic.
>
> Because EPT can be enabled *without* unrestricted guest, enabling EPT
> doesn't always mean GUEST_CR3 really is the guest CR3 (unlike SVM's NPT).
> And because the unrestricted guest handling of GUEST_CR3 is dependent on
> whether the guest has paging enabled, VMX can't even do a clean handoff
> based on unrestricted guest.  In a nutshell, dynamically handling the
> transitions of GUEST_CR3 ownership in VMX is a nightmare, so fixing this
> purely within the context of nested VMX turned out to be the cleanest fix.
>
> Sean Christopherson (2):
>   KVM: nVMX: Always write vmcs02.GUEST_CR3 during nested VM-Enter
>   KVM: VMX: Skip GUEST_CR3 VMREAD+VMWRITE if the VMCS is up-to-date
>

Series:
Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com>

-- 
Vitaly

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] KVM: nVMX: Always write vmcs02.GUEST_CR3 during nested VM-Enter
  2019-09-26 23:39   ` Jim Mattson
@ 2019-09-27 14:22     ` Sean Christopherson
  0 siblings, 0 replies; 13+ messages in thread
From: Sean Christopherson @ 2019-09-27 14:22 UTC (permalink / raw)
  To: Jim Mattson
  Cc: Paolo Bonzini, Radim Krčmář,
	Vitaly Kuznetsov, Wanpeng Li, Joerg Roedel, kvm list, LKML,
	Reto Buerki

On Thu, Sep 26, 2019 at 04:39:28PM -0700, Jim Mattson wrote:
> On Thu, Sep 26, 2019 at 2:43 PM Sean Christopherson
> <sean.j.christopherson@intel.com> wrote:
> > -       vmcs_writel(GUEST_CR3, guest_cr3);
> > +       if (!skip_cr3)
> > +               vmcs_writel(GUEST_CR3, guest_cr3);
> 
> Is this part of the change necessary, or is it just an optimization to
> save a redundant VMWRITE?

Save the redundant VMWRITE.  I also wanted to convey the idea that the
nested code is responsible for setting GUEST_CR3 to the correct value.
I'd be happy to add a comment to make the latter point explicit.

> >  }
> >
> >  int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
> > --
> > 2.22.0
> >

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] KVM: VMX: Skip GUEST_CR3 VMREAD+VMWRITE if the VMCS is up-to-date
  2019-09-27 12:11   ` Vitaly Kuznetsov
@ 2019-09-27 14:24     ` Sean Christopherson
  0 siblings, 0 replies; 13+ messages in thread
From: Sean Christopherson @ 2019-09-27 14:24 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Paolo Bonzini, Radim Krčmář,
	Wanpeng Li, Jim Mattson, Joerg Roedel, kvm, linux-kernel,
	Reto Buerki

On Fri, Sep 27, 2019 at 02:11:27PM +0200, Vitaly Kuznetsov wrote:
> Sean Christopherson <sean.j.christopherson@intel.com> writes:
> 
> > Skip the VMWRITE to update GUEST_CR3 if CR3 is not available, i.e. has
> > not been read from the VMCS since the last VM-Enter.  If vcpu->arch.cr3
> > is stale, kvm_read_cr3(vcpu) will refresh vcpu->arch.cr3 from the VMCS,
> > meaning KVM will do a VMREAD and then VMWRITE the value it just pulled
> > from the VMCS.
> >
> > Note, this is a purely theoretical change, no instances of skipping
> > the VMREAD+VMWRITE have been observed with this change.
> >
> > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> > ---
> >  arch/x86/kvm/vmx/vmx.c | 8 +++++---
> >  1 file changed, 5 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index b530950a9c2b..6de09f60edf3 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -3003,10 +3003,12 @@ void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
> >  
> >  		if (is_guest_mode(vcpu))
> >  			skip_cr3 = true;
> > -		else if (enable_unrestricted_guest || is_paging(vcpu))
> > -			guest_cr3 = kvm_read_cr3(vcpu);
> > -		else
> > +		else if (!enable_unrestricted_guest && !is_paging(vcpu))
> >  			guest_cr3 = to_kvm_vmx(kvm)->ept_identity_map_addr;
> > +		else if (test_bit(VCPU_EXREG_CR3, (ulong *)&vcpu->arch.regs_avail))
> 
> Nit: with 'test_bit(,(ulong *)&vcpu->arch.regs_avail)' spreading more and
> more I'd suggest creating an inline in kvm_cache_regs.h
> (e.g. kvm_vcpu_reg_avail()).

Part of me wants to keep it painful to discourage one-off checks, but
yeah, a helper would be nice.

> > +			guest_cr3 = vcpu->arch.cr3;
> > +		else
> > +			skip_cr3 = true; /* vmcs01.GUEST_CR3 is up-to-date. */
> >  		ept_load_pdptrs(vcpu);
> >  	}
> 
> -- 
> Vitaly

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] KVM: nVMX: Always write vmcs02.GUEST_CR3 during nested VM-Enter
  2019-09-27  0:06   ` Liran Alon
@ 2019-09-27 14:27     ` Sean Christopherson
  2019-09-27 14:44       ` Liran Alon
  0 siblings, 1 reply; 13+ messages in thread
From: Sean Christopherson @ 2019-09-27 14:27 UTC (permalink / raw)
  To: Liran Alon
  Cc: Paolo Bonzini, Radim Krčmář,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Reto Buerki

On Fri, Sep 27, 2019 at 03:06:02AM +0300, Liran Alon wrote:
> 
> 
> > On 27 Sep 2019, at 0:43, Sean Christopherson <sean.j.christopherson@intel.com> wrote:
> > 
> > Write the desired L2 CR3 into vmcs02.GUEST_CR3 during nested VM-Enter
> > isntead of deferring the VMWRITE until vmx_set_cr3().  If the VMWRITE
> > is deferred, then KVM can consume a stale vmcs02.GUEST_CR3 when it
> > refreshes vmcs12->guest_cr3 during nested_vmx_vmexit() if the emulated
> > VM-Exit occurs without actually entering L2, e.g. if the nested run
> > is squashed because L2 is being put into HLT.
> 
> I would rephrase to “If an emulated VMEntry is squashed because L1 sets
> vmcs12->guest_activity_state to HLT”.  I think it’s a bit more explicit.
> 
> > 
> > In an ideal world where EPT *requires* unrestricted guest (and vice
> > versa), VMX could handle CR3 similar to how it handles RSP and RIP,
> > e.g. mark CR3 dirty and conditionally load it at vmx_vcpu_run().  But
> > the unrestricted guest silliness complicates the dirty tracking logic
> > to the point that explicitly handling vmcs02.GUEST_CR3 during nested
> > VM-Enter is a simpler overall implementation.
> > 
> > Cc: stable@vger.kernel.org
> > Reported-by: Reto Buerki <reet@codelabs.ch>
> > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> > ---
> > arch/x86/kvm/vmx/nested.c | 8 ++++++++
> > arch/x86/kvm/vmx/vmx.c    | 9 ++++++---
> > 2 files changed, 14 insertions(+), 3 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> > index 41abc62c9a8a..971a24134081 100644
> > --- a/arch/x86/kvm/vmx/nested.c
> > +++ b/arch/x86/kvm/vmx/nested.c
> > @@ -2418,6 +2418,14 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
> > 				entry_failure_code))
> > 		return -EINVAL;
> > 
> > +	/*
> > +	 * Immediately write vmcs02.GUEST_CR3.  It will be propagated to vmcs12
> > +	 * on nested VM-Exit, which can occur without actually running L2, e.g.
> > +	 * if L2 is entering HLT state, and thus without hitting vmx_set_cr3().
> > +	 */
> 
> If I understand correctly, it’s not exactly if L2 is entering HLT state in
> general.  (E.g. issue doesn’t occur if L2 runs HLT directly which is not
> configured to be intercepted by vmcs12).  It’s specifically when L1 enters L2
> with a HLT guest-activity-state. I suggest rephrasing comment.

I deliberately worded the comment so that it remains valid if there are
more conditions in the future that cause KVM to skip running L2.  What if
I split the difference and make the changelog more explicit, but leave the
comment as is?

> > +	if (enable_ept)
> > +		vmcs_writel(GUEST_CR3, vmcs12->guest_cr3);
> > +
> > 	/* Late preparation of GUEST_PDPTRs now that EFER and CRs are set. */
> > 	if (load_guest_pdptrs_vmcs12 && nested_cpu_has_ept(vmcs12) &&
> > 	    is_pae_paging(vcpu)) {
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index d4575ffb3cec..b530950a9c2b 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -2985,6 +2985,7 @@ void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
> > {
> > 	struct kvm *kvm = vcpu->kvm;
> > 	unsigned long guest_cr3;
> > +	bool skip_cr3 = false;
> > 	u64 eptp;
> > 
> > 	guest_cr3 = cr3;
> > @@ -3000,15 +3001,17 @@ void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
> > 			spin_unlock(&to_kvm_vmx(kvm)->ept_pointer_lock);
> > 		}
> > 
> > -		if (enable_unrestricted_guest || is_paging(vcpu) ||
> > -		    is_guest_mode(vcpu))
> > +		if (is_guest_mode(vcpu))
> > +			skip_cr3 = true;
> > +		else if (enable_unrestricted_guest || is_paging(vcpu))
> > 			guest_cr3 = kvm_read_cr3(vcpu);
> > 		else
> > 			guest_cr3 = to_kvm_vmx(kvm)->ept_identity_map_addr;
> > 		ept_load_pdptrs(vcpu);
> > 	}
> > 
> > -	vmcs_writel(GUEST_CR3, guest_cr3);
> > +	if (!skip_cr3)
> 
> Nit: It’s a matter of taste, but I prefer positive conditions. i.e. “bool
> write_guest_cr3”.
> 
> Anyway, code seems valid to me. Nice catch.
> Reviewed-by: Liran Alon <liran.alon@oracle.com>
> 
> -Liran
> 
> > +		vmcs_writel(GUEST_CR3, guest_cr3);
> > }
> > 
> > int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
> > -- 
> > 2.22.0
> > 
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] KVM: nVMX: Always write vmcs02.GUEST_CR3 during nested VM-Enter
  2019-09-27 14:27     ` Sean Christopherson
@ 2019-09-27 14:44       ` Liran Alon
  2019-09-27 15:02         ` Sean Christopherson
  0 siblings, 1 reply; 13+ messages in thread
From: Liran Alon @ 2019-09-27 14:44 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Radim Krčmář,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Reto Buerki



> On 27 Sep 2019, at 17:27, Sean Christopherson <sean.j.christopherson@intel.com> wrote:
> 
> On Fri, Sep 27, 2019 at 03:06:02AM +0300, Liran Alon wrote:
>> 
>> 
>>> On 27 Sep 2019, at 0:43, Sean Christopherson <sean.j.christopherson@intel.com> wrote:
>>> 
>>> Write the desired L2 CR3 into vmcs02.GUEST_CR3 during nested VM-Enter
>>> isntead of deferring the VMWRITE until vmx_set_cr3().  If the VMWRITE
>>> is deferred, then KVM can consume a stale vmcs02.GUEST_CR3 when it
>>> refreshes vmcs12->guest_cr3 during nested_vmx_vmexit() if the emulated
>>> VM-Exit occurs without actually entering L2, e.g. if the nested run
>>> is squashed because L2 is being put into HLT.
>> 
>> I would rephrase to “If an emulated VMEntry is squashed because L1 sets
>> vmcs12->guest_activity_state to HLT”.  I think it’s a bit more explicit.
>> 
>>> 
>>> In an ideal world where EPT *requires* unrestricted guest (and vice
>>> versa), VMX could handle CR3 similar to how it handles RSP and RIP,
>>> e.g. mark CR3 dirty and conditionally load it at vmx_vcpu_run().  But
>>> the unrestricted guest silliness complicates the dirty tracking logic
>>> to the point that explicitly handling vmcs02.GUEST_CR3 during nested
>>> VM-Enter is a simpler overall implementation.
>>> 
>>> Cc: stable@vger.kernel.org
>>> Reported-by: Reto Buerki <reet@codelabs.ch>
>>> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
>>> ---
>>> arch/x86/kvm/vmx/nested.c | 8 ++++++++
>>> arch/x86/kvm/vmx/vmx.c    | 9 ++++++---
>>> 2 files changed, 14 insertions(+), 3 deletions(-)
>>> 
>>> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
>>> index 41abc62c9a8a..971a24134081 100644
>>> --- a/arch/x86/kvm/vmx/nested.c
>>> +++ b/arch/x86/kvm/vmx/nested.c
>>> @@ -2418,6 +2418,14 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
>>> 				entry_failure_code))
>>> 		return -EINVAL;
>>> 
>>> +	/*
>>> +	 * Immediately write vmcs02.GUEST_CR3.  It will be propagated to vmcs12
>>> +	 * on nested VM-Exit, which can occur without actually running L2, e.g.
>>> +	 * if L2 is entering HLT state, and thus without hitting vmx_set_cr3().
>>> +	 */
>> 
>> If I understand correctly, it’s not exactly if L2 is entering HLT state in
>> general.  (E.g. issue doesn’t occur if L2 runs HLT directly which is not
>> configured to be intercepted by vmcs12).  It’s specifically when L1 enters L2
>> with a HLT guest-activity-state. I suggest rephrasing comment.
> 
> I deliberately worded the comment so that it remains valid if there are
> more conditions in the future that cause KVM to skip running L2.  What if
> I split the difference and make the changelog more explicit, but leave the
> comment as is?

I think what is confusing in comment is that it seems to also refer to the case
where L2 directly enters HLT state without L1 intercept. Which isn’t related.
So I would explicitly mention it’s when L1 enters L2 but don’t physically enter guest
with vmcs02 because L2 is in HLT state.

-Liran

> 
>>> +	if (enable_ept)
>>> +		vmcs_writel(GUEST_CR3, vmcs12->guest_cr3);
>>> +
>>> 	/* Late preparation of GUEST_PDPTRs now that EFER and CRs are set. */
>>> 	if (load_guest_pdptrs_vmcs12 && nested_cpu_has_ept(vmcs12) &&
>>> 	    is_pae_paging(vcpu)) {
>>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>>> index d4575ffb3cec..b530950a9c2b 100644
>>> --- a/arch/x86/kvm/vmx/vmx.c
>>> +++ b/arch/x86/kvm/vmx/vmx.c
>>> @@ -2985,6 +2985,7 @@ void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
>>> {
>>> 	struct kvm *kvm = vcpu->kvm;
>>> 	unsigned long guest_cr3;
>>> +	bool skip_cr3 = false;
>>> 	u64 eptp;
>>> 
>>> 	guest_cr3 = cr3;
>>> @@ -3000,15 +3001,17 @@ void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
>>> 			spin_unlock(&to_kvm_vmx(kvm)->ept_pointer_lock);
>>> 		}
>>> 
>>> -		if (enable_unrestricted_guest || is_paging(vcpu) ||
>>> -		    is_guest_mode(vcpu))
>>> +		if (is_guest_mode(vcpu))
>>> +			skip_cr3 = true;
>>> +		else if (enable_unrestricted_guest || is_paging(vcpu))
>>> 			guest_cr3 = kvm_read_cr3(vcpu);
>>> 		else
>>> 			guest_cr3 = to_kvm_vmx(kvm)->ept_identity_map_addr;
>>> 		ept_load_pdptrs(vcpu);
>>> 	}
>>> 
>>> -	vmcs_writel(GUEST_CR3, guest_cr3);
>>> +	if (!skip_cr3)
>> 
>> Nit: It’s a matter of taste, but I prefer positive conditions. i.e. “bool
>> write_guest_cr3”.
>> 
>> Anyway, code seems valid to me. Nice catch.
>> Reviewed-by: Liran Alon <liran.alon@oracle.com>
>> 
>> -Liran
>> 
>>> +		vmcs_writel(GUEST_CR3, guest_cr3);
>>> }
>>> 
>>> int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
>>> -- 
>>> 2.22.0
>>> 
>> 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] KVM: nVMX: Always write vmcs02.GUEST_CR3 during nested VM-Enter
  2019-09-27 14:44       ` Liran Alon
@ 2019-09-27 15:02         ` Sean Christopherson
  0 siblings, 0 replies; 13+ messages in thread
From: Sean Christopherson @ 2019-09-27 15:02 UTC (permalink / raw)
  To: Liran Alon
  Cc: Paolo Bonzini, Radim Krčmář,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Reto Buerki

On Fri, Sep 27, 2019 at 05:44:53PM +0300, Liran Alon wrote:
> 
> > On 27 Sep 2019, at 17:27, Sean Christopherson <sean.j.christopherson@intel.com> wrote:
> > 
> > On Fri, Sep 27, 2019 at 03:06:02AM +0300, Liran Alon wrote:
> >> 
> >>> On 27 Sep 2019, at 0:43, Sean Christopherson <sean.j.christopherson@intel.com> wrote:
> >>> 
> >>> +	/*
> >>> +	 * Immediately write vmcs02.GUEST_CR3.  It will be propagated to vmcs12
> >>> +	 * on nested VM-Exit, which can occur without actually running L2, e.g.
> >>> +	 * if L2 is entering HLT state, and thus without hitting vmx_set_cr3().
> >>> +	 */
> >> 
> >> If I understand correctly, it’s not exactly if L2 is entering HLT state in
> >> general.  (E.g. issue doesn’t occur if L2 runs HLT directly which is not
> >> configured to be intercepted by vmcs12).  It’s specifically when L1 enters L2
> >> with a HLT guest-activity-state. I suggest rephrasing comment.
> > 
> > I deliberately worded the comment so that it remains valid if there are
> > more conditions in the future that cause KVM to skip running L2.  What if
> > I split the difference and make the changelog more explicit, but leave the
> > comment as is?
> 
> I think what is confusing in comment is that it seems to also refer to the case
> where L2 directly enters HLT state without L1 intercept. Which isn’t related.
> So I would explicitly mention it’s when L1 enters L2 but don’t physically enter guest
> with vmcs02 because L2 is in HLT state.

Ah, gotcha, I'll tweak the wording.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2019-09-27 15:02 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-26 21:43 [PATCH 0/2] KVM: nVMX: Bug fix for consuming stale vmcs02.GUEST_CR3 Sean Christopherson
2019-09-26 21:43 ` [PATCH 1/2] KVM: nVMX: Always write vmcs02.GUEST_CR3 during nested VM-Enter Sean Christopherson
2019-09-26 23:39   ` Jim Mattson
2019-09-27 14:22     ` Sean Christopherson
2019-09-27  0:06   ` Liran Alon
2019-09-27 14:27     ` Sean Christopherson
2019-09-27 14:44       ` Liran Alon
2019-09-27 15:02         ` Sean Christopherson
2019-09-26 21:43 ` [PATCH 2/2] KVM: VMX: Skip GUEST_CR3 VMREAD+VMWRITE if the VMCS is up-to-date Sean Christopherson
2019-09-27 12:11   ` Vitaly Kuznetsov
2019-09-27 14:24     ` Sean Christopherson
2019-09-27  7:45 ` [PATCH 0/2] KVM: nVMX: Bug fix for consuming stale vmcs02.GUEST_CR3 Reto Buerki
2019-09-27 12:12 ` Vitaly Kuznetsov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).