* [PATCH v2] KVM: nVMX: Update VMCS02 when L2 PAE PDPTE updates detected
From: Peter Shier @ 2020-08-20 23:05 UTC
To: kvm; +Cc: pbonzini, Peter Shier, Jim Mattson

When L2 uses PAE, L0 intercepts of L2 writes to CR0/CR3/CR4 call
load_pdptrs to read the possibly updated PDPTEs from the guest
physical address referenced by CR3. It loads them into
vcpu->arch.walk_mmu->pdptrs and sets VCPU_EXREG_PDPTR in
vcpu->arch.regs_dirty.

At the subsequent assumed reentry into L2, the mmu will call
vmx_load_mmu_pgd which calls ept_load_pdptrs. ept_load_pdptrs sees
VCPU_EXREG_PDPTR set in vcpu->arch.regs_dirty and loads
VMCS02.GUEST_PDPTRn from vcpu->arch.walk_mmu->pdptrs[]. This all works
if the L2 CRn write intercept always resumes L2.

The resume path calls vmx_check_nested_events which checks for
exceptions, MTF, and expired VMX preemption timers. If
vmx_check_nested_events finds any of these conditions pending it will
reflect the corresponding exit into L1. Live migration at this point
would also cause a missed immediate reentry into L2.

After L1 exits, vmx_vcpu_run calls vmx_register_cache_reset which
clears VCPU_EXREG_PDPTR in vcpu->arch.regs_dirty. When L2 next
resumes, ept_load_pdptrs finds VCPU_EXREG_PDPTR clear in
vcpu->arch.regs_dirty and does not load VMCS02.GUEST_PDPTRn from
vcpu->arch.walk_mmu->pdptrs[]. prepare_vmcs02 will then load
VMCS02.GUEST_PDPTRn from vmcs12->pdptr0/1/2/3 which contain the stale
values stored at last L2 exit. A repro of this bug showed L2 entering
triple fault immediately due to the bad VMCS02.GUEST_PDPTRn values.

When L2 is in PAE paging mode add a call to ept_load_pdptrs before
leaving L2. This will update VMCS02.GUEST_PDPTRn if they are dirty in
vcpu->arch.walk_mmu->pdptrs[].

Tested:
kvm-unit-tests with new directed test: vmx_mtf_pdpte_test.
Verified that test fails without the fix.

Also ran Google internal VMM with an Ubuntu 16.04 4.4.0-83 guest running a
custom hypervisor with a 32-bit Windows XP L2 guest using PAE. Prior to fix
would repro readily. Ran 14 simultaneous L2s for 140 iterations with no
failures.

Signed-off-by: Peter Shier <pshier@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
---
v1 -> v2:
* Per Sean's suggestion removed the new x86 op and calling ept_load_pdptrs from
nested_vmx_vmexit
arch/x86/kvm/vmx/nested.c | 7 +++++++
arch/x86/kvm/vmx/vmx.c | 4 ++--
arch/x86/kvm/vmx/vmx.h | 1 +
3 files changed, 10 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 23b58c28a1c9..4d46025213e9 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -4404,6 +4404,13 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason,
 	if (kvm_check_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu))
 		kvm_vcpu_flush_tlb_current(vcpu);
 
+	/*
+	 * Ensure that the VMCS02 PDPTR fields are up-to-date before switching
+	 * to L1.
+	 */
+	if (enable_ept && is_pae_paging(vcpu))
+		vmx_ept_load_pdptrs(vcpu);
+
 	leave_guest_mode(vcpu);
 
 	if (nested_cpu_has_preemption_timer(vmcs12))
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 46ba2e03a892..19a599bebd5c 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2971,7 +2971,7 @@ static void vmx_flush_tlb_guest(struct kvm_vcpu *vcpu)
 	vpid_sync_context(to_vmx(vcpu)->vpid);
 }
 
-static void ept_load_pdptrs(struct kvm_vcpu *vcpu)
+void vmx_ept_load_pdptrs(struct kvm_vcpu *vcpu)
 {
 	struct kvm_mmu *mmu = vcpu->arch.walk_mmu;
 
@@ -3114,7 +3114,7 @@ static void vmx_load_mmu_pgd(struct kvm_vcpu *vcpu, unsigned long pgd,
 			guest_cr3 = vcpu->arch.cr3;
 		else /* vmcs01.GUEST_CR3 is already up-to-date. */
 			update_guest_cr3 = false;
-		ept_load_pdptrs(vcpu);
+		vmx_ept_load_pdptrs(vcpu);
 	} else {
 		guest_cr3 = pgd;
 	}
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 26175a4759fa..a2f82127c170 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -356,6 +356,7 @@ void vmx_update_host_rsp(struct vcpu_vmx *vmx, unsigned long host_rsp);
 int vmx_find_msr_index(struct vmx_msrs *m, u32 msr);
 int vmx_handle_memory_failure(struct kvm_vcpu *vcpu, int r,
 			      struct x86_exception *e);
+void vmx_ept_load_pdptrs(struct kvm_vcpu *vcpu);
 
 #define POSTED_INTR_ON  0
 #define POSTED_INTR_SN  1
--
* Re: [PATCH v2] KVM: nVMX: Update VMCS02 when L2 PAE PDPTE updates detected
From: Sean Christopherson @ 2020-08-22 3:25 UTC
To: Peter Shier; +Cc: kvm, pbonzini, Jim Mattson
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
* Re: [PATCH v2] KVM: nVMX: Update VMCS02 when L2 PAE PDPTE updates detected
From: Peter Shier @ 2020-09-01 20:29 UTC
To: Sean Christopherson; +Cc: kvm, Paolo Bonzini, Jim Mattson
Ping. Thx
* Re: [PATCH v2] KVM: nVMX: Update VMCS02 when L2 PAE PDPTE updates detected
From: Paolo Bonzini @ 2020-09-11 15:51 UTC
To: Peter Shier, kvm; +Cc: Jim Mattson
On 21/08/20 01:05, Peter Shier wrote:
> After L1 exits, vmx_vcpu_run calls vmx_register_cache_reset which
> clears VCPU_EXREG_PDPTR in vcpu->arch.regs_dirty. When L2 next
> resumes, ept_load_pdptrs finds VCPU_EXREG_PDPTR clear in
> vcpu->arch.regs_dirty and does not load VMCS02.GUEST_PDPTRn from
> vcpu->arch.walk_mmu->pdptrs[]. prepare_vmcs02 will then load
> VMCS02.GUEST_PDPTRn from vmcs12->pdptr0/1/2/3 which contain the stale
> values stored at last L2 exit. A repro of this bug showed L2 entering
> triple fault immediately due to the bad VMCS02.GUEST_PDPTRn values.
>
> When L2 is in PAE paging mode add a call to ept_load_pdptrs before
> leaving L2. This will update VMCS02.GUEST_PDPTRn if they are dirty in
> vcpu->arch.walk_mmu->pdptrs[].

Queued with an improved comment:

 	/*
-	 * Ensure that the VMCS02 PDPTR fields are up-to-date before switching
-	 * to L1.
+	 * VCPU_EXREG_PDPTR will be clobbered in arch/x86/kvm/vmx/vmx.h between
+	 * now and the new vmentry. Ensure that the VMCS02 PDPTR fields are
+	 * up-to-date before switching to L1.
 	 */

I am currently on leave so I am going through the patches and queuing
them, but I will only push kvm/next and kvm/queue next week. kvm/master
patches will be sent to Linus for the next -rc though.

Paolo