* [PATCH 0/3] VMX: nested migration fixes for 32 bit nested guests
@ 2021-11-10 10:00 Maxim Levitsky
From: Maxim Levitsky @ 2021-11-10 10:00 UTC
  To: kvm
  Cc: Wanpeng Li, Borislav Petkov, Ingo Molnar, H. Peter Anvin,
	linux-kernel, Sean Christopherson, Joerg Roedel,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Vitaly Kuznetsov, Thomas Gleixner, Paolo Bonzini, Jim Mattson,
	Maxim Levitsky

This is hopefully the last issue I was tracking in regard to nested migration.

The issue is that migration of an L1 which is a normal 64 bit guest, but which
is running a 32 bit nested guest, is broken on VMX, and I finally found out why.

There are two bugs, both related to the fact that qemu first restores L2's
SREGS and only then sets the nested state. That ordering haunts us to this day.

The first issue is that vmx_set_nested_state does some checks on the host
state stored in vmcs12, but it uses the current IA32_EFER, which at that point
is still L2's. Thus, the consistency checks fail.

I fixed this by restoring L1's EFER from vmcs12, letting these checks pass.
This is somewhat hacky, so I am open to better suggestions on how to do it.
One option is to pass an explicit value of L1's IA32_EFER to the consistency
check code, and leave L2's IA32_EFER alone.
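
Roughly, that option could look like the following (just an illustrative,
untested sketch; the extra l1_efer parameter is hypothetical and not part
of this series):

	static int nested_vmx_check_host_state(struct kvm_vcpu *vcpu,
					       struct vmcs12 *vmcs12,
					       u64 l1_efer)
	{
		bool ia32e = !!(l1_efer & EFER_LMA);
		/* ... the existing checks, using ia32e/l1_efer instead
		 * of vcpu->arch.efer ...
		 */
	}

with VM entry passing vcpu->arch.efer, and the nested state load path passing
the value computed by nested_vmx_get_vmcs12_host_efer().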

The second issue is that L2's IA32_EFER makes L1's mmu be initialized incorrectly
(with PAE paging). This by itself isn't an immediate problem as we are going into L2,
but when we exit it, we don't reset L1's mmu back to 64 bit mode because, as it
happens, the mmu role doesn't change and the 64 bitness isn't part of the mmu role.

I also fixed this with somewhat of a hack, by checking that the mmu's level
didn't change, but there is also the option of making the 64 bitness part of
the mmu role.

Also, when restoring L1's IA32_EFER, it is possible to reset L1's mmu so that
it is set up correctly; this isn't strictly needed, but it does make things
more bug proof. Note that the 3rd patch is still needed even then, as without
it, resetting the mmu right after restoring IA32_EFER does nothing.

SVM in theory has both issues, but restoring L1's EFER into vcpu->arch.efer
isn't needed there, as the code explicitly checks the L1 save area for
consistency instead.

Best regards,
	Maxim Levitsky

Maxim Levitsky (3):
  KVM: nVMX: extract calculation of the L1's EFER
  KVM: nVMX: restore L1's EFER prior to setting the nested state
  KVM: x86/mmu: don't skip mmu initialization when mmu root level
    changes

 arch/x86/kvm/mmu/mmu.c    | 14 ++++++++++----
 arch/x86/kvm/vmx/nested.c | 33 +++++++++++++++++++++++++++------
 2 files changed, 37 insertions(+), 10 deletions(-)

-- 
2.26.3




* [PATCH 1/3] KVM: nVMX: extract calculation of the L1's EFER
From: Maxim Levitsky @ 2021-11-10 10:00 UTC
  To: kvm
  Cc: Wanpeng Li, Borislav Petkov, Ingo Molnar, H. Peter Anvin,
	linux-kernel, Sean Christopherson, Joerg Roedel,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Vitaly Kuznetsov, Thomas Gleixner, Paolo Bonzini, Jim Mattson,
	Maxim Levitsky

This will be useful in the next patch.

No functional change intended.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/vmx/nested.c | 22 ++++++++++++++++------
 1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index b4ee5e9f9e201..49ae96c0cc4d1 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -4228,6 +4228,21 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
 	kvm_clear_interrupt_queue(vcpu);
 }
 
+/*
+ * Given vmcs12, return the expected L1 value of IA32_EFER
+ * after VM exit from that vmcs12
+ */
+static inline u64 nested_vmx_get_vmcs12_host_efer(struct kvm_vcpu *vcpu,
+						  struct vmcs12 *vmcs12)
+{
+	if (vmcs12->vm_exit_controls & VM_EXIT_LOAD_IA32_EFER)
+		return vmcs12->host_ia32_efer;
+	else if (vmcs12->vm_exit_controls & VM_EXIT_HOST_ADDR_SPACE_SIZE)
+		return vcpu->arch.efer | (EFER_LMA | EFER_LME);
+	else
+		return vcpu->arch.efer & ~(EFER_LMA | EFER_LME);
+}
+
 /*
  * A part of what we need to when the nested L2 guest exits and we want to
  * run its L1 parent, is to reset L1's guest state to the host state specified
@@ -4243,12 +4258,7 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
 	enum vm_entry_failure_code ignored;
 	struct kvm_segment seg;
 
-	if (vmcs12->vm_exit_controls & VM_EXIT_LOAD_IA32_EFER)
-		vcpu->arch.efer = vmcs12->host_ia32_efer;
-	else if (vmcs12->vm_exit_controls & VM_EXIT_HOST_ADDR_SPACE_SIZE)
-		vcpu->arch.efer |= (EFER_LMA | EFER_LME);
-	else
-		vcpu->arch.efer &= ~(EFER_LMA | EFER_LME);
+	vcpu->arch.efer = nested_vmx_get_vmcs12_host_efer(vcpu, vmcs12);
 	vmx_set_efer(vcpu, vcpu->arch.efer);
 
 	kvm_rsp_write(vcpu, vmcs12->host_rsp);
-- 
2.26.3



* [PATCH 2/3] KVM: nVMX: restore L1's EFER prior to setting the nested state
From: Maxim Levitsky @ 2021-11-10 10:00 UTC
  To: kvm
  Cc: Wanpeng Li, Borislav Petkov, Ingo Molnar, H. Peter Anvin,
	linux-kernel, Sean Christopherson, Joerg Roedel,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Vitaly Kuznetsov, Thomas Gleixner, Paolo Bonzini, Jim Mattson,
	Maxim Levitsky

It is known that some KVM users (e.g. qemu) load part of L2's register
state prior to setting the nested state after a migration.

If a 32 bit L2 guest is running in a 64 bit L1 guest and nested migration
happens, qemu will restore L2's EFER, and then the nested state load
function will use it as if it were L1's EFER.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/vmx/nested.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 49ae96c0cc4d1..28e270824e5b1 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -6404,6 +6404,17 @@ static int vmx_set_nested_state(struct kvm_vcpu *vcpu,
 			kvm_state->hdr.vmx.preemption_timer_deadline;
 	}
 
+	/*
+	 * The vcpu might currently contain L2's IA32_EFER, due to the way
+	 * some userspace kvm users (e.g qemu) restore nested state.
+	 *
+	 * To fix this, restore its IA32_EFER to the value it would have
+	 * after VM exit from the nested guest.
+	 *
+	 */
+
+	vcpu->arch.efer = nested_vmx_get_vmcs12_host_efer(vcpu, vmcs12);
+
 	if (nested_vmx_check_controls(vcpu, vmcs12) ||
 	    nested_vmx_check_host_state(vcpu, vmcs12) ||
 	    nested_vmx_check_guest_state(vcpu, vmcs12, &ignored))
-- 
2.26.3



* [PATCH 3/3] KVM: x86/mmu: don't skip mmu initialization when mmu root level changes
From: Maxim Levitsky @ 2021-11-10 10:00 UTC
  To: kvm
  Cc: Wanpeng Li, Borislav Petkov, Ingo Molnar, H. Peter Anvin,
	linux-kernel, Sean Christopherson, Joerg Roedel,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Vitaly Kuznetsov, Thomas Gleixner, Paolo Bonzini, Jim Mattson,
	Maxim Levitsky

When running a mix of 32 bit and 64 bit guests, it is possible to get an mmu
reset with the same mmu role but a different root level (32 bit vs 64 bit paging).

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/mmu/mmu.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 354d2ca92df4d..763867475860f 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4745,7 +4745,10 @@ static void init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
 	union kvm_mmu_role new_role =
 		kvm_calc_tdp_mmu_root_page_role(vcpu, &regs, false);
 
-	if (new_role.as_u64 == context->mmu_role.as_u64)
+	u8 new_root_level = role_regs_to_root_level(&regs);
+
+	if (new_role.as_u64 == context->mmu_role.as_u64 &&
+	    context->root_level == new_root_level)
 		return;
 
 	context->mmu_role.as_u64 = new_role.as_u64;
@@ -4757,7 +4760,7 @@ static void init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
 	context->get_guest_pgd = get_cr3;
 	context->get_pdptr = kvm_pdptr_read;
 	context->inject_page_fault = kvm_inject_page_fault;
-	context->root_level = role_regs_to_root_level(&regs);
+	context->root_level = new_root_level;
 
 	if (!is_cr0_pg(context))
 		context->gva_to_gpa = nonpaging_gva_to_gpa;
@@ -4806,7 +4809,10 @@ static void shadow_mmu_init_context(struct kvm_vcpu *vcpu, struct kvm_mmu *conte
 				    struct kvm_mmu_role_regs *regs,
 				    union kvm_mmu_role new_role)
 {
-	if (new_role.as_u64 == context->mmu_role.as_u64)
+	u8 new_root_level = role_regs_to_root_level(regs);
+
+	if (new_role.as_u64 == context->mmu_role.as_u64 &&
+	    context->root_level == new_root_level)
 		return;
 
 	context->mmu_role.as_u64 = new_role.as_u64;
@@ -4817,8 +4823,8 @@ static void shadow_mmu_init_context(struct kvm_vcpu *vcpu, struct kvm_mmu *conte
 		paging64_init_context(context);
 	else
 		paging32_init_context(context);
-	context->root_level = role_regs_to_root_level(regs);
 
+	context->root_level = new_root_level;
 	reset_guest_paging_metadata(vcpu, context);
 	context->shadow_root_level = new_role.base.level;
 
-- 
2.26.3



* Re: [PATCH 1/3] KVM: nVMX: extract calculation of the L1's EFER
From: Vitaly Kuznetsov @ 2021-11-10 14:43 UTC
  To: Maxim Levitsky, kvm
  Cc: Wanpeng Li, Borislav Petkov, Ingo Molnar, H. Peter Anvin,
	linux-kernel, Sean Christopherson, Joerg Roedel,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Thomas Gleixner, Paolo Bonzini, Jim Mattson, Maxim Levitsky

Maxim Levitsky <mlevitsk@redhat.com> writes:

> This will be useful in the next patch.

Nitpick: "the next patch" may not be what you expect after merge/when
backporting/... so it's better to call things out explicitly, something
like:

"The newly introduced nested_vmx_get_vmcs12_host_efer() helper will be
used when nested state is restored in vmx_set_nested_state()".

>
> No functional change intended.
>
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>  arch/x86/kvm/vmx/nested.c | 22 ++++++++++++++++------
>  1 file changed, 16 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index b4ee5e9f9e201..49ae96c0cc4d1 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -4228,6 +4228,21 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
>  	kvm_clear_interrupt_queue(vcpu);
>  }
>  
> +/*
> + * Given vmcs12, return the expected L1 value of IA32_EFER
> + * after VM exit from that vmcs12
> + */
> +static inline u64 nested_vmx_get_vmcs12_host_efer(struct kvm_vcpu *vcpu,
> +						  struct vmcs12 *vmcs12)
> +{
> +	if (vmcs12->vm_exit_controls & VM_EXIT_LOAD_IA32_EFER)
> +		return vmcs12->host_ia32_efer;
> +	else if (vmcs12->vm_exit_controls & VM_EXIT_HOST_ADDR_SPACE_SIZE)
> +		return vcpu->arch.efer | (EFER_LMA | EFER_LME);
> +	else
> +		return vcpu->arch.efer & ~(EFER_LMA | EFER_LME);
> +}
> +
>  /*
>   * A part of what we need to when the nested L2 guest exits and we want to
>   * run its L1 parent, is to reset L1's guest state to the host state specified
> @@ -4243,12 +4258,7 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
>  	enum vm_entry_failure_code ignored;
>  	struct kvm_segment seg;
>  
> -	if (vmcs12->vm_exit_controls & VM_EXIT_LOAD_IA32_EFER)
> -		vcpu->arch.efer = vmcs12->host_ia32_efer;
> -	else if (vmcs12->vm_exit_controls & VM_EXIT_HOST_ADDR_SPACE_SIZE)
> -		vcpu->arch.efer |= (EFER_LMA | EFER_LME);
> -	else
> -		vcpu->arch.efer &= ~(EFER_LMA | EFER_LME);
> +	vcpu->arch.efer = nested_vmx_get_vmcs12_host_efer(vcpu, vmcs12);
>  	vmx_set_efer(vcpu, vcpu->arch.efer);
>  
>  	kvm_rsp_write(vcpu, vmcs12->host_rsp);

-- 
Vitaly



* Re: [PATCH 3/3] KVM: x86/mmu: don't skip mmu initialization when mmu root level changes
From: Vitaly Kuznetsov @ 2021-11-10 14:48 UTC
  To: Maxim Levitsky, kvm
  Cc: Wanpeng Li, Borislav Petkov, Ingo Molnar, H. Peter Anvin,
	linux-kernel, Sean Christopherson, Joerg Roedel,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Thomas Gleixner, Paolo Bonzini, Jim Mattson, Maxim Levitsky

Maxim Levitsky <mlevitsk@redhat.com> writes:

> When running a mix of 32 bit and 64 bit guests, it is possible to get an mmu
> reset with the same mmu role but a different root level (32 bit vs 64 bit paging).
>
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>  arch/x86/kvm/mmu/mmu.c | 14 ++++++++++----
>  1 file changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 354d2ca92df4d..763867475860f 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -4745,7 +4745,10 @@ static void init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
>  	union kvm_mmu_role new_role =
>  		kvm_calc_tdp_mmu_root_page_role(vcpu, &regs, false);
>  
> -	if (new_role.as_u64 == context->mmu_role.as_u64)
> +	u8 new_root_level = role_regs_to_root_level(&regs);
> +
> +	if (new_role.as_u64 == context->mmu_role.as_u64 &&
> +	    context->root_level == new_root_level)
>  		return;

role_regs_to_root_level() uses 3 things: CR0.PG, EFER.LMA and CR4.PAE
and two of these three are already encoded into extended mmu role
(kvm_calc_mmu_role_ext()). Could we achieve the same result by adding
EFER.LMA there?
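
Something along these lines, completely untested: add

	unsigned int efer_lma:1;

to kvm_mmu_extended_role, and set it in kvm_calc_mmu_role_ext():

	ext.efer_lma = ____is_efer_lma(regs);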

>  
>  	context->mmu_role.as_u64 = new_role.as_u64;
> @@ -4757,7 +4760,7 @@ static void init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
>  	context->get_guest_pgd = get_cr3;
>  	context->get_pdptr = kvm_pdptr_read;
>  	context->inject_page_fault = kvm_inject_page_fault;
> -	context->root_level = role_regs_to_root_level(&regs);
> +	context->root_level = new_root_level;
>  
>  	if (!is_cr0_pg(context))
>  		context->gva_to_gpa = nonpaging_gva_to_gpa;
> @@ -4806,7 +4809,10 @@ static void shadow_mmu_init_context(struct kvm_vcpu *vcpu, struct kvm_mmu *conte
>  				    struct kvm_mmu_role_regs *regs,
>  				    union kvm_mmu_role new_role)
>  {
> -	if (new_role.as_u64 == context->mmu_role.as_u64)
> +	u8 new_root_level = role_regs_to_root_level(regs);
> +
> +	if (new_role.as_u64 == context->mmu_role.as_u64 &&
> +	    context->root_level == new_root_level)
>  		return;
>  
>  	context->mmu_role.as_u64 = new_role.as_u64;
> @@ -4817,8 +4823,8 @@ static void shadow_mmu_init_context(struct kvm_vcpu *vcpu, struct kvm_mmu *conte
>  		paging64_init_context(context);
>  	else
>  		paging32_init_context(context);
> -	context->root_level = role_regs_to_root_level(regs);
>  
> +	context->root_level = new_root_level;
>  	reset_guest_paging_metadata(vcpu, context);
>  	context->shadow_root_level = new_role.base.level;

-- 
Vitaly



* Re: [PATCH 3/3] KVM: x86/mmu: don't skip mmu initialization when mmu root level changes
From: Maxim Levitsky @ 2021-11-10 15:00 UTC
  To: Vitaly Kuznetsov, kvm
  Cc: Wanpeng Li, Borislav Petkov, Ingo Molnar, H. Peter Anvin,
	linux-kernel, Sean Christopherson, Joerg Roedel,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Thomas Gleixner, Paolo Bonzini, Jim Mattson

On Wed, 2021-11-10 at 15:48 +0100, Vitaly Kuznetsov wrote:
> Maxim Levitsky <mlevitsk@redhat.com> writes:
> 
> > When running a mix of 32 bit and 64 bit guests, it is possible to get an mmu
> > reset with the same mmu role but a different root level (32 bit vs 64 bit paging).
> > 
> > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > ---
> >  arch/x86/kvm/mmu/mmu.c | 14 ++++++++++----
> >  1 file changed, 10 insertions(+), 4 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index 354d2ca92df4d..763867475860f 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -4745,7 +4745,10 @@ static void init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
> >  	union kvm_mmu_role new_role =
> >  		kvm_calc_tdp_mmu_root_page_role(vcpu, &regs, false);
> >  
> > -	if (new_role.as_u64 == context->mmu_role.as_u64)
> > +	u8 new_root_level = role_regs_to_root_level(&regs);
> > +
> > +	if (new_role.as_u64 == context->mmu_role.as_u64 &&
> > +	    context->root_level == new_root_level)
> >  		return;
> 
> role_regs_to_root_level() uses 3 things: CR0.PG, EFER.LMA and CR4.PAE
> and two of these three are already encoded into extended mmu role
> (kvm_calc_mmu_role_ext()). Could we achieve the same result by adding
> EFER.LMA there?

Absolutely. I just wanted your feedback on this to see if there is any reason to not
do this.

Also it seems that only the basic role is compared here.

I don't 100% know the reason why we have basic and extended roles - there is a
comment about basic/extended mmu role to minimize the size of arch.gfn_track,
but I haven't yet studied in depth why.

Best regards,
	Maxim Levitsky

> 
> >  
> >  	context->mmu_role.as_u64 = new_role.as_u64;
> > @@ -4757,7 +4760,7 @@ static void init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
> >  	context->get_guest_pgd = get_cr3;
> >  	context->get_pdptr = kvm_pdptr_read;
> >  	context->inject_page_fault = kvm_inject_page_fault;
> > -	context->root_level = role_regs_to_root_level(&regs);
> > +	context->root_level = new_root_level;
> >  
> >  	if (!is_cr0_pg(context))
> >  		context->gva_to_gpa = nonpaging_gva_to_gpa;
> > @@ -4806,7 +4809,10 @@ static void shadow_mmu_init_context(struct kvm_vcpu *vcpu, struct kvm_mmu *conte
> >  				    struct kvm_mmu_role_regs *regs,
> >  				    union kvm_mmu_role new_role)
> >  {
> > -	if (new_role.as_u64 == context->mmu_role.as_u64)
> > +	u8 new_root_level = role_regs_to_root_level(regs);
> > +
> > +	if (new_role.as_u64 == context->mmu_role.as_u64 &&
> > +	    context->root_level == new_root_level)
> >  		return;
> >  
> >  	context->mmu_role.as_u64 = new_role.as_u64;
> > @@ -4817,8 +4823,8 @@ static void shadow_mmu_init_context(struct kvm_vcpu *vcpu, struct kvm_mmu *conte
> >  		paging64_init_context(context);
> >  	else
> >  		paging32_init_context(context);
> > -	context->root_level = role_regs_to_root_level(regs);
> >  
> > +	context->root_level = new_root_level;
> >  	reset_guest_paging_metadata(vcpu, context);
> >  	context->shadow_root_level = new_role.base.level;




* Re: [PATCH 2/3] KVM: nVMX: restore L1's EFER prior to setting the nested state
From: Paolo Bonzini @ 2021-11-10 15:01 UTC
  To: Maxim Levitsky, kvm
  Cc: Wanpeng Li, Borislav Petkov, Ingo Molnar, H. Peter Anvin,
	linux-kernel, Sean Christopherson, Joerg Roedel,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Vitaly Kuznetsov, Thomas Gleixner, Jim Mattson

On 11/10/21 11:00, Maxim Levitsky wrote:
> +	/*
> +	 * The vcpu might currently contain L2's IA32_EFER, due to the way
> +	 * some userspace kvm users (e.g qemu) restore nested state.
> +	 *
> +	 * To fix this, restore its IA32_EFER to the value it would have
> +	 * after VM exit from the nested guest.
> +	 *
> +	 */
> +
> +	vcpu->arch.efer = nested_vmx_get_vmcs12_host_efer(vcpu, vmcs12);
> +

In principle the value of LOAD_HOST_EFER on exit need not be the same as 
on entry.  But you don't need all of EFER, only EFER.LME/EFER.LMA, and 
those two bits must match ("the values of the LMA and LME bits in the 
field must each be that of the “host address-space size” VM-exit 
control" from the "Checks on Host Control Registers, MSRs, and SSP"; 
plus the "Checks Related to Address-Space Size").

At least it's worth adjusting the comment to explain that.  But the root 
cause of the issue is just nested_vmx_check_* accessing vcpu->arch.  So 
you can instead:

- split out of nested_vmx_check_host_state a new function 
nested_vmx_check_address_space_size that does

#ifdef CONFIG_X86_64
	if (CC(!!(vmcs12->vm_exit_controls & VM_EXIT_HOST_ADDR_SPACE_SIZE) !=
	       !!(vcpu->arch.efer & EFER_LMA)))
		return -EINVAL;
#endif
	return 0;

- call it from vmentry but not from migration

- in nested_vmx_check_host_state, assign ia32e from 
vmcs12->vm_exit_controls instead of vcpu->arch.efer
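
i.e. something like (untested):

	ia32e = !!(vmcs12->vm_exit_controls & VM_EXIT_HOST_ADDR_SPACE_SIZE);

instead of deriving ia32e from vcpu->arch.efer & EFER_LMA.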

Paolo



* Re: [PATCH 2/3] KVM: nVMX: restore L1's EFER prior to setting the nested state
From: Maxim Levitsky @ 2021-11-10 15:08 UTC
  To: Paolo Bonzini, kvm
  Cc: Wanpeng Li, Borislav Petkov, Ingo Molnar, H. Peter Anvin,
	linux-kernel, Sean Christopherson, Joerg Roedel,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Vitaly Kuznetsov, Thomas Gleixner, Jim Mattson

On Wed, 2021-11-10 at 16:01 +0100, Paolo Bonzini wrote:
> On 11/10/21 11:00, Maxim Levitsky wrote:
> > +	/*
> > +	 * The vcpu might currently contain L2's IA32_EFER, due to the way
> > +	 * some userspace kvm users (e.g qemu) restore nested state.
> > +	 *
> > +	 * To fix this, restore its IA32_EFER to the value it would have
> > +	 * after VM exit from the nested guest.
> > +	 *
> > +	 */
> > +
> > +	vcpu->arch.efer = nested_vmx_get_vmcs12_host_efer(vcpu, vmcs12);
> > +
> 
> In principle the value of LOAD_HOST_EFER on exit need not be the same as 
> on entry.  But you don't need all of EFER, only EFER.LME/EFER.LMA, and 
> those two bits must match ("the values of the LMA and LME bits in the 
> field must each be that of the “host address-space size” VM-exit 
> control" from the "Checks on Host Control Registers, MSRs, and SSP"; 
> plus the "Checks Related to Address-Space Size").
> 
> At least it's worth adjusting the comment to explain that.  But the root 
> cause of the issue is just nested_vmx_check_* accessing vcpu->arch.  So 
> you can instead:
> 
> - split out of nested_vmx_check_host_state a new function 
> nested_vmx_check_address_space_size that does
> 
> #ifdef CONFIG_X86_64
> 	if (CC(!!(vmcs12->vm_exit_controls & VM_EXIT_HOST_ADDR_SPACE_SIZE) !=
> 	       !!(vcpu->arch.efer & EFER_LMA)))
> 		return -EINVAL;
> #endif
> 	return 0;
> 
> - call it from vmentry but not from migration
> 
> - in nested_vmx_check_host_state, assign ia32e from 
> vmcs12->vm_exit_controls instead of vcpu->arch.efer

I agree with you. I was thinking of doing something like that but wasn't sure at all.

Best regards,
	Maxim Levitsky

> 
> Paolo
> 




* Re: [PATCH 3/3] KVM: x86/mmu: don't skip mmu initialization when mmu root level changes
From: Sean Christopherson @ 2021-11-10 17:21 UTC
  To: Maxim Levitsky
  Cc: Vitaly Kuznetsov, kvm, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	H. Peter Anvin, linux-kernel, Joerg Roedel,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Thomas Gleixner, Paolo Bonzini, Jim Mattson

On Wed, Nov 10, 2021, Maxim Levitsky wrote:
> On Wed, 2021-11-10 at 15:48 +0100, Vitaly Kuznetsov wrote:
> > Maxim Levitsky <mlevitsk@redhat.com> writes:
> > 
> > > When running a mix of 32 bit and 64 bit guests, it is possible to get an mmu
> > > reset with the same mmu role but a different root level (32 bit vs 64 bit paging).

Hmm, no, it's far more nuanced than just "running a mix of 32 and 64 bit guests".
The changelog needs a much more in-depth explanation of exactly what and how
things go awry.  I suspect that whatever bug is being hit is unique to the
migration path.

This needs a Fixes and Cc: stable@vger.kernel.org, but which commit is fixed is
TBD.

Simply running 32 and 64 bit guests is not sufficient to cause problems.  In
that case, they are different VMs entirely, whereas the MMU context is per-vCPU.
So at minimum, this requires running a mix of 32-bit and 64-bit _nested_ guests.

But even that isn't sufficient.  Ignoring migration for the moment:

  a) If shadow paging is enabled in L0, then EFER.LMA is captured in
     kvm_mmu_page_role.level.  Ergo this flaw doesn't affect legacy shadow paging.

  b) If EPT is enabled in L0 _and_ L1, kvm_mmu_page_role.level tracks L1's EPT
     level.  Ergo, this flaw doesn't affect nested EPT.

  c) If NPT is enabled in L0 _and_ L1, then this flaw does apply, as
     kvm_mmu_page_role.level tracks L0's NPT level, which does not incorporate
     L1's NPT level because of the limitations of NPT.  But the cover letter
     states these bugs are specific to VMX.  I suspect that's incorrect and that
     in theory this particular bug does apply to SVM, but that it hasn't been
     hit due to not running with nNPT and both 32-bit and 64-bit L2s on a single
     vCPU.

  d) If TDP is enabled in L0 but _not_ L1, then L1 and L2 share an MMU context,
     and that context is guaranteed to be reset on every nested transition due
     to kvm_mmu_page_role.guest_mode.  Ergo this flaw doesn't affect this combo
     since it's impossible to switch between two L2 vCPUs on a single L1 vCPU
     without a context reset.

The cover letter says:

  The second issue is that L2's IA32_EFER makes L1's mmu be initialized incorrectly
  (with PAE paging). This by itself isn't an immediate problem as we are going into L2,
  but when we exit it, we don't reset L1's mmu back to 64 bit mode because, as it
  happens, the mmu role doesn't change and the 64 bitness isn't part of the mmu role.

But that should be impossible because of kvm_mmu_page_role.guest_mode.  If that
doesn't trigger a reset, then presumably is_guest_mode() is stale?

Regarding (c), I believe this can be hit by running a 32-bit L2 and 64-bit L2 on
SVM with nNPT.  I don't think that's a combination I've tested much, if at all.

Regarding (d), I believe the bug can rear its head if guest state is stuffed
via KVM_SET_SREGS{2}.  kvm_mmu_reset_context() will re-init the MMU, but it
doesn't purge the previous context.  I.e. the assumption that a switch between
"two" L2s always involves a context reset can be broken.

> > > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > > ---
> > >  arch/x86/kvm/mmu/mmu.c | 14 ++++++++++----
> > >  1 file changed, 10 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > > index 354d2ca92df4d..763867475860f 100644
> > > --- a/arch/x86/kvm/mmu/mmu.c
> > > +++ b/arch/x86/kvm/mmu/mmu.c
> > > @@ -4745,7 +4745,10 @@ static void init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
> > >  	union kvm_mmu_role new_role =
> > >  		kvm_calc_tdp_mmu_root_page_role(vcpu, &regs, false);
> > >  
> > > -	if (new_role.as_u64 == context->mmu_role.as_u64)
> > > +	u8 new_root_level = role_regs_to_root_level(&regs);
> > > +
> > > +	if (new_role.as_u64 == context->mmu_role.as_u64 &&
> > > +	    context->root_level == new_root_level)
> > >  		return;
> > 
> > role_regs_to_root_level() uses 3 things: CR0.PG, EFER.LMA and CR4.PAE
> > and two of these three are already encoded into extended mmu role
> > (kvm_calc_mmu_role_ext()). Could we achieve the same result by adding
> > EFER.LMA there?
> 
> Absolutely. I just wanted your feedback on this to see if there is any reason
> to not do this.

Assuming my assessment above is correct, incorporating EFER.LMA into the
extended role is the correct fix.  That will naturally do the right thing for
nested EPT in the sense that kvm_calc_shadow_ept_root_page_role() will ignore
EFER.LMA entirely.

> Also it seems that only the basic role is compared here.

No, it's the full role.  "as_u64" is the overlay for the combined base+ext.

union kvm_mmu_role {
	u64 as_u64;
	struct {
		union kvm_mmu_page_role base;
		union kvm_mmu_extended_role ext;
	};
};

> I don't 100% know the reason why we have basic and extended roles - there is a
> comment about basic/extended mmu role to minimize the size of arch.gfn_track,
> but I haven't yet studied in depth why.

The "base" role is used to identify which individual shadow pages are compatible
with the current shadow/TDP paging context.  Using TDP as an example, KVM can
reuse SPs at the correct level regardless of guest LA57.  The gfn_track thing
tracks every SP in existence for a given gfn.  To minimize the number of possible
SPs, and thus minimize the storage capacity needed for gfn_track, the "base" role
is kept as tiny as possible.
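
For reference, the base role is deliberately a tiny u32 bitfield, roughly
(abbreviated, some fields omitted):

union kvm_mmu_page_role {
	u32 word;
	struct {
		unsigned level:4;
		unsigned direct:1;
		unsigned access:3;
		/* ... */
		unsigned guest_mode:1;
		unsigned smm:8;
	};
};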

> > >  	context->mmu_role.as_u64 = new_role.as_u64;
> > > @@ -4757,7 +4760,7 @@ static void init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
> > >  	context->get_guest_pgd = get_cr3;
> > >  	context->get_pdptr = kvm_pdptr_read;
> > >  	context->inject_page_fault = kvm_inject_page_fault;
> > > -	context->root_level = role_regs_to_root_level(&regs);
> > > +	context->root_level = new_root_level;
> > >  
> > >  	if (!is_cr0_pg(context))
> > >  		context->gva_to_gpa = nonpaging_gva_to_gpa;
> > > @@ -4806,7 +4809,10 @@ static void shadow_mmu_init_context(struct kvm_vcpu *vcpu, struct kvm_mmu *conte
> > >  				    struct kvm_mmu_role_regs *regs,
> > >  				    union kvm_mmu_role new_role)
> > >  {
> > > -	if (new_role.as_u64 == context->mmu_role.as_u64)
> > > +	u8 new_root_level = role_regs_to_root_level(regs);
> > > +
> > > +	if (new_role.as_u64 == context->mmu_role.as_u64 &&
> > > +	    context->root_level == new_root_level)

Doesn't matter if EFER.LMA is added to the role, but this extra check shouldn't
be necessary for shadow paging as LMA is factored into role.base.level in that
case.  TDP is problematic because role.base.level holds the TDP root level, which
doesn't depend on guest.EFER.LMA.

Nested EPT is also ok because shadow_root_level == root_level in that case.

> > >  		return;
> > >  
> > >  	context->mmu_role.as_u64 = new_role.as_u64;
> > > @@ -4817,8 +4823,8 @@ static void shadow_mmu_init_context(struct kvm_vcpu *vcpu, struct kvm_mmu *conte
> > >  		paging64_init_context(context);
> > >  	else
> > >  		paging32_init_context(context);
> > > -	context->root_level = role_regs_to_root_level(regs);
> > >  
> > > +	context->root_level = new_root_level;
> > >  	reset_guest_paging_metadata(vcpu, context);
> > >  	context->shadow_root_level = new_role.base.level;
> 
> 


* Re: [PATCH 3/3] KVM: x86/mmu: don't skip mmu initialization when mmu root level changes
From: Maxim Levitsky @ 2021-11-15 12:14 UTC
  To: Sean Christopherson
  Cc: Vitaly Kuznetsov, kvm, Wanpeng Li, Borislav Petkov, Ingo Molnar,
	H. Peter Anvin, linux-kernel, Joerg Roedel,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Thomas Gleixner, Paolo Bonzini, Jim Mattson

On Wed, 2021-11-10 at 17:21 +0000, Sean Christopherson wrote:
> On Wed, Nov 10, 2021, Maxim Levitsky wrote:
> > On Wed, 2021-11-10 at 15:48 +0100, Vitaly Kuznetsov wrote:
> > > Maxim Levitsky <mlevitsk@redhat.com> writes:
> > > 
> > > > When running a mix of 32 bit and 64 bit guests, it is possible to get an mmu
> > > > reset with the same mmu role but a different root level (32 bit vs 64 bit paging).
> 
> Hmm, no, it's far more nuanced than just "running a mix of 32 and 64 bit guests".
> The changelog needs a much more in-depth explanation of exactly what and how
> things go awry.  I suspect that whatever bug is being hit is unique to the
> migration path.
> 
> This needs a Fixes and Cc: stable@vger.kernel.org, but which commit is fixed is
> TBD.
> 
> Simply running 32 and 64 bit guests is not sufficient to cause problems.  In
> that case, they are different VMs entirely, whereas the MMU context is per-vCPU.
> So at minimum, this requires running a mix of 32-bit and 64-bit _nested_ guests.



> 
> But even that isn't sufficient.  Ignoring migration for the moment:
This is the key point - without migration that bug can't happen.

> 
>   a) If shadow paging is enabled in L0, then EFER.LMA is captured in
>      kvm_mmu_page_role.level.  Ergo this flaw doesn't affect legacy shadow paging.
True. Moreover, if L1 doesn't use NPT/EPT, then there is nothing special in nested mode:
it is just either normal NPT/EPT direct mode or, if L0 also disabled NPT/EPT,
normal shadow mmu, and the mmu context is indeed shared, which means that
guest_mode should reset it well.

> 
>   b) If EPT is enabled in L0 _and_ L1, kvm_mmu_page_role.level tracks L1's EPT
>      level.  Ergo, this flaw doesn't affect nested EPT.
Well it does _after_ nested migration.

If you look at kvm_calc_tdp_mmu_root_page_role, you see that role.base.level reflects
the NPT/EPT level, which on 64 bit hosts is 4 or even 5, and it never changes,
regardless of the 64/32 bitness of L1 and/or L2.

However, 'context->root_level' reflects L1's guest root level, which depends
on L1's EFER; after migration it is incorrectly initialized from a 32 bit
EFER, and that is not captured in the mmu role.
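
For reference, kvm_calc_tdp_mmu_root_page_role has, roughly:

	role.base.level = kvm_mmu_get_tdp_level(vcpu);

i.e. the host's TDP paging level, while context->root_level comes from
role_regs_to_root_level(), which does look at the guest's EFER.LMA.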

That MMU is incorrectly initialized right after nested migration, when we are still not
nested and qemu uploads SREGS; it is then supposed to be initialized again
after the first nested VM exit to the host, but it isn't, as I explained above.

Note that I am talking here only about the L1 mmu (aka root_mmu), so guest_mode
doesn't apply; it is always false.


This problem does actually happen on SVM as well, I now see after testing it:
putting in some printks, I see that the L1 mmu is stuck in PAE mode forever
after a nested migration (it is supposed to be in 64 bit mode, as L1 is a
64 bit guest).

The L1 guest just somehow seems to 'work', but L2 does eventually hang, as
opposed to the wrongly injected page fault which kills L1 immediately on VMX.

It is either luck or slight differences between EPT and NPT - the root level
is mostly used for page walks, which usually happen on MMIO accesses.


> 
>   c) If NPT is enabled in L0 _and_ L1, then this flaw does apply, as
>      kvm_mmu_page_role.level tracks L0's NPT level, which does not incorporate
>      L1's NPT level because of the limitations of NPT.  But the cover letter
>      states these bugs are specific to VMX.  I suspect that's incorrect and that
>      in theory this particular bug does apply to SVM, but that it hasn't been
>      hit due to not running with nNPT and both 32-bit and 64-bit L2s on a single
>      vCPU.

To have a different NPT level in L0 and L1, you have to run a 32 bit L1, which
thankfully can only run 32 bit nested guests, so no mixing at the L2 level is
possible.

However, I won't rule out mixing 5 and 4 level NPT as impossible. I haven't yet
tested this because I usually don't use hardware that has support for 5 level
NPT/EPT.



> 
>   d) If TDP is enabled in L0 but _not_ L1, then L1 and L2 share an MMU context,
>      and that context is guaranteed to be reset on every nested transition due
>      to kvm_mmu_page_role.guest_mode.  Ergo this flaw doesn't affect this combo
>      since it's impossible to switch between two L2 vCPUs on a single L1 vCPU
>      without a context reset.
Yes.

> 
> The cover letter says:
> 
>   The second issue is that L2's IA32_EFER makes L1's mmu be initialized incorrectly
>   (with PAE paging). This by itself isn't an immediate problem as we are going into L2,
>   but when we exit it, we don't reset L1's mmu back to 64 bit mode because, as it
>   happens, the mmu role doesn't change and the 64 bitness isn't part of the mmu role.
> 
> But that should be impossible because of kvm_mmu_page_role.guest_mode.  If that
> doesn't trigger a reset, then presumably is_guest_mode() is stale?
Since the bug affects only L1's mmu, the guest mode is always false for it.

> 
> Regarding (c), I believe this can be hit by running a 32-bit L2 and 64-bit L2 on
> SVM with nNPT.  I don't think that's a combination I've tested much, if at all.
I tested this a few times just in case; it seems to work very well and survives
migration of L1 as well.

> 
> Regarding (d), I believe the bug can rear its head if guest state is stuffed
> via KVM_SET_SREGS{2}.  kvm_mmu_reset_context() will re-init the MMU, but it
> doesn't purge the previous context.  I.e. the assumption that a switch between
> "two" L2s always involves a context reset can be broken.
> 
> > > > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > > > ---
> > > >  arch/x86/kvm/mmu/mmu.c | 14 ++++++++++----
> > > >  1 file changed, 10 insertions(+), 4 deletions(-)
> > > > 
> > > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > > > index 354d2ca92df4d..763867475860f 100644
> > > > --- a/arch/x86/kvm/mmu/mmu.c
> > > > +++ b/arch/x86/kvm/mmu/mmu.c
> > > > @@ -4745,7 +4745,10 @@ static void init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
> > > >  	union kvm_mmu_role new_role =
> > > >  		kvm_calc_tdp_mmu_root_page_role(vcpu, &regs, false);
> > > >  
> > > > -	if (new_role.as_u64 == context->mmu_role.as_u64)
> > > > +	u8 new_root_level = role_regs_to_root_level(&regs);
> > > > +
> > > > +	if (new_role.as_u64 == context->mmu_role.as_u64 &&
> > > > +	    context->root_level == new_root_level)
> > > >  		return;
> > > 
> > > role_regs_to_root_level() uses 3 things: CR0.PG, EFER.LMA and CR4.PAE
> > > and two of these three are already encoded into extended mmu role
> > > (kvm_calc_mmu_role_ext()). Could we achieve the same result by adding
> > > EFER.LMA there?
> > 
> > Absolutely. I just wanted your feedback on this to see if there is any reason
> > to not do this.
> 
> Assuming my assessment above is correct, incorporating EFER.LMA into the
> extended role is the correct fix.  That will naturally do the right thing for
> nested EPT in the sense that kvm_calc_shadow_ept_root_page_role() will ignore
> EFER.LMA entirely.

Yes, and the patch is on the way.

> 
> > Also it seems that only the basic role is compared here.
> 
> No, it's the full role.  "as_u64" is the overlay for the combined base+ext.
> 
> union kvm_mmu_role {
> 	u64 as_u64;
> 	struct {
> 		union kvm_mmu_page_role base;
> 		union kvm_mmu_extended_role ext;
> 	};
> };
> 
> > I don't 100% know the reason why we have basic and extended roles - there is a
> > comment about basic/extended mmu role to minimize the size of arch.gfn_track,
> > but I haven't yet studied in depth why.
> 
> The "base" role is used to identify which individual shadow pages are compatible
> with the current shadow/TDP paging context.  Using TDP as an example, KVM can
> reuse SPs at the correct level regardless of guest LA57.  The gfn_track thing
> tracks every SP in existence for a given gfn.  To minimize the number of possible
> SPs, and thus minimize the storage capacity needed for gfn_track, the "base" role
> is kept as tiny as possible.
> 
> > > >  	context->mmu_role.as_u64 = new_role.as_u64;
> > > > @@ -4757,7 +4760,7 @@ static void init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
> > > >  	context->get_guest_pgd = get_cr3;
> > > >  	context->get_pdptr = kvm_pdptr_read;
> > > >  	context->inject_page_fault = kvm_inject_page_fault;
> > > > -	context->root_level = role_regs_to_root_level(&regs);
> > > > +	context->root_level = new_root_level;
> > > >  
> > > >  	if (!is_cr0_pg(context))
> > > >  		context->gva_to_gpa = nonpaging_gva_to_gpa;
> > > > @@ -4806,7 +4809,10 @@ static void shadow_mmu_init_context(struct kvm_vcpu *vcpu, struct kvm_mmu *conte
> > > >  				    struct kvm_mmu_role_regs *regs,
> > > >  				    union kvm_mmu_role new_role)
> > > >  {
> > > > -	if (new_role.as_u64 == context->mmu_role.as_u64)
> > > > +	u8 new_root_level = role_regs_to_root_level(regs);
> > > > +
> > > > +	if (new_role.as_u64 == context->mmu_role.as_u64 &&
> > > > +	    context->root_level == new_root_level)
> 
> Doesn't matter if EFER.LMA is added to the role, but this extra check shouldn't
> be necessary for shadow paging as LMA is factored into role.base.level in that
> case.  TDP is problematic because role.base.level holds the TDP root level, which
> doesn't depend on guest.EFER.LMA.
> 
> Nested EPT is also ok because shadow_root_level == root_level in that case.
> 
> > > >  		return;
> > > >  
> > > >  	context->mmu_role.as_u64 = new_role.as_u64;
> > > > @@ -4817,8 +4823,8 @@ static void shadow_mmu_init_context(struct kvm_vcpu *vcpu, struct kvm_mmu *conte
> > > >  		paging64_init_context(context);
> > > >  	else
> > > >  		paging32_init_context(context);
> > > > -	context->root_level = role_regs_to_root_level(regs);
> > > >  
> > > > +	context->root_level = new_root_level;
> > > >  	reset_guest_paging_metadata(vcpu, context);
> > > >  	context->shadow_root_level = new_role.base.level;


Thanks for the review!
	Best regards,
		Maxim Levitsky


