[PATCH 0/3] VMX: nested migration fixes for 32 bit nested guests

From: Maxim Levitsky <mlevitsk@redhat.com>
To: kvm@vger.kernel.org
Cc: Wanpeng Li <wanpengli@tencent.com>,
	Borislav Petkov <bp@alien8.de>, Ingo Molnar <mingo@redhat.com>,
	"H. Peter Anvin" <hpa@zytor.com>,
	linux-kernel@vger.kernel.org,
	Sean Christopherson <seanjc@google.com>,
	Joerg Roedel <joro@8bytes.org>,
	x86@kernel.org (maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)),
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Jim Mattson <jmattson@google.com>,
	Maxim Levitsky <mlevitsk@redhat.com>
Subject: [PATCH 0/3] VMX: nested migration fixes for 32 bit nested guests
Date: Wed, 10 Nov 2021 12:00:15 +0200
Message-ID: <20211110100018.367426-1-mlevitsk@redhat.com>

As far as I know, this is hopefully the last nested migration issue I have
been tracking.

The issue is that migration of an L1 that is a normal 64-bit guest but is
running a 32-bit nested guest is broken on VMX, and I finally found out why.

There are two bugs, both related to the fact that QEMU first restores the SREGS
of L2 and only then sets the nested state. This ordering haunts us to this day.

The first issue is that vmx_set_nested_state() does some checks on the host
state stored in vmcs12, but it uses the current IA32_EFER, which at that point
is L2's. Thus, the consistency checks fail.

I fixed this by restoring L1's EFER from vmcs12, which lets these checks pass.
This is somewhat hacky, so I am open to better suggestions on how to do this.
One option is to pass an explicit value of L1's IA32_EFER to the consistency
check code and leave L2's IA32_EFER alone.
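
To illustrate the first issue, here is a minimal user-space sketch, not the
kernel code. The struct and the toy_* functions are hypothetical, and taking
L1's EFER directly from a host_ia32_efer field is a simplifying assumption;
only the two bit positions are architectural. It shows one host-state check
failing when fed L2's EFER and passing when L1's value is passed explicitly:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural bit positions; everything else here is a toy model. */
#define EFER_LMA                     (1ULL << 10)
#define VM_EXIT_HOST_ADDR_SPACE_SIZE (1U << 9)

/* Hypothetical, heavily reduced stand-in for vmcs12. */
struct toy_vmcs12 {
	uint32_t vm_exit_controls;
	uint64_t host_ia32_efer;
};

/*
 * Toy host-state check: the "host address-space size" exit control must
 * agree with whether the host (L1) is in long mode. The EFER value is an
 * explicit parameter, so the caller decides whose EFER is checked.
 */
static int toy_check_host_state(const struct toy_vmcs12 *v, uint64_t efer)
{
	bool ia32e = (efer & EFER_LMA) != 0;
	bool host64 = (v->vm_exit_controls & VM_EXIT_HOST_ADDR_SPACE_SIZE) != 0;

	return ia32e == host64 ? 0 : -1;
}

/* Replays the restore order described above: SREGS first, nested state second. */
static void toy_replay_restore(void)
{
	struct toy_vmcs12 v = {
		.vm_exit_controls = VM_EXIT_HOST_ADDR_SPACE_SIZE, /* 64-bit L1 */
		.host_ia32_efer   = EFER_LMA,
	};
	uint64_t current_efer = 0; /* L2's EFER after SREGS restore: LMA clear */

	/* Checking against the current (L2) EFER fails spuriously. */
	assert(toy_check_host_state(&v, current_efer) != 0);
	/* Checking against L1's EFER passes. */
	assert(toy_check_host_state(&v, v.host_ia32_efer) == 0);
}
```

Passing the EFER value as a parameter corresponds to the alternative mentioned
above; the posted patches instead restore L1's EFER before the checks run.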

The second issue is that L2's IA32_EFER causes L1's MMU to be initialized
incorrectly (with PAE paging). This by itself isn't an immediate problem, as we
are going into L2, but when we exit it, we don't reset L1's MMU back to 64-bit
mode, because the MMU role doesn't change and 64-bitness isn't part of the
MMU role.

I fixed this too with somewhat of a hack, by checking that the MMU's root level
didn't change, but there is also the option of making 64-bitness part of the
MMU role.
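
The second issue can be modelled in the same spirit. This is a sketch with
hypothetical names (struct toy_mmu, toy_init_mmu, the check_level flag), not
the code in arch/x86/kvm/mmu/mmu.c. It assumes an init fast path that returns
early when the role is unchanged, and shows how an additional root-level
comparison forces re-initialization:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reduced MMU context. */
struct toy_mmu {
	uint64_t role;   /* packed role bits; long mode is NOT encoded here */
	int root_level;  /* 3 = PAE paging, 4 = 4-level long-mode paging */
};

static int toy_root_level(bool long_mode)
{
	return long_mode ? 4 : 3;
}

/*
 * Returns true if the MMU was (re)initialized. With check_level == false
 * the fast path compares only the role, which models the buggy behaviour.
 */
static bool toy_init_mmu(struct toy_mmu *mmu, uint64_t new_role,
			 bool long_mode, bool check_level)
{
	int level = toy_root_level(long_mode);

	if (new_role == mmu->role &&
	    (!check_level || level == mmu->root_level))
		return false; /* fast path: nothing appears to have changed */

	mmu->role = new_role;
	mmu->root_level = level;
	return true;
}

/* L1's MMU was set up with L2's EFER (PAE); now L1 runs in long mode. */
static void toy_replay_exit(void)
{
	struct toy_mmu mmu = { .role = 0x1, .root_level = 3 };

	/* Role-only comparison: init is skipped, root level stays at PAE. */
	assert(!toy_init_mmu(&mmu, 0x1, true, false));
	assert(mmu.root_level == 3);

	/* With the extra level check, the MMU is re-initialized. */
	assert(toy_init_mmu(&mmu, 0x1, true, true));
	assert(mmu.root_level == 4);
}
```

Folding long mode into the role value itself would make the plain role
comparison sufficient, which is the alternative mentioned above.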

Also, when restoring L1's IA32_EFER, it is possible to reset L1's MMU so that
it is set up correctly. This isn't strictly needed, but it does make the code
more bug-proof.
The 3rd patch is still needed in that case, because without it, resetting the
MMU right after restoring IA32_EFER does nothing.

In theory SVM has both issues, but restoring L1's EFER into vcpu->arch.efer
isn't needed there, as the code explicitly checks L1's save area for
consistency instead.

Best regards,
	Maxim Levitsky

Maxim Levitsky (3):
  KVM: nVMX: extract calculation of the L1's EFER
  KVM: nVMX: restore L1's EFER prior to setting the nested state
  KVM: x86/mmu: don't skip mmu initialization when mmu root level
    changes

 arch/x86/kvm/mmu/mmu.c    | 14 ++++++++++----
 arch/x86/kvm/vmx/nested.c | 33 +++++++++++++++++++++++++++------
 2 files changed, 37 insertions(+), 10 deletions(-)

-- 
2.26.3