From: Vitaly Kuznetsov <vkuznets@redhat.com>
To: kvm@vger.kernel.org, Paolo Bonzini <pbonzini@redhat.com>,
	Sean Christopherson <seanjc@google.com>
Cc: Daan De Meyer <daan.j.demeyer@gmail.com>, linux-kernel@vger.kernel.org
Subject: [PATCH RFC not-to-be-merged] KVM: SVM: Workaround overly strict CR3 check by Hyper-V
Date: Tue, 19 Mar 2024 17:34:56 +0100
Message-ID: <20240319163456.133942-1-vkuznets@redhat.com>

VMRUN fails (immediate #VMEXIT with error code VMEXIT_INVALID) for KVM
guests running on top of Hyper-V when KVM emulates SMM. The root cause
appears to be an overly strict CR3 check that Hyper-V applies to the
VMCB. Here is an example of a CR state which triggers the failure:

 kvm_amd: vmpl: 0   cpl: 0   efer: 0000000000001000
 kvm_amd: cr0: 0000000000050032 cr2: ffff92dcf8601000
 kvm_amd: cr3: 0000000100232003 cr4: 0000000000000040

The CR3 value may look odd: it has non-zero PCID bits as well as non-zero
bits in the upper half, yet the processor is not in long mode. This is,
however, a valid state upon entering SMM from a long mode context with
PCID enabled, and it should not cause VMEXIT_INVALID. The APM says that
VMEXIT_INVALID is triggered when "Any MBZ bit of CR3 is set.". In the CR3
format the only MBZ bits are those above MAXPHYADDR; the rest are just
"Reserved".

Place a temporary workaround in KVM to avoid putting problematic CR3
values into the VMCB when KVM runs on top of Hyper-V. Enable CR3 READ/WRITE
intercepts to make sure the guest does not observe side effects of the
mangling. Also, do not overwrite 'vcpu->arch.cr3' with the mangled
'save.cr3' value when CR3 intercepts are enabled (a CR3 update from the
guest is then intercepted and changes 'vcpu->arch.cr3' immediately).

The workaround is only needed until Hyper-V gets fixed.

Reported-by: Daan De Meyer <daan.j.demeyer@gmail.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
- The patch serves mostly documentation purposes; I don't expect it to
be merged into mainline. Hyper-V *is* supposed to get fixed but the
timeline is unclear at this point. As Azure is a fairly popular platform
for running nested KVM, it is possible that the bug will be discovered
again (running an OVMF-based guest is a good starting point!).
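
For illustration only, the masking described in the changelog can be
sketched in userspace roughly as below. This is not the kernel code:
bits_mask() is a stand-in for the kernel's rsvd_bits(), sanitize_cr3()
is a hypothetical helper name, the 0xFFF constant is assumed to match
X86_CR3_PCID_MASK, and the MAXPHYADDR value is supplied by the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the kernel's rsvd_bits(): mask of bits s..e, inclusive. */
static uint64_t bits_mask(int s, int e)
{
	assert(s >= 0 && e < 64);
	if (e < s)
		return 0;
	return ((2ULL << (e - s)) - 1) << s;
}

/*
 * Hypothetical helper mirroring the masking done by the workaround:
 * outside long mode, clear bits 32..MAXPHYADDR-1; with CR4.PCIDE clear,
 * clear the low 12 (PCID) bits. Returns the value that would be written
 * to the VMCB.
 */
static uint64_t sanitize_cr3(uint64_t cr3, bool long_mode, bool pcide,
			     int maxphyaddr)
{
	if (!long_mode)
		cr3 &= ~bits_mask(32, maxphyaddr - 1);

	if (!pcide)
		cr3 &= ~0xFFFULL; /* assumed equal to X86_CR3_PCID_MASK */

	return cr3;
}
```

With the CR state quoted in the changelog (not in long mode, CR4.PCIDE
clear) and an assumed MAXPHYADDR of 48,
sanitize_cr3(0x0000000100232003, false, false, 48) gives 0x232000. Since
the result differs from the input, this is the case where the workaround
would enable the CR3 intercepts.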
---
 arch/x86/kvm/svm/svm.c | 30 +++++++++++++++++++++++++++++-
 1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 272d5ed37ce7..6ff7cbcb5cac 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -41,6 +41,7 @@
 #include <asm/traps.h>
 #include <asm/reboot.h>
 #include <asm/fpu/api.h>
+#include <asm/hypervisor.h>
 
 #include <trace/events/ipi.h>
 
@@ -3497,7 +3498,7 @@ static int svm_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
 	if (!sev_es_guest(vcpu->kvm)) {
 		if (!svm_is_intercept(svm, INTERCEPT_CR0_WRITE))
 			vcpu->arch.cr0 = svm->vmcb->save.cr0;
-		if (npt_enabled)
+		if (npt_enabled && !svm_is_intercept(svm, INTERCEPT_CR3_WRITE))
 			vcpu->arch.cr3 = svm->vmcb->save.cr3;
 	}
 
@@ -4264,6 +4265,33 @@ static void svm_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa,
 		cr3 = root_hpa;
 	}
 
+#if IS_ENABLED(CONFIG_HYPERV)
+	/*
+	 * Work around an issue in the Hyper-V hypervisor where 'reserved' bits
+	 * are treated as MBZ, failing VMRUN.
+	 */
+	if (hypervisor_is_type(X86_HYPER_MS_HYPERV) && likely(npt_enabled)) {
+		unsigned long cr3_unmod = cr3;
+
+		/*
+		 * Bits MAXPHYADDR:63 are MBZ but bits 32:MAXPHYADDR-1 are just 'reserved'
+		 * in !long mode.
+		 */
+		if (!is_long_mode(vcpu))
+			cr3 &= ~rsvd_bits(32, cpuid_maxphyaddr(vcpu) - 1);
+
+		if (!kvm_is_cr4_bit_set(vcpu, X86_CR4_PCIDE))
+			cr3 &= ~X86_CR3_PCID_MASK;
+
+		if (cr3 != cr3_unmod && !svm_is_intercept(svm, INTERCEPT_CR3_READ)) {
+			svm_set_intercept(svm, INTERCEPT_CR3_READ);
+			svm_set_intercept(svm, INTERCEPT_CR3_WRITE);
+		} else if (cr3 == cr3_unmod && svm_is_intercept(svm, INTERCEPT_CR3_READ)) {
+			svm_clr_intercept(svm, INTERCEPT_CR3_READ);
+			svm_clr_intercept(svm, INTERCEPT_CR3_WRITE);
+		}
+	}
+#endif
 	svm->vmcb->save.cr3 = cr3;
 	vmcb_mark_dirty(svm->vmcb, VMCB_CR);
 }
-- 
2.44.0

