KVM Archive on lore.kernel.org
* [PATCH v3 0/9] KVM: Support guest MAXPHYADDR < host MAXPHYADDR
@ 2020-07-10 15:48 Mohammed Gamal
  2020-07-10 15:48 ` [PATCH v3 1/9] KVM: x86: Add helper functions for illegal GPA checking and page fault injection Mohammed Gamal
                   ` (10 more replies)
  0 siblings, 11 replies; 28+ messages in thread
From: Mohammed Gamal @ 2020-07-10 15:48 UTC
  To: kvm, pbonzini
  Cc: linux-kernel, vkuznets, sean.j.christopherson, wanpengli,
	jmattson, joro, Mohammed Gamal

When EPT is enabled, KVM does not really look at the guest physical
address size. Address bits above the maximum physical address size are
reserved. Because KVM does not check guest physical addresses against that
limit, it currently effectively supports only guest physical address sizes
equal to the host's.

This can be a problem in a mixed setup of machines with 5-level page
tables and machines with 4-level page tables, as live migration can change
MAXPHYADDR while the guest runs, which can theoretically introduce bugs.

In this patch series we add checks on guest physical addresses in EPT
violation/misconfig and NPF vmexits and if needed inject the proper
page faults in the guest.

A more subtle issue is when the host MAXPHYADDR is larger than that of the
guest. Page faults caused by reserved bits on the guest won't cause an EPT
violation/NPF and hence we also check guest MAXPHYADDR and add PFERR_RSVD_MASK
error code to the page fault if needed.

----

Changes from v2:
- Drop support for this feature on AMD processors after discussion with AMD


Mohammed Gamal (5):
  KVM: x86: Add helper functions for illegal GPA checking and page fault
    injection
  KVM: x86: mmu: Move translate_gpa() to mmu.c
  KVM: x86: mmu: Add guest physical address check in translate_gpa()
  KVM: VMX: Add guest physical address check in EPT violation and
    misconfig
  KVM: x86: SVM: VMX: Make GUEST_MAXPHYADDR < HOST_MAXPHYADDR support
    configurable

Paolo Bonzini (4):
  KVM: x86: rename update_bp_intercept to update_exception_bitmap
  KVM: x86: update exception bitmap on CPUID changes
  KVM: VMX: introduce vmx_need_pf_intercept
  KVM: VMX: optimize #PF injection when MAXPHYADDR does not match

 arch/x86/include/asm/kvm_host.h | 10 ++------
 arch/x86/kvm/cpuid.c            |  2 ++
 arch/x86/kvm/mmu.h              |  6 +++++
 arch/x86/kvm/mmu/mmu.c          | 12 +++++++++
 arch/x86/kvm/svm/svm.c          | 22 +++++++++++++---
 arch/x86/kvm/vmx/nested.c       | 28 ++++++++++++--------
 arch/x86/kvm/vmx/vmx.c          | 45 +++++++++++++++++++++++++++++----
 arch/x86/kvm/vmx/vmx.h          |  6 +++++
 arch/x86/kvm/x86.c              | 29 ++++++++++++++++++++-
 arch/x86/kvm/x86.h              |  1 +
 include/uapi/linux/kvm.h        |  1 +
 11 files changed, 133 insertions(+), 29 deletions(-)

-- 
2.26.2



* [PATCH v3 1/9] KVM: x86: Add helper functions for illegal GPA checking and page fault injection
  2020-07-10 15:48 [PATCH v3 0/9] KVM: Support guest MAXPHYADDR < host MAXPHYADDR Mohammed Gamal
@ 2020-07-10 15:48 ` Mohammed Gamal
  2020-07-10 15:48 ` [PATCH v3 2/9] KVM: x86: mmu: Move translate_gpa() to mmu.c Mohammed Gamal
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 28+ messages in thread
From: Mohammed Gamal @ 2020-07-10 15:48 UTC
  To: kvm, pbonzini
  Cc: linux-kernel, vkuznets, sean.j.christopherson, wanpengli,
	jmattson, joro, Mohammed Gamal

This patch adds two helper functions that will be used to support virtualizing
MAXPHYADDR in both kvm-intel.ko and kvm.ko.

kvm_fixup_and_inject_pf_error() injects a page fault for a user-specified GVA,
while kvm_mmu_is_illegal_gpa() checks whether a GPA exceeds vCPU address limits.
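
For illustration only (not part of the patch), the GPA check can be sketched
as a standalone userspace function. The names gpa_t and is_illegal_gpa() below
are stand-ins, and the maxphyaddr parameter is assumed to play the role of
cpuid_maxphyaddr(vcpu):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for KVM's gpa_t; an assumption made for this sketch. */
typedef uint64_t gpa_t;

/*
 * Returns true if gpa has any bit set at or above bit position
 * maxphyaddr, i.e. the address does not fit in the guest's physical
 * address width. maxphyaddr must be below 64 to keep the shift defined.
 */
static bool is_illegal_gpa(unsigned int maxphyaddr, gpa_t gpa)
{
	assert(maxphyaddr < 64);
	return gpa >= (UINT64_C(1) << maxphyaddr);
}
```

With a 36-bit guest, any address with bit 36 or higher set is reported as
illegal, even if the host itself supports wider addresses.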

Signed-off-by: Mohammed Gamal <mgamal@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/mmu.h |  6 ++++++
 arch/x86/kvm/x86.c | 21 +++++++++++++++++++++
 arch/x86/kvm/x86.h |  1 +
 3 files changed, 28 insertions(+)

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 444bb9c54548..59930231d5d5 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -4,6 +4,7 @@
 
 #include <linux/kvm_host.h>
 #include "kvm_cache_regs.h"
+#include "cpuid.h"
 
 #define PT64_PT_BITS 9
 #define PT64_ENT_PER_PAGE (1 << PT64_PT_BITS)
@@ -158,6 +159,11 @@ static inline bool is_write_protection(struct kvm_vcpu *vcpu)
 	return kvm_read_cr0_bits(vcpu, X86_CR0_WP);
 }
 
+static inline bool kvm_mmu_is_illegal_gpa(struct kvm_vcpu *vcpu, gpa_t gpa)
+{
+        return (gpa >= BIT_ULL(cpuid_maxphyaddr(vcpu)));
+}
+
 /*
  * Check if a given access (described through the I/D, W/R and U/S bits of a
  * page fault error code pfec) causes a permission fault with the given PTE
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 88c593f83b28..1f5f4074fc59 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10699,6 +10699,27 @@ u64 kvm_spec_ctrl_valid_bits(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_spec_ctrl_valid_bits);
 
+void kvm_fixup_and_inject_pf_error(struct kvm_vcpu *vcpu, gva_t gva, u16 error_code)
+{
+	struct x86_exception fault;
+
+	if (!(error_code & PFERR_PRESENT_MASK) ||
+	    vcpu->arch.walk_mmu->gva_to_gpa(vcpu, gva, error_code, &fault) != UNMAPPED_GVA) {
+		/*
+		 * If vcpu->arch.walk_mmu->gva_to_gpa succeeded, the page
+		 * tables probably do not match the TLB.  Just proceed
+		 * with the error code that the processor gave.
+		 */
+		fault.vector = PF_VECTOR;
+		fault.error_code_valid = true;
+		fault.error_code = error_code;
+		fault.nested_page_fault = false;
+		fault.address = gva;
+	}
+	vcpu->arch.walk_mmu->inject_page_fault(vcpu, &fault);
+}
+EXPORT_SYMBOL_GPL(kvm_fixup_and_inject_pf_error);
+
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_fast_mmio);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 6eb62e97e59f..239ae0f3e40b 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -272,6 +272,7 @@ int kvm_mtrr_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata);
 bool kvm_mtrr_check_gfn_range_consistency(struct kvm_vcpu *vcpu, gfn_t gfn,
 					  int page_num);
 bool kvm_vector_hashing_enabled(void);
+void kvm_fixup_and_inject_pf_error(struct kvm_vcpu *vcpu, gva_t gva, u16 error_code);
 int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 			    int emulation_type, void *insn, int insn_len);
 fastpath_t handle_fastpath_set_msr_irqoff(struct kvm_vcpu *vcpu);
-- 
2.26.2



* [PATCH v3 2/9] KVM: x86: mmu: Move translate_gpa() to mmu.c
  2020-07-10 15:48 [PATCH v3 0/9] KVM: Support guest MAXPHYADDR < host MAXPHYADDR Mohammed Gamal
  2020-07-10 15:48 ` [PATCH v3 1/9] KVM: x86: Add helper functions for illegal GPA checking and page fault injection Mohammed Gamal
@ 2020-07-10 15:48 ` Mohammed Gamal
  2020-07-10 15:48 ` [PATCH v3 3/9] KVM: x86: mmu: Add guest physical address check in translate_gpa() Mohammed Gamal
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 28+ messages in thread
From: Mohammed Gamal @ 2020-07-10 15:48 UTC
  To: kvm, pbonzini
  Cc: linux-kernel, vkuznets, sean.j.christopherson, wanpengli,
	jmattson, joro, Mohammed Gamal

There is also no point in translate_gpa() being inline, since it is always
called through function pointers, so drop the inline keyword.

Signed-off-by: Mohammed Gamal <mgamal@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/include/asm/kvm_host.h | 6 ------
 arch/x86/kvm/mmu/mmu.c          | 6 ++++++
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index be5363b21540..62373cc06c72 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1551,12 +1551,6 @@ void kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd, bool skip_tlb_flush,
 
 void kvm_configure_mmu(bool enable_tdp, int tdp_page_level);
 
-static inline gpa_t translate_gpa(struct kvm_vcpu *vcpu, gpa_t gpa, u32 access,
-				  struct x86_exception *exception)
-{
-	return gpa;
-}
-
 static inline struct kvm_mmu_page *page_header(hpa_t shadow_page)
 {
 	struct page *page = pfn_to_page(shadow_page >> PAGE_SHIFT);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 6d6a0ae7800c..f8b3c5181466 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -515,6 +515,12 @@ static bool check_mmio_spte(struct kvm_vcpu *vcpu, u64 spte)
 	return likely(kvm_gen == spte_gen);
 }
 
+static gpa_t translate_gpa(struct kvm_vcpu *vcpu, gpa_t gpa, u32 access,
+                                  struct x86_exception *exception)
+{
+        return gpa;
+}
+
 /*
  * Sets the shadow PTE masks used by the MMU.
  *
-- 
2.26.2



* [PATCH v3 3/9] KVM: x86: mmu: Add guest physical address check in translate_gpa()
  2020-07-10 15:48 [PATCH v3 0/9] KVM: Support guest MAXPHYADDR < host MAXPHYADDR Mohammed Gamal
  2020-07-10 15:48 ` [PATCH v3 1/9] KVM: x86: Add helper functions for illegal GPA checking and page fault injection Mohammed Gamal
  2020-07-10 15:48 ` [PATCH v3 2/9] KVM: x86: mmu: Move translate_gpa() to mmu.c Mohammed Gamal
@ 2020-07-10 15:48 ` Mohammed Gamal
  2020-07-10 17:41   ` Paolo Bonzini
  2020-07-10 15:48 ` [PATCH v3 4/9] KVM: x86: rename update_bp_intercept to update_exception_bitmap Mohammed Gamal
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 28+ messages in thread
From: Mohammed Gamal @ 2020-07-10 15:48 UTC
  To: kvm, pbonzini
  Cc: linux-kernel, vkuznets, sean.j.christopherson, wanpengli,
	jmattson, joro, Mohammed Gamal

When running a guest with 4-level page tables on a host with 5-level page
tables, the guest may use a physical address with reserved bits set that
the host does not treat as reserved and therefore does not trap.

Hence, we need to check the physical addresses of page faults against the
guest's maximum physical address and, if it is exceeded, add
PFERR_RSVD_MASK to the page fault's error code.

Also make sure the error code isn't overwritten by the page table walker.
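
For illustration only, a minimal userspace sketch of the behaviour described
above. The struct fault_info type, the INVALID_GPA_SKETCH sentinel and the
explicit maxphyaddr parameter are assumptions made for the sketch; the bit
position of PFERR_RSVD_MASK follows the x86 page fault error code layout:

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t gpa_t;                       /* stand-in for KVM's gpa_t */

#define PFERR_PRESENT_MASK  (1u << 0)         /* P bit of the #PF error code */
#define PFERR_RSVD_MASK     (1u << 3)         /* RSVD bit of the #PF error code */
#define INVALID_GPA_SKETCH  (~(gpa_t)0)       /* hypothetical failure sentinel */

/* Hypothetical, reduced stand-in for struct x86_exception. */
struct fault_info {
	uint32_t error_code;
};

/*
 * Pass a legal GPA through unchanged. For a GPA beyond the guest's
 * address width, OR the reserved-bit flag into the existing error code
 * (preserving the bits already there) and return the sentinel.
 */
static gpa_t translate_gpa_sketch(unsigned int maxphyaddr, gpa_t gpa,
				  struct fault_info *fault)
{
	assert(maxphyaddr < 64);
	if (gpa >= (UINT64_C(1) << maxphyaddr)) {
		fault->error_code |= PFERR_RSVD_MASK;
		return INVALID_GPA_SKETCH;
	}
	return gpa;
}
```

The OR rather than a plain assignment is what keeps the access-type bits that
were already recorded in the error code.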

Signed-off-by: Mohammed Gamal <mgamal@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/mmu/mmu.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index f8b3c5181466..e03e85b21cda 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -518,6 +518,12 @@ static bool check_mmio_spte(struct kvm_vcpu *vcpu, u64 spte)
 static gpa_t translate_gpa(struct kvm_vcpu *vcpu, gpa_t gpa, u32 access,
                                   struct x86_exception *exception)
 {
+	/* Check if guest physical address doesn't exceed guest maximum */
+	if (kvm_mmu_is_illegal_gpa(vcpu, gpa)) {
+		exception->error_code |= PFERR_RSVD_MASK;
+		return UNMAPPED_GVA;
+	}
+
         return gpa;
 }
 
-- 
2.26.2



* [PATCH v3 4/9] KVM: x86: rename update_bp_intercept to update_exception_bitmap
  2020-07-10 15:48 [PATCH v3 0/9] KVM: Support guest MAXPHYADDR < host MAXPHYADDR Mohammed Gamal
                   ` (2 preceding siblings ...)
  2020-07-10 15:48 ` [PATCH v3 3/9] KVM: x86: mmu: Add guest physical address check in translate_gpa() Mohammed Gamal
@ 2020-07-10 15:48 ` Mohammed Gamal
  2020-07-10 16:15   ` Jim Mattson
  2020-07-10 15:48 ` [PATCH v3 5/9] KVM: x86: update exception bitmap on CPUID changes Mohammed Gamal
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 28+ messages in thread
From: Mohammed Gamal @ 2020-07-10 15:48 UTC
  To: kvm, pbonzini
  Cc: linux-kernel, vkuznets, sean.j.christopherson, wanpengli, jmattson, joro

From: Paolo Bonzini <pbonzini@redhat.com>

We would like to introduce a callback to update the #PF intercept
when CPUID changes.  Just reuse update_bp_intercept since VMX is
already using update_exception_bitmap instead of a bespoke function.

While at it, remove an unnecessary assignment in the SVM version,
which is already done in the caller (kvm_arch_vcpu_ioctl_set_guest_debug)
and has nothing to do with the exception bitmap.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/include/asm/kvm_host.h | 2 +-
 arch/x86/kvm/svm/svm.c          | 7 +++----
 arch/x86/kvm/vmx/vmx.c          | 2 +-
 arch/x86/kvm/x86.c              | 2 +-
 4 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 62373cc06c72..bb4044ffb7b7 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1098,7 +1098,7 @@ struct kvm_x86_ops {
 	void (*vcpu_load)(struct kvm_vcpu *vcpu, int cpu);
 	void (*vcpu_put)(struct kvm_vcpu *vcpu);
 
-	void (*update_bp_intercept)(struct kvm_vcpu *vcpu);
+	void (*update_exception_bitmap)(struct kvm_vcpu *vcpu);
 	int (*get_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr);
 	int (*set_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr);
 	u64 (*get_segment_base)(struct kvm_vcpu *vcpu, int seg);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index c0da4dd78ac5..79c33b3539f0 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1627,7 +1627,7 @@ static void svm_set_segment(struct kvm_vcpu *vcpu,
 	mark_dirty(svm->vmcb, VMCB_SEG);
 }
 
-static void update_bp_intercept(struct kvm_vcpu *vcpu)
+static void update_exception_bitmap(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
@@ -1636,8 +1636,7 @@ static void update_bp_intercept(struct kvm_vcpu *vcpu)
 	if (vcpu->guest_debug & KVM_GUESTDBG_ENABLE) {
 		if (vcpu->guest_debug & KVM_GUESTDBG_USE_SW_BP)
 			set_exception_intercept(svm, BP_VECTOR);
-	} else
-		vcpu->guest_debug = 0;
+	}
 }
 
 static void new_asid(struct vcpu_svm *svm, struct svm_cpu_data *sd)
@@ -3989,7 +3988,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 	.vcpu_blocking = svm_vcpu_blocking,
 	.vcpu_unblocking = svm_vcpu_unblocking,
 
-	.update_bp_intercept = update_bp_intercept,
+	.update_exception_bitmap = update_exception_bitmap,
 	.get_msr_feature = svm_get_msr_feature,
 	.get_msr = svm_get_msr,
 	.set_msr = svm_set_msr,
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 13745f2a5ecd..178ee92551a9 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7859,7 +7859,7 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {
 	.vcpu_load = vmx_vcpu_load,
 	.vcpu_put = vmx_vcpu_put,
 
-	.update_bp_intercept = update_exception_bitmap,
+	.update_exception_bitmap = update_exception_bitmap,
 	.get_msr_feature = vmx_get_msr_feature,
 	.get_msr = vmx_get_msr,
 	.set_msr = vmx_set_msr,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1f5f4074fc59..03c401963062 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9281,7 +9281,7 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
 	 */
 	kvm_set_rflags(vcpu, rflags);
 
-	kvm_x86_ops.update_bp_intercept(vcpu);
+	kvm_x86_ops.update_exception_bitmap(vcpu);
 
 	r = 0;
 
-- 
2.26.2



* [PATCH v3 5/9] KVM: x86: update exception bitmap on CPUID changes
  2020-07-10 15:48 [PATCH v3 0/9] KVM: Support guest MAXPHYADDR < host MAXPHYADDR Mohammed Gamal
                   ` (3 preceding siblings ...)
  2020-07-10 15:48 ` [PATCH v3 4/9] KVM: x86: rename update_bp_intercept to update_exception_bitmap Mohammed Gamal
@ 2020-07-10 15:48 ` Mohammed Gamal
  2020-07-10 16:25   ` Jim Mattson
  2020-07-10 15:48 ` [PATCH v3 6/9] KVM: VMX: introduce vmx_need_pf_intercept Mohammed Gamal
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 28+ messages in thread
From: Mohammed Gamal @ 2020-07-10 15:48 UTC
  To: kvm, pbonzini
  Cc: linux-kernel, vkuznets, sean.j.christopherson, wanpengli, jmattson, joro

From: Paolo Bonzini <pbonzini@redhat.com>

Allow vendor code to observe changes to MAXPHYADDR and start/stop
intercepting page faults.
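
For illustration only, a userspace sketch of the decision that vendor code
would re-evaluate after a CPUID update. It assumes the VMX condition that
appears later in this series (trap #PF when EPT is off or when the guest
MAXPHYADDR is below the host's); the function names and the plain integer
parameters are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PF_VECTOR 14u   /* x86 page fault exception vector */

/* Trap #PF without EPT, or when the guest address width is narrower. */
static bool need_pf_intercept(bool ept_enabled, unsigned int guest_maxphyaddr,
			      unsigned int host_phys_bits)
{
	return !ept_enabled || guest_maxphyaddr < host_phys_bits;
}

/* Set or clear the #PF bit of an exception bitmap accordingly. */
static uint32_t update_pf_bit(uint32_t eb, bool ept_enabled,
			      unsigned int guest_maxphyaddr,
			      unsigned int host_phys_bits)
{
	assert(guest_maxphyaddr <= 64 && host_phys_bits <= 64);
	if (need_pf_intercept(ept_enabled, guest_maxphyaddr, host_phys_bits))
		eb |= 1u << PF_VECTOR;
	else
		eb &= ~(1u << PF_VECTOR);
	return eb;
}
```

Because the guest MAXPHYADDR comes from CPUID, the bitmap has to be
recomputed whenever userspace changes the guest's CPUID.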

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/cpuid.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 8a294f9747aa..ea5bbf2153bb 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -128,6 +128,8 @@ int kvm_update_cpuid(struct kvm_vcpu *vcpu)
 	kvm_mmu_reset_context(vcpu);
 
 	kvm_pmu_refresh(vcpu);
+	kvm_x86_ops.update_exception_bitmap(vcpu);
+
 	return 0;
 }
 
-- 
2.26.2



* [PATCH v3 6/9] KVM: VMX: introduce vmx_need_pf_intercept
  2020-07-10 15:48 [PATCH v3 0/9] KVM: Support guest MAXPHYADDR < host MAXPHYADDR Mohammed Gamal
                   ` (4 preceding siblings ...)
  2020-07-10 15:48 ` [PATCH v3 5/9] KVM: x86: update exception bitmap on CPUID changes Mohammed Gamal
@ 2020-07-10 15:48 ` Mohammed Gamal
  2020-07-10 15:48 ` [PATCH v3 7/9] KVM: VMX: Add guest physical address check in EPT violation and misconfig Mohammed Gamal
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 28+ messages in thread
From: Mohammed Gamal @ 2020-07-10 15:48 UTC
  To: kvm, pbonzini
  Cc: linux-kernel, vkuznets, sean.j.christopherson, wanpengli, jmattson, joro

From: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/vmx/nested.c | 28 +++++++++++++++++-----------
 arch/x86/kvm/vmx/vmx.c    |  2 +-
 arch/x86/kvm/vmx/vmx.h    |  5 +++++
 3 files changed, 23 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index b26655104d4a..1aea9e3b8c43 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2433,22 +2433,28 @@ static void prepare_vmcs02_rare(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12)
 
 	/*
 	 * Whether page-faults are trapped is determined by a combination of
-	 * 3 settings: PFEC_MASK, PFEC_MATCH and EXCEPTION_BITMAP.PF.
-	 * If enable_ept, L0 doesn't care about page faults and we should
-	 * set all of these to L1's desires. However, if !enable_ept, L0 does
-	 * care about (at least some) page faults, and because it is not easy
-	 * (if at all possible?) to merge L0 and L1's desires, we simply ask
-	 * to exit on each and every L2 page fault. This is done by setting
-	 * MASK=MATCH=0 and (see below) EB.PF=1.
+	 * 3 settings: PFEC_MASK, PFEC_MATCH and EXCEPTION_BITMAP.PF.  If L0
+	 * doesn't care about page faults then we should set all of these to
+	 * L1's desires. However, if L0 does care about (some) page faults, it
+	 * is not easy (if at all possible?) to merge L0 and L1's desires, we
+	 * simply ask to exit on each and every L2 page fault. This is done by
+	 * setting MASK=MATCH=0 and (see below) EB.PF=1.
 	 * Note that below we don't need special code to set EB.PF beyond the
 	 * "or"ing of the EB of vmcs01 and vmcs12, because when enable_ept,
 	 * vmcs01's EB.PF is 0 so the "or" will take vmcs12's value, and when
 	 * !enable_ept, EB.PF is 1, so the "or" will always be 1.
 	 */
-	vmcs_write32(PAGE_FAULT_ERROR_CODE_MASK,
-		enable_ept ? vmcs12->page_fault_error_code_mask : 0);
-	vmcs_write32(PAGE_FAULT_ERROR_CODE_MATCH,
-		enable_ept ? vmcs12->page_fault_error_code_match : 0);
+	if (vmx_need_pf_intercept(&vmx->vcpu)) {
+		/*
+		 * TODO: if both L0 and L1 need the same MASK and MATCH,
+		 * go ahead and use it?
+		 */
+		vmcs_write32(PAGE_FAULT_ERROR_CODE_MASK, 0);
+		vmcs_write32(PAGE_FAULT_ERROR_CODE_MATCH, 0);
+	} else {
+		vmcs_write32(PAGE_FAULT_ERROR_CODE_MASK, vmcs12->page_fault_error_code_mask);
+		vmcs_write32(PAGE_FAULT_ERROR_CODE_MATCH, vmcs12->page_fault_error_code_match);
+	}
 
 	if (cpu_has_vmx_apicv()) {
 		vmcs_write64(EOI_EXIT_BITMAP0, vmcs12->eoi_exit_bitmap0);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 178ee92551a9..770b090969fb 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -780,7 +780,7 @@ void update_exception_bitmap(struct kvm_vcpu *vcpu)
 		eb |= 1u << BP_VECTOR;
 	if (to_vmx(vcpu)->rmode.vm86_active)
 		eb = ~0;
-	if (enable_ept)
+	if (!vmx_need_pf_intercept(vcpu))
 		eb &= ~(1u << PF_VECTOR);
 
 	/* When we are running a nested L2 guest and L1 specified for it a
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 639798e4a6ca..b0e5e210f1c1 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -550,6 +550,11 @@ static inline bool vmx_has_waitpkg(struct vcpu_vmx *vmx)
 		SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE;
 }
 
+static inline bool vmx_need_pf_intercept(struct kvm_vcpu *vcpu)
+{
+	return !enable_ept;
+}
+
 void dump_vmcs(void);
 
 #endif /* __KVM_X86_VMX_H */
-- 
2.26.2



* [PATCH v3 7/9] KVM: VMX: Add guest physical address check in EPT violation and misconfig
  2020-07-10 15:48 [PATCH v3 0/9] KVM: Support guest MAXPHYADDR < host MAXPHYADDR Mohammed Gamal
                   ` (5 preceding siblings ...)
  2020-07-10 15:48 ` [PATCH v3 6/9] KVM: VMX: introduce vmx_need_pf_intercept Mohammed Gamal
@ 2020-07-10 15:48 ` Mohammed Gamal
  2020-07-13 18:32   ` Sean Christopherson
                     ` (2 more replies)
  2020-07-10 15:48 ` [PATCH v3 8/9] KVM: VMX: optimize #PF injection when MAXPHYADDR does not match Mohammed Gamal
                   ` (3 subsequent siblings)
  10 siblings, 3 replies; 28+ messages in thread
From: Mohammed Gamal @ 2020-07-10 15:48 UTC
  To: kvm, pbonzini
  Cc: linux-kernel, vkuznets, sean.j.christopherson, wanpengli,
	jmattson, joro, Mohammed Gamal

Check the guest physical address against the guest's maximum physical
address. If the address exceeds the maximum (i.e. has reserved bits
set), inject a guest page fault with PFERR_RSVD_MASK set.

This has to be done both in the EPT violation and page fault paths, as
there are complications in both cases with respect to the computation
of the correct error code.

For EPT violations, unfortunately the only possibility is to emulate,
because the access type in the exit qualification might refer to an
access to a paging structure, rather than to the access performed by
the program.

Trapping page faults instead is needed in order to correct the error code,
but the access type can be obtained from the original error code and
passed to gva_to_gpa.  The corrections required in the error code are
subtle. For example, imagine that a PTE for a supervisor page has a reserved
bit set.  On a supervisor-mode access, the EPT violation path would trigger.
However, on a user-mode access, the processor will not notice the reserved
bit and not include PFERR_RSVD_MASK in the error code.
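
For illustration only, the error code correction can be sketched in
userspace as follows. This is a simplified model of the fixup described
above, not the kernel function: the result of the software page walk is
passed in as two hypothetical parameters (walk_failed, walk_error_code)
instead of being computed through gva_to_gpa:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PFERR_PRESENT_MASK (1u << 0)
#define PFERR_USER_MASK    (1u << 2)
#define PFERR_RSVD_MASK    (1u << 3)

/*
 * Pick the error code to inject into the guest.
 *
 * hw_error_code:   error code reported by the processor.
 * walk_failed:     whether a software walk of the guest page tables
 *                  (with the same access type) hit a fault.
 * walk_error_code: error code computed by that walk, if it failed.
 */
static uint32_t fixup_pf_error_code(uint32_t hw_error_code, bool walk_failed,
				    uint32_t walk_error_code)
{
	/* Non-present faults cannot involve reserved bits: keep as is. */
	if (!(hw_error_code & PFERR_PRESENT_MASK))
		return hw_error_code;

	/* Walk succeeded: trust what the processor reported. */
	if (!walk_failed)
		return hw_error_code;

	/* Walk failed: its error code includes bits the processor missed. */
	assert(walk_error_code != 0);
	return walk_error_code;
}
```

In the user-mode example above, the processor would report a present,
user-mode fault (0x5), while the software walk also sees the reserved bit and
yields 0xd.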

Co-developed-by: Mohammed Gamal <mgamal@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/vmx/vmx.c | 24 +++++++++++++++++++++---
 arch/x86/kvm/vmx/vmx.h |  3 ++-
 2 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 770b090969fb..de3f436b2d32 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4790,9 +4790,15 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu)
 
 	if (is_page_fault(intr_info)) {
 		cr2 = vmx_get_exit_qual(vcpu);
-		/* EPT won't cause page fault directly */
-		WARN_ON_ONCE(!vcpu->arch.apf.host_apf_flags && enable_ept);
-		return kvm_handle_page_fault(vcpu, error_code, cr2, NULL, 0);
+		if (enable_ept && !vcpu->arch.apf.host_apf_flags) {
+			/*
+			 * EPT will cause page fault only if we need to
+			 * detect illegal GPAs.
+			 */
+			kvm_fixup_and_inject_pf_error(vcpu, cr2, error_code);
+			return 1;
+		} else
+			return kvm_handle_page_fault(vcpu, error_code, cr2, NULL, 0);
 	}
 
 	ex_no = intr_info & INTR_INFO_VECTOR_MASK;
@@ -5308,6 +5314,18 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
 	       PFERR_GUEST_FINAL_MASK : PFERR_GUEST_PAGE_MASK;
 
 	vcpu->arch.exit_qualification = exit_qualification;
+
+	/*
+	 * Check that the GPA doesn't exceed physical memory limits, as that is
+	 * a guest page fault.  We have to emulate the instruction here, because
+	 * if the illegal address is that of a paging structure, then
+	 * EPT_VIOLATION_ACC_WRITE bit is set.  Alternatively, if supported we
+	 * would also use advanced VM-exit information for EPT violations to
+	 * reconstruct the page fault error code.
+	 */
+	if (unlikely(kvm_mmu_is_illegal_gpa(vcpu, gpa)))
+		return kvm_emulate_instruction(vcpu, 0);
+
 	return kvm_mmu_page_fault(vcpu, gpa, error_code, NULL, 0);
 }
 
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index b0e5e210f1c1..0d06951e607c 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -11,6 +11,7 @@
 #include "kvm_cache_regs.h"
 #include "ops.h"
 #include "vmcs.h"
+#include "cpuid.h"
 
 extern const u32 vmx_msr_index[];
 
@@ -552,7 +553,7 @@ static inline bool vmx_has_waitpkg(struct vcpu_vmx *vmx)
 
 static inline bool vmx_need_pf_intercept(struct kvm_vcpu *vcpu)
 {
-	return !enable_ept;
+	return !enable_ept || cpuid_maxphyaddr(vcpu) < boot_cpu_data.x86_phys_bits;
 }
 
 void dump_vmcs(void);
-- 
2.26.2



* [PATCH v3 8/9] KVM: VMX: optimize #PF injection when MAXPHYADDR does not match
  2020-07-10 15:48 [PATCH v3 0/9] KVM: Support guest MAXPHYADDR < host MAXPHYADDR Mohammed Gamal
                   ` (6 preceding siblings ...)
  2020-07-10 15:48 ` [PATCH v3 7/9] KVM: VMX: Add guest physical address check in EPT violation and misconfig Mohammed Gamal
@ 2020-07-10 15:48 ` Mohammed Gamal
  2020-07-10 15:48 ` [PATCH v3 9/9] KVM: x86: SVM: VMX: Make GUEST_MAXPHYADDR < HOST_MAXPHYADDR support configurable Mohammed Gamal
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 28+ messages in thread
From: Mohammed Gamal @ 2020-07-10 15:48 UTC
  To: kvm, pbonzini
  Cc: linux-kernel, vkuznets, sean.j.christopherson, wanpengli, jmattson, joro

From: Paolo Bonzini <pbonzini@redhat.com>

Ignore non-present page faults, since those cannot have reserved
bits set.

When running access.flat with "-cpu Haswell,phys-bits=36", the
number of trapped page faults goes down from 8872644 to 3978948.
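
For illustration only, the VMX filtering rule that this relies on can be
modelled in userspace: with the #PF bit set in the exception bitmap, a page
fault causes a VM exit only if (error code & MASK) == MATCH, and with the bit
clear the sense is inverted. The function name below is hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PFERR_PRESENT_MASK (1u << 0)
#define PFERR_WRITE_MASK   (1u << 1)

/*
 * Model of the #PF VM-exit decision.
 *
 * eb_pf: the #PF bit (bit 14) of the exception bitmap.
 * pfec:  page fault error code produced by the access.
 * mask:  PAGE_FAULT_ERROR_CODE_MASK field.
 * match: PAGE_FAULT_ERROR_CODE_MATCH field.
 */
static bool pf_causes_vmexit(bool eb_pf, uint32_t pfec, uint32_t mask,
			     uint32_t match)
{
	bool matches = (pfec & mask) == match;

	return eb_pf ? matches : !matches;
}
```

With MASK = MATCH = PFERR_PRESENT_MASK, a not-present write fault (error code
0x2) stays in the guest, while a present write fault (0x3) is trapped. With
MASK = MATCH = 0, every page fault is trapped.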

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/vmx/vmx.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index de3f436b2d32..0cebc4832805 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4355,6 +4355,16 @@ static void init_vmcs(struct vcpu_vmx *vmx)
 		vmx->pt_desc.guest.output_mask = 0x7F;
 		vmcs_write64(GUEST_IA32_RTIT_CTL, 0);
 	}
+
+	/*
+	 * If EPT is enabled, #PF is only trapped if MAXPHYADDR is mismatched
+	 * between guest and host.  In that case we only care about present
+	 * faults.
+	 */
+	if (enable_ept) {
+		vmcs_write32(PAGE_FAULT_ERROR_CODE_MASK, PFERR_PRESENT_MASK);
+		vmcs_write32(PAGE_FAULT_ERROR_CODE_MATCH, PFERR_PRESENT_MASK);
+	}
 }
 
 static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
-- 
2.26.2



* [PATCH v3 9/9] KVM: x86: SVM: VMX: Make GUEST_MAXPHYADDR < HOST_MAXPHYADDR support configurable
  2020-07-10 15:48 [PATCH v3 0/9] KVM: Support guest MAXPHYADDR < host MAXPHYADDR Mohammed Gamal
                   ` (7 preceding siblings ...)
  2020-07-10 15:48 ` [PATCH v3 8/9] KVM: VMX: optimize #PF injection when MAXPHYADDR does not match Mohammed Gamal
@ 2020-07-10 15:48 ` Mohammed Gamal
  2020-07-10 17:40   ` Paolo Bonzini
  2020-07-10 16:30 ` [PATCH v3 0/9] KVM: Support guest MAXPHYADDR < host MAXPHYADDR Jim Mattson
  2020-07-10 17:49 ` Paolo Bonzini
  10 siblings, 1 reply; 28+ messages in thread
From: Mohammed Gamal @ 2020-07-10 15:48 UTC
  To: kvm, pbonzini
  Cc: linux-kernel, vkuznets, sean.j.christopherson, wanpengli,
	jmattson, joro, Mohammed Gamal, Tom Lendacky, Babu Moger

The reason for including this patch is unexpected behaviour we see
with NPT vmexit handling on AMD processors.

With the previous patch ("KVM: SVM: Add guest physical address check in
NPF/PF interception") we see the following error multiple times in
the 'access' test in kvm-unit-tests:

            test pte.p pte.36 pde.p: FAIL: pte 2000021 expected 2000001
            Dump mapping: address: 0x123400000000
            ------L4: 24c3027
            ------L3: 24c4027
            ------L2: 24c5021
            ------L1: 1002000021

This shows that the PTE's accessed bit is apparently being set by
the CPU hardware before the NPF vmexit. This is handled entirely by
hardware and cannot be fixed in software.

This patch introduces a workaround. We add a boolean variable,
'allow_smaller_maxphyaddr', which is set individually by the VMX and SVM
init routines. On VMX it is always set to true; on SVM it is only set to
true when NPT is not enabled.

We also add a new capability, KVM_CAP_SMALLER_MAXPHYADDR, which
allows userspace to query whether the underlying architecture supports
GUEST_MAXPHYADDR < HOST_MAXPHYADDR and act accordingly
(e.g. qemu can decide whether to ignore -cpu ..,phys-bits=X).
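
For illustration only, a userspace sketch of the policy described above. The
enum, both function names and the fallback to the host width are assumptions
made for the sketch; in particular, effective_phys_bits() is one hypothetical
way userspace could react to the capability, not what qemu actually does:

```c
#include <assert.h>
#include <stdbool.h>

enum vendor_sketch { VENDOR_VMX, VENDOR_SVM };

/* Value each vendor module would assign to allow_smaller_maxphyaddr. */
static bool allow_smaller_maxphyaddr_for(enum vendor_sketch vendor,
					 bool npt_enabled)
{
	if (vendor == VENDOR_VMX)
		return true;
	return !npt_enabled;	/* SVM: only supported without NPT */
}

/*
 * Hypothetical userspace policy: honour a requested guest width that is
 * smaller than the host's only if the capability is reported; otherwise
 * fall back to the host width.
 */
static unsigned int effective_phys_bits(bool cap_smaller_maxphyaddr,
					unsigned int requested_bits,
					unsigned int host_bits)
{
	assert(requested_bits > 0 && host_bits > 0);
	if (requested_bits < host_bits && !cap_smaller_maxphyaddr)
		return host_bits;
	return requested_bits;
}
```

The capability value itself would come from a KVM_CHECK_EXTENSION query;
that call is left out here so the sketch does not depend on /dev/kvm.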

CC: Tom Lendacky <thomas.lendacky@amd.com>
CC: Babu Moger <babu.moger@amd.com>
Signed-off-by: Mohammed Gamal <mgamal@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |  2 +-
 arch/x86/kvm/svm/svm.c          | 15 +++++++++++++++
 arch/x86/kvm/vmx/vmx.c          |  7 +++++++
 arch/x86/kvm/x86.c              |  6 ++++++
 include/uapi/linux/kvm.h        |  1 +
 5 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index bb4044ffb7b7..26002e1b47f7 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1304,7 +1304,7 @@ struct kvm_arch_async_pf {
 };
 
 extern u64 __read_mostly host_efer;
-
+extern bool __read_mostly allow_smaller_maxphyaddr;
 extern struct kvm_x86_ops kvm_x86_ops;
 
 #define __KVM_HAVE_ARCH_VM_ALLOC
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 79c33b3539f0..f3d7ae26875c 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -924,6 +924,21 @@ static __init int svm_hardware_setup(void)
 
 	svm_set_cpu_caps();
 
+	/*
+	 * It seems that on AMD processors PTE's accessed bit is
+	 * being set by the CPU hardware before the NPF vmexit.
+	 * This is not expected behaviour and our tests fail because
+	 * of it.
+	 * A workaround here is to disable support for
+	 * GUEST_MAXPHYADDR < HOST_MAXPHYADDR if NPT is enabled.
+	 * In this case userspace can know if there is support using
+	 * KVM_CAP_SMALLER_MAXPHYADDR extension and decide how to handle
+	 * it
+	 * If future AMD CPU models change the behaviour described above,
+	 * this variable can be changed accordingly
+	 */
+	allow_smaller_maxphyaddr = !npt_enabled;
+
 	return 0;
 
 err:
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 0cebc4832805..8a8e85e6c529 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -8294,6 +8294,13 @@ static int __init vmx_init(void)
 #endif
 	vmx_check_vmcs12_offsets();
 
+	/*
+	 * Intel processors don't have problems with
+	 * GUEST_MAXPHYADDR < HOST_MAXPHYADDR so enable
+	 * it for VMX by default
+	 */
+	allow_smaller_maxphyaddr = true;
+
 	return 0;
 }
 module_init(vmx_init);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 03c401963062..167becd6a634 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -187,6 +187,9 @@ static struct kvm_shared_msrs __percpu *shared_msrs;
 u64 __read_mostly host_efer;
 EXPORT_SYMBOL_GPL(host_efer);
 
+bool __read_mostly allow_smaller_maxphyaddr;
+EXPORT_SYMBOL_GPL(allow_smaller_maxphyaddr);
+
 static u64 __read_mostly host_xss;
 u64 __read_mostly supported_xss;
 EXPORT_SYMBOL_GPL(supported_xss);
@@ -3538,6 +3541,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_HYPERV_ENLIGHTENED_VMCS:
 		r = kvm_x86_ops.nested_ops->enable_evmcs != NULL;
 		break;
+	case KVM_CAP_SMALLER_MAXPHYADDR:
+		r = (int) allow_smaller_maxphyaddr;
+		break;
 	default:
 		break;
 	}
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 4fdf30316582..68cd3a0af9bb 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1031,6 +1031,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_PPC_SECURE_GUEST 181
 #define KVM_CAP_HALT_POLL 182
 #define KVM_CAP_ASYNC_PF_INT 183
+#define KVM_CAP_SMALLER_MAXPHYADDR 184
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
2.26.2



* Re: [PATCH v3 4/9] KVM: x86: rename update_bp_intercept to update_exception_bitmap
  2020-07-10 15:48 ` [PATCH v3 4/9] KVM: x86: rename update_bp_intercept to update_exception_bitmap Mohammed Gamal
@ 2020-07-10 16:15   ` Jim Mattson
  0 siblings, 0 replies; 28+ messages in thread
From: Jim Mattson @ 2020-07-10 16:15 UTC (permalink / raw)
  To: Mohammed Gamal
  Cc: kvm list, Paolo Bonzini, LKML, Vitaly Kuznetsov,
	Sean Christopherson, Wanpeng Li, Joerg Roedel

On Fri, Jul 10, 2020 at 8:48 AM Mohammed Gamal <mgamal@redhat.com> wrote:
>
> From: Paolo Bonzini <pbonzini@redhat.com>
>
> We would like to introduce a callback to update the #PF intercept
> when CPUID changes.  Just reuse update_bp_intercept since VMX is
> already using update_exception_bitmap instead of a bespoke function.
>
> While at it, remove an unnecessary assignment in the SVM version,
> which is already done in the caller (kvm_arch_vcpu_ioctl_set_guest_debug)
> and has nothing to do with the exception bitmap.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Jim Mattson <jmattson@google.com>


* Re: [PATCH v3 5/9] KVM: x86: update exception bitmap on CPUID changes
  2020-07-10 15:48 ` [PATCH v3 5/9] KVM: x86: update exception bitmap on CPUID changes Mohammed Gamal
@ 2020-07-10 16:25   ` Jim Mattson
  0 siblings, 0 replies; 28+ messages in thread
From: Jim Mattson @ 2020-07-10 16:25 UTC (permalink / raw)
  To: Mohammed Gamal
  Cc: kvm list, Paolo Bonzini, LKML, Vitaly Kuznetsov,
	Sean Christopherson, Wanpeng Li, Joerg Roedel

On Fri, Jul 10, 2020 at 8:48 AM Mohammed Gamal <mgamal@redhat.com> wrote:
>
> From: Paolo Bonzini <pbonzini@redhat.com>
>
> Allow vendor code to observe changes to MAXPHYADDR and start/stop
> intercepting page faults.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Jim Mattson <jmattson@google.com>


* Re: [PATCH v3 0/9] KVM: Support guest MAXPHYADDR < host MAXPHYADDR
  2020-07-10 15:48 [PATCH v3 0/9] KVM: Support guest MAXPHYADDR < host MAXPHYADDR Mohammed Gamal
                   ` (8 preceding siblings ...)
  2020-07-10 15:48 ` [PATCH v3 9/9] KVM: x86: SVM: VMX: Make GUEST_MAXPHYADDR < HOST_MAXPHYADDR support configurable Mohammed Gamal
@ 2020-07-10 16:30 ` Jim Mattson
  2020-07-10 17:06   ` Paolo Bonzini
  2020-07-10 17:49 ` Paolo Bonzini
  10 siblings, 1 reply; 28+ messages in thread
From: Jim Mattson @ 2020-07-10 16:30 UTC (permalink / raw)
  To: Mohammed Gamal
  Cc: kvm list, Paolo Bonzini, LKML, Vitaly Kuznetsov,
	Sean Christopherson, Wanpeng Li, Joerg Roedel

On Fri, Jul 10, 2020 at 8:48 AM Mohammed Gamal <mgamal@redhat.com> wrote:
>
> When EPT is enabled, KVM does not really look at guest physical
> address size. Address bits above maximum physical memory size are reserved.
> Because KVM does not look at these guest physical addresses, it currently
> effectively supports guest physical address sizes equal to the host.
>
> This can be a problem when having a mixed setup of machines with 5-level page
> tables and machines with 4-level page tables, as live migration can change
> MAXPHYADDR while the guest runs, which can theoretically introduce bugs.

Huh? Changing MAXPHYADDR while the guest runs should be illegal. Or
have I missed some peculiarity of LA57 that makes MAXPHYADDR a dynamic
CPUID information field?

> In this patch series we add checks on guest physical addresses in EPT
> violation/misconfig and NPF vmexits and if needed inject the proper
> page faults in the guest.
>
> A more subtle issue is when the host MAXPHYADDR is larger than that of the
> guest. Page faults caused by reserved bits on the guest won't cause an EPT
> violation/NPF and hence we also check guest MAXPHYADDR and add PFERR_RSVD_MASK
> error code to the page fault if needed.


* Re: [PATCH v3 0/9] KVM: Support guest MAXPHYADDR < host MAXPHYADDR
  2020-07-10 16:30 ` [PATCH v3 0/9] KVM: Support guest MAXPHYADDR < host MAXPHYADDR Jim Mattson
@ 2020-07-10 17:06   ` Paolo Bonzini
  2020-07-10 17:13     ` Jim Mattson
  0 siblings, 1 reply; 28+ messages in thread
From: Paolo Bonzini @ 2020-07-10 17:06 UTC (permalink / raw)
  To: Jim Mattson, Mohammed Gamal
  Cc: kvm list, LKML, Vitaly Kuznetsov, Sean Christopherson,
	Wanpeng Li, Joerg Roedel

On 10/07/20 18:30, Jim Mattson wrote:
>>
>> This can be a problem when having a mixed setup of machines with 5-level page
>> tables and machines with 4-level page tables, as live migration can change
>> MAXPHYADDR while the guest runs, which can theoretically introduce bugs.
>
> Huh? Changing MAXPHYADDR while the guest runs should be illegal. Or
> have I missed some peculiarity of LA57 that makes MAXPHYADDR a dynamic
> CPUID information field?

Changing _host_ MAXPHYADDR while the guest runs, such as if you migrate
from a host-maxphyaddr==46 to a host-maxphyaddr==52 machine (while
keeping guest-maxphyaddr==46).

Paolo



* Re: [PATCH v3 0/9] KVM: Support guest MAXPHYADDR < host MAXPHYADDR
  2020-07-10 17:06   ` Paolo Bonzini
@ 2020-07-10 17:13     ` Jim Mattson
  2020-07-10 17:16       ` Paolo Bonzini
  0 siblings, 1 reply; 28+ messages in thread
From: Jim Mattson @ 2020-07-10 17:13 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Mohammed Gamal, kvm list, LKML, Vitaly Kuznetsov,
	Sean Christopherson, Wanpeng Li, Joerg Roedel

On Fri, Jul 10, 2020 at 10:06 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 10/07/20 18:30, Jim Mattson wrote:
> >>
> >> This can be a problem when having a mixed setup of machines with 5-level page
> >> tables and machines with 4-level page tables, as live migration can change
> >> MAXPHYADDR while the guest runs, which can theoretically introduce bugs.
> >
> > Huh? Changing MAXPHYADDR while the guest runs should be illegal. Or
> > have I missed some peculiarity of LA57 that makes MAXPHYADDR a dynamic
> > CPUID information field?
>
> Changing _host_ MAXPHYADDR while the guest runs, such as if you migrate
> from a host-maxphyaddr==46 to a host-maxphyaddr==52 machine (while
> keeping guest-maxphyaddr==46).

Ah, but what does that have to do with LA57?


* Re: [PATCH v3 0/9] KVM: Support guest MAXPHYADDR < host MAXPHYADDR
  2020-07-10 17:13     ` Jim Mattson
@ 2020-07-10 17:16       ` Paolo Bonzini
  2020-07-10 17:26         ` Sean Christopherson
  2020-07-10 17:26         ` Jim Mattson
  0 siblings, 2 replies; 28+ messages in thread
From: Paolo Bonzini @ 2020-07-10 17:16 UTC (permalink / raw)
  To: Jim Mattson
  Cc: Mohammed Gamal, kvm list, LKML, Vitaly Kuznetsov,
	Sean Christopherson, Wanpeng Li, Joerg Roedel

On 10/07/20 19:13, Jim Mattson wrote:
> On Fri, Jul 10, 2020 at 10:06 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>>
>> On 10/07/20 18:30, Jim Mattson wrote:
>>>>
>>>> This can be a problem when having a mixed setup of machines with 5-level page
>>>> tables and machines with 4-level page tables, as live migration can change
>>>> MAXPHYADDR while the guest runs, which can theoretically introduce bugs.
>>>
>>> Huh? Changing MAXPHYADDR while the guest runs should be illegal. Or
>>> have I missed some peculiarity of LA57 that makes MAXPHYADDR a dynamic
>>> CPUID information field?
>>
>> Changing _host_ MAXPHYADDR while the guest runs, such as if you migrate
>> from a host-maxphyaddr==46 to a host-maxphyaddr==52 machine (while
>> keeping guest-maxphyaddr==46).
> 
> Ah, but what does that have to do with LA57?

Intel only has MAXPHYADDR > 46 on LA57 machines (because in general OSes
like to have a physical 1:1 map into the kernel part of the virtual
address space, so having a higher MAXPHYADDR would be of limited use
with 48-bit linear addresses).

In other words, while this issue has existed forever it could be ignored
until IceLake introduced MAXPHYADDR==52 machines.  I'll introduce
something like this in a commit message.

Paolo



* Re: [PATCH v3 0/9] KVM: Support guest MAXPHYADDR < host MAXPHYADDR
  2020-07-10 17:16       ` Paolo Bonzini
@ 2020-07-10 17:26         ` Sean Christopherson
  2020-07-10 17:26         ` Jim Mattson
  1 sibling, 0 replies; 28+ messages in thread
From: Sean Christopherson @ 2020-07-10 17:26 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Jim Mattson, Mohammed Gamal, kvm list, LKML, Vitaly Kuznetsov,
	Wanpeng Li, Joerg Roedel

On Fri, Jul 10, 2020 at 07:16:14PM +0200, Paolo Bonzini wrote:
> On 10/07/20 19:13, Jim Mattson wrote:
> > On Fri, Jul 10, 2020 at 10:06 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
> >>
> >> On 10/07/20 18:30, Jim Mattson wrote:
> >>>>
> >>>> This can be a problem when having a mixed setup of machines with 5-level page
> >>>> tables and machines with 4-level page tables, as live migration can change
> >>>> MAXPHYADDR while the guest runs, which can theoretically introduce bugs.
> >>>
> >>> Huh? Changing MAXPHYADDR while the guest runs should be illegal. Or
> >>> have I missed some peculiarity of LA57 that makes MAXPHYADDR a dynamic
> >>> CPUID information field?
> >>
> >> Changing _host_ MAXPHYADDR while the guest runs, such as if you migrate
> >> from a host-maxphyaddr==46 to a host-maxphyaddr==52 machine (while
> >> keeping guest-maxphyaddr==46).
> > 
> > Ah, but what does that have to do with LA57?
> 
> Intel only has MAXPHYADDR > 46 on LA57 machines (because in general OSes
> like to have a physical 1:1 map into the kernel part of the virtual
> address space, so having a higher MAXPHYADDR would be of limited use
> with 48-bit linear addresses).
> 
> In other words, while this issue has existed forever it could be ignored
> until IceLake introduced MAXPHYADDR==52 machines.  I'll introduce
> something like this in a commit message.

Yeah, the whole 5-level vs. 4-level thing needs clarification.  Using 5-level
doesn't magically change the host's MAXPA.  But using 5-level vs. 4-level EPT
does change the guest's effective MAXPA.

If the changelog is referring purely to host MAXPA, then just explicitly
state that and don't even mention 5-level vs. 4-level.


* Re: [PATCH v3 0/9] KVM: Support guest MAXPHYADDR < host MAXPHYADDR
  2020-07-10 17:16       ` Paolo Bonzini
  2020-07-10 17:26         ` Sean Christopherson
@ 2020-07-10 17:26         ` Jim Mattson
  2020-07-10 17:40           ` Paolo Bonzini
  1 sibling, 1 reply; 28+ messages in thread
From: Jim Mattson @ 2020-07-10 17:26 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Mohammed Gamal, kvm list, LKML, Vitaly Kuznetsov,
	Sean Christopherson, Wanpeng Li, Joerg Roedel

On Fri, Jul 10, 2020 at 10:16 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 10/07/20 19:13, Jim Mattson wrote:
> > On Fri, Jul 10, 2020 at 10:06 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
> >>
> >> On 10/07/20 18:30, Jim Mattson wrote:
> >>>>
> >>>> This can be a problem when having a mixed setup of machines with 5-level page
> >>>> tables and machines with 4-level page tables, as live migration can change
> >>>> MAXPHYADDR while the guest runs, which can theoretically introduce bugs.
> >>>
> >>> Huh? Changing MAXPHYADDR while the guest runs should be illegal. Or
> >>> have I missed some peculiarity of LA57 that makes MAXPHYADDR a dynamic
> >>> CPUID information field?
> >>
> >> Changing _host_ MAXPHYADDR while the guest runs, such as if you migrate
> >> from a host-maxphyaddr==46 to a host-maxphyaddr==52 machine (while
> >> keeping guest-maxphyaddr==46).
> >
> > Ah, but what does that have to do with LA57?
>
> Intel only has MAXPHYADDR > 46 on LA57 machines (because in general OSes
> like to have a physical 1:1 map into the kernel part of the virtual
> address space, so having a higher MAXPHYADDR would be of limited use
> with 48-bit linear addresses).

We all know that the direct map is evil. :-)

Sorry it took me so long to get there. I didn't realize that Linux was
incapable of using more physical memory than it could map into the
kernel's virtual address space. (Wasn't that the whole point of PAE
originally?)


* Re: [PATCH v3 0/9] KVM: Support guest MAXPHYADDR < host MAXPHYADDR
  2020-07-10 17:26         ` Jim Mattson
@ 2020-07-10 17:40           ` Paolo Bonzini
  0 siblings, 0 replies; 28+ messages in thread
From: Paolo Bonzini @ 2020-07-10 17:40 UTC (permalink / raw)
  To: Jim Mattson
  Cc: Mohammed Gamal, kvm list, LKML, Vitaly Kuznetsov,
	Sean Christopherson, Wanpeng Li, Joerg Roedel

On 10/07/20 19:26, Jim Mattson wrote:
>> Intel only has MAXPHYADDR > 46 on LA57 machines (because in general OSes
>> like to have a physical 1:1 map into the kernel part of the virtual
>> address space, so having a higher MAXPHYADDR would be of limited use
>> with 48-bit linear addresses).
> We all know that the direct map is evil. :-)
> 
> Sorry it took me so long to get there. I didn't realize that Linux was
> incapable of using more physical memory than it could map into the
> kernel's virtual address space. (Wasn't that the whole point of PAE
> originally?)

Yes, but it's so slow that Linux preferred not to go that way for 64-bit
kernels.

That said, that justification for MAXPHYADDR==46 came from Intel
processor architects, and when they say "OSes" they usually refer to a
certain vendor from the Pacific north-west.

Paolo



* Re: [PATCH v3 9/9] KVM: x86: SVM: VMX: Make GUEST_MAXPHYADDR < HOST_MAXPHYADDR support configurable
  2020-07-10 15:48 ` [PATCH v3 9/9] KVM: x86: SVM: VMX: Make GUEST_MAXPHYADDR < HOST_MAXPHYADDR support configurable Mohammed Gamal
@ 2020-07-10 17:40   ` Paolo Bonzini
  0 siblings, 0 replies; 28+ messages in thread
From: Paolo Bonzini @ 2020-07-10 17:40 UTC (permalink / raw)
  To: Mohammed Gamal, kvm
  Cc: linux-kernel, vkuznets, sean.j.christopherson, wanpengli,
	jmattson, joro, Tom Lendacky, Babu Moger

On 10/07/20 17:48, Mohammed Gamal wrote:
> The reason behind including this patch is unexpected behaviour we see
> with NPT vmexit handling on AMD processors.
> 
> With previous patch ("KVM: SVM: Add guest physical address check in
> NPF/PF interception") we see the following error multiple times in
> the 'access' test in kvm-unit-tests:
> 
>             test pte.p pte.36 pde.p: FAIL: pte 2000021 expected 2000001
>             Dump mapping: address: 0x123400000000
>             ------L4: 24c3027
>             ------L3: 24c4027
>             ------L2: 24c5021
>             ------L1: 1002000021
> 
> This shows that the PTE's accessed bit is apparently being set by
> the CPU hardware before the NPF vmexit. This is completely handled by
> hardware and cannot be fixed in software.
> 
> This patch introduces a workaround. We add a boolean variable:
> 'allow_smaller_maxphyaddr'
> Which is set individually by VMX and SVM init routines. On VMX it's
> always set to true, on SVM it's only set to true when NPT is not
> enabled.
> 
> We also add a new capability KVM_CAP_SMALLER_MAXPHYADDR which
> allows userspace to query if the underlying architecture would
> support GUEST_MAXPHYADDR < HOST_MAXPHYADDR and hence act accordingly
> (e.g. qemu can decide if it would ignore the -cpu ..,phys-bits=X)
> 
> CC: Tom Lendacky <thomas.lendacky@amd.com>
> CC: Babu Moger <babu.moger@amd.com>
> Signed-off-by: Mohammed Gamal <mgamal@redhat.com>

Slightly rewritten commit message:

    KVM: x86: Add a capability for GUEST_MAXPHYADDR < HOST_MAXPHYADDR support
    
    This patch adds a new capability KVM_CAP_SMALLER_MAXPHYADDR which
    allows userspace to query if the underlying architecture would
    support GUEST_MAXPHYADDR < HOST_MAXPHYADDR and hence act accordingly
    (e.g. qemu can decide if it should warn for -cpu ..,phys-bits=X)
    
    The complications in this patch are due to unexpected (but documented)
    behaviour we see with NPF vmexit handling on AMD processors.  If
    SVM is modified to add guest physical address checks in the NPF
    and guest #PF paths, we see the following error multiple times in
    the 'access' test in kvm-unit-tests:
    
                test pte.p pte.36 pde.p: FAIL: pte 2000021 expected 2000001
                Dump mapping: address: 0x123400000000
                ------L4: 24c3027
                ------L3: 24c4027
                ------L2: 24c5021
                ------L1: 1002000021
    
    This is because the PTE's accessed bit is set by the CPU hardware before
    the NPF vmexit. This is handled completely by hardware and cannot be fixed
    in software.
    
    Therefore, availability of the new capability depends on a boolean variable
    allow_smaller_maxphyaddr which is set individually by VMX and SVM init
    routines. On VMX it's always set to true, on SVM it's only set to true
    when NPT is not enabled.
    
    CC: Tom Lendacky <thomas.lendacky@amd.com>
    CC: Babu Moger <babu.moger@amd.com>
    Signed-off-by: Mohammed Gamal <mgamal@redhat.com>
    Message-Id: <20200710154811.418214-10-mgamal@redhat.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Paolo
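
For readers who want to try the capability from userspace, the query
described in the commit message above can be sketched roughly as follows.
This is only an illustration, not code from the series: the helper name
kvm_has_smaller_maxphyaddr() and its path parameter are made up here, and
the fallback value 184 is simply copied from the uapi hunk of this patch
for headers that predate it.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/kvm.h>

/* Value copied from the uapi hunk in this patch; older headers lack it. */
#ifndef KVM_CAP_SMALLER_MAXPHYADDR
#define KVM_CAP_SMALLER_MAXPHYADDR 184
#endif

/*
 * Hypothetical helper (not part of the series): returns 1 if KVM reports
 * support for guest MAXPHYADDR < host MAXPHYADDR, 0 if it does not, and
 * -1 if the device node cannot be opened or the ioctl fails.
 */
int kvm_has_smaller_maxphyaddr(const char *kvm_path)
{
	int fd = open(kvm_path, O_RDWR);
	int r;

	if (fd < 0)
		return -1;

	/* KVM_CHECK_EXTENSION returns 0 for unknown or unsupported caps. */
	r = ioctl(fd, KVM_CHECK_EXTENSION, KVM_CAP_SMALLER_MAXPHYADDR);
	close(fd);

	if (r < 0)
		return -1;
	return r > 0;
}
```

A VMM would typically call this with "/dev/kvm" and, on a 0 result, warn
about (or refuse) a phys-bits value smaller than the host's.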



* Re: [PATCH v3 3/9] KVM: x86: mmu: Add guest physical address check in translate_gpa()
  2020-07-10 15:48 ` [PATCH v3 3/9] KVM: x86: mmu: Add guest physical address check in translate_gpa() Mohammed Gamal
@ 2020-07-10 17:41   ` Paolo Bonzini
  0 siblings, 0 replies; 28+ messages in thread
From: Paolo Bonzini @ 2020-07-10 17:41 UTC (permalink / raw)
  To: Mohammed Gamal, kvm
  Cc: linux-kernel, vkuznets, sean.j.christopherson, wanpengli, jmattson, joro

On 10/07/20 17:48, Mohammed Gamal wrote:
> In case of running a guest with 4-level page tables on a 5-level page
> table host, it might happen that a guest might have a physical address
> with reserved bits set, but the host won't see that and trap it.
> 
> Hence, we need to check page faults' physical addresses against the guest's
> maximum physical memory and if it's exceeded, we need to add
> the PFERR_RSVD_MASK bits to the PF's error code.
> 
> Also make sure the error code isn't overwritten by the page table walker.
> 

New commit message:


    KVM: x86: mmu: Add guest physical address check in translate_gpa()
    
    Intel processors of various generations have supported 36, 39, 46 or 52
    bits for physical addresses.  Until IceLake introduced MAXPHYADDR==52,
    running on a machine with higher MAXPHYADDR than the guest more or less
    worked, because software that relied on reserved address bits (like KVM)
    generally used bit 51 as a marker and therefore the page faults were
    generated anyway.
    
    Unfortunately this is not true anymore if the host MAXPHYADDR is 52,
    and this can cause problems when migrating from a MAXPHYADDR<52
    machine to one with MAXPHYADDR==52.  Typically, the latter are machines
    that support 5-level page tables, so they can be identified easily from
    the LA57 CPUID bit.
    
    When that happens, the guest might have a physical address with reserved
    bits set, but the host won't see that and trap it.  Hence, we need
    to check page faults' physical addresses against the guest's maximum
    physical memory and if it's exceeded, we need to add the PFERR_RSVD_MASK
    bits to the page fault error code.
    
    This patch does this for the MMU's page walks.  The next patches will
    ensure that the correct exception and error code is produced whenever
    no host-reserved bits are set in page table entries.

Paolo
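
The check described in the commit message above can be illustrated with a
small standalone sketch. It is not the code from the patch: the function
names are invented for illustration, and only PFERR_RSVD_MASK (bit 3 of
the architectural #PF error code) mirrors a name used in the series. It
also shows why bit 51 stops working as a "reserved" marker once
MAXPHYADDR is 52.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural #PF error code bit 3: a reserved bit was set. */
#define PFERR_RSVD_MASK (1u << 3)

/* Hypothetical helper: true if gpa has a bit set at or above MAXPHYADDR. */
bool gpa_exceeds_maxphyaddr(uint64_t gpa, unsigned int guest_maxphyaddr)
{
	/* Shifting a 64-bit value by 64 or more is undefined in C. */
	if (guest_maxphyaddr >= 64)
		return false;
	return (gpa >> guest_maxphyaddr) != 0;
}

/* Hypothetical helper: add PFERR_RSVD_MASK when the GPA is illegal. */
uint32_t fixup_pf_error_code(uint32_t error_code, uint64_t gpa,
			     unsigned int guest_maxphyaddr)
{
	if (gpa_exceeds_maxphyaddr(gpa, guest_maxphyaddr))
		error_code |= PFERR_RSVD_MASK;
	return error_code;
}
```

With guest MAXPHYADDR == 46, an address with bit 51 set is flagged and the
error code gains PFERR_RSVD_MASK; with MAXPHYADDR == 52 the same address
is legal and the error code is left alone.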



* Re: [PATCH v3 0/9] KVM: Support guest MAXPHYADDR < host MAXPHYADDR
  2020-07-10 15:48 [PATCH v3 0/9] KVM: Support guest MAXPHYADDR < host MAXPHYADDR Mohammed Gamal
                   ` (9 preceding siblings ...)
  2020-07-10 16:30 ` [PATCH v3 0/9] KVM: Support guest MAXPHYADDR < host MAXPHYADDR Jim Mattson
@ 2020-07-10 17:49 ` Paolo Bonzini
  10 siblings, 0 replies; 28+ messages in thread
From: Paolo Bonzini @ 2020-07-10 17:49 UTC (permalink / raw)
  To: Mohammed Gamal, kvm
  Cc: linux-kernel, vkuznets, sean.j.christopherson, wanpengli, jmattson, joro

On 10/07/20 17:48, Mohammed Gamal wrote:
> When EPT is enabled, KVM does not really look at guest physical
> address size. Address bits above maximum physical memory size are reserved.
> Because KVM does not look at these guest physical addresses, it currently
> effectively supports guest physical address sizes equal to the host.
> 
> This can be a problem when having a mixed setup of machines with 5-level page
> tables and machines with 4-level page tables, as live migration can change
> MAXPHYADDR while the guest runs, which can theoretically introduce bugs.
> 
> In this patch series we add checks on guest physical addresses in EPT
> violation/misconfig and NPF vmexits and if needed inject the proper
> page faults in the guest.
> 
> A more subtle issue is when the host MAXPHYADDR is larger than that of the
> guest. Page faults caused by reserved bits on the guest won't cause an EPT
> violation/NPF and hence we also check guest MAXPHYADDR and add PFERR_RSVD_MASK
> error code to the page fault if needed.
> 
> ----
> 
> Changes from v2:
> - Drop support for this feature on AMD processors after discussion with AMD
> 
> 
> Mohammed Gamal (5):
>   KVM: x86: Add helper functions for illegal GPA checking and page fault
>     injection
>   KVM: x86: mmu: Move translate_gpa() to mmu.c
>   KVM: x86: mmu: Add guest physical address check in translate_gpa()
>   KVM: VMX: Add guest physical address check in EPT violation and
>     misconfig
>   KVM: x86: SVM: VMX: Make GUEST_MAXPHYADDR < HOST_MAXPHYADDR support
>     configurable
> 
> Paolo Bonzini (4):
>   KVM: x86: rename update_bp_intercept to update_exception_bitmap
>   KVM: x86: update exception bitmap on CPUID changes
>   KVM: VMX: introduce vmx_need_pf_intercept
>   KVM: VMX: optimize #PF injection when MAXPHYADDR does not match
> 
>  arch/x86/include/asm/kvm_host.h | 10 ++------
>  arch/x86/kvm/cpuid.c            |  2 ++
>  arch/x86/kvm/mmu.h              |  6 +++++
>  arch/x86/kvm/mmu/mmu.c          | 12 +++++++++
>  arch/x86/kvm/svm/svm.c          | 22 +++++++++++++---
>  arch/x86/kvm/vmx/nested.c       | 28 ++++++++++++--------
>  arch/x86/kvm/vmx/vmx.c          | 45 +++++++++++++++++++++++++++++----
>  arch/x86/kvm/vmx/vmx.h          |  6 +++++
>  arch/x86/kvm/x86.c              | 29 ++++++++++++++++++++-
>  arch/x86/kvm/x86.h              |  1 +
>  include/uapi/linux/kvm.h        |  1 +
>  11 files changed, 133 insertions(+), 29 deletions(-)
> 

Queued, thanks (I'll look at it more closely when I'm back, but at least
people can play with it).

Paolo



* Re: [PATCH v3 7/9] KVM: VMX: Add guest physical address check in EPT violation and misconfig
  2020-07-10 15:48 ` [PATCH v3 7/9] KVM: VMX: Add guest physical address check in EPT violation and misconfig Mohammed Gamal
@ 2020-07-13 18:32   ` Sean Christopherson
  2020-07-15 23:00   ` Sean Christopherson
  2020-10-09 16:17   ` Jim Mattson
  2 siblings, 0 replies; 28+ messages in thread
From: Sean Christopherson @ 2020-07-13 18:32 UTC (permalink / raw)
  To: Mohammed Gamal
  Cc: kvm, pbonzini, linux-kernel, vkuznets, wanpengli, jmattson, joro

On Fri, Jul 10, 2020 at 05:48:09PM +0200, Mohammed Gamal wrote:
> Check guest physical address against its maximum physical memory. If
> the guest's physical address exceeds the maximum (i.e. has reserved bits
> set), inject a guest page fault with PFERR_RSVD_MASK set.
> 
> This has to be done both in the EPT violation and page fault paths, as
> there are complications in both cases with respect to the computation
> of the correct error code.
> 
> For EPT violations, unfortunately the only possibility is to emulate,
> because the access type in the exit qualification might refer to an
> access to a paging structure, rather than to the access performed by
> the program.
> 
> Trapping page faults instead is needed in order to correct the error code,
> but the access type can be obtained from the original error code and
> passed to gva_to_gpa.  The corrections required in the error code are
> subtle. For example, imagine that a PTE for a supervisor page has a reserved
> bit set.  On a supervisor-mode access, the EPT violation path would trigger.
> However, on a user-mode access, the processor will not notice the reserved
> bit and not include PFERR_RSVD_MASK in the error code.
> 
> Co-developed-by: Mohammed Gamal <mgamal@redhat.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  arch/x86/kvm/vmx/vmx.c | 24 +++++++++++++++++++++---
>  arch/x86/kvm/vmx/vmx.h |  3 ++-
>  2 files changed, 23 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 770b090969fb..de3f436b2d32 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -4790,9 +4790,15 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu)
>  
>  	if (is_page_fault(intr_info)) {
>  		cr2 = vmx_get_exit_qual(vcpu);
> -		/* EPT won't cause page fault directly */
> -		WARN_ON_ONCE(!vcpu->arch.apf.host_apf_flags && enable_ept);
> -		return kvm_handle_page_fault(vcpu, error_code, cr2, NULL, 0);
> +		if (enable_ept && !vcpu->arch.apf.host_apf_flags) {
> +			/*
> +			 * EPT will cause page fault only if we need to
> +			 * detect illegal GPAs.
> +			 */

It'd be nice to retain a WARN_ON_ONCE() here, e.g.

			WARN_ON_ONCE(!vmx_need_pf_intercept(vcpu));

This WARN has fired for me when I've botched the nested VM-Exit routing;
debugging a spurious L2 #PF without it would be less than fun.

> +			kvm_fixup_and_inject_pf_error(vcpu, cr2, error_code);
> +			return 1;
> +		} else
> +			return kvm_handle_page_fault(vcpu, error_code, cr2, NULL, 0);
>  	}
>  
>  	ex_no = intr_info & INTR_INFO_VECTOR_MASK;
> @@ -5308,6 +5314,18 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
>  	       PFERR_GUEST_FINAL_MASK : PFERR_GUEST_PAGE_MASK;
>  
>  	vcpu->arch.exit_qualification = exit_qualification;
> +
> +	/*
> +	 * Check that the GPA doesn't exceed physical memory limits, as that is
> +	 * a guest page fault.  We have to emulate the instruction here, because
> +	 * if the illegal address is that of a paging structure, then
> +	 * EPT_VIOLATION_ACC_WRITE bit is set.  Alternatively, if supported we
> +	 * would also use advanced VM-exit information for EPT violations to
> +	 * reconstruct the page fault error code.
> +	 */
> +	if (unlikely(kvm_mmu_is_illegal_gpa(vcpu, gpa)))
> +		return kvm_emulate_instruction(vcpu, 0);
> +
>  	return kvm_mmu_page_fault(vcpu, gpa, error_code, NULL, 0);
>  }
>  
> diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
> index b0e5e210f1c1..0d06951e607c 100644
> --- a/arch/x86/kvm/vmx/vmx.h
> +++ b/arch/x86/kvm/vmx/vmx.h
> @@ -11,6 +11,7 @@
>  #include "kvm_cache_regs.h"
>  #include "ops.h"
>  #include "vmcs.h"
> +#include "cpuid.h"
>  
>  extern const u32 vmx_msr_index[];
>  
> @@ -552,7 +553,7 @@ static inline bool vmx_has_waitpkg(struct vcpu_vmx *vmx)
>  
>  static inline bool vmx_need_pf_intercept(struct kvm_vcpu *vcpu)
>  {
> -	return !enable_ept;
> +	return !enable_ept || cpuid_maxphyaddr(vcpu) < boot_cpu_data.x86_phys_bits;
>  }
>  
>  void dump_vmcs(void);
> -- 
> 2.26.2
> 
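
As a reading aid for the vmx.h hunk quoted above, the intercept decision
can be restated as a standalone predicate. This is a sketch only: the
function and parameter names are hypothetical stand-ins for enable_ept,
cpuid_maxphyaddr(vcpu) and boot_cpu_data.x86_phys_bits in the patch.

```c
#include <stdbool.h>

/*
 * Hypothetical restatement of the quoted vmx_need_pf_intercept() change:
 * #PF must be intercepted when EPT is off (shadow paging), or when the
 * guest's MAXPHYADDR is smaller than the host's, so that the error code
 * can be corrected before the fault reaches the guest.
 */
bool need_pf_intercept(bool ept_enabled, unsigned int guest_maxphyaddr,
		       unsigned int host_phys_bits)
{
	return !ept_enabled || guest_maxphyaddr < host_phys_bits;
}
```

For example, an EPT guest with MAXPHYADDR 46 on a 52-bit host needs the
intercept, while the same guest configured with 52 bits does not.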


* Re: [PATCH v3 7/9] KVM: VMX: Add guest physical address check in EPT violation and misconfig
  2020-07-10 15:48 ` [PATCH v3 7/9] KVM: VMX: Add guest physical address check in EPT violation and misconfig Mohammed Gamal
  2020-07-13 18:32   ` Sean Christopherson
@ 2020-07-15 23:00   ` Sean Christopherson
  2020-08-17 17:22     ` Sean Christopherson
  2020-10-09 16:17   ` Jim Mattson
  2 siblings, 1 reply; 28+ messages in thread
From: Sean Christopherson @ 2020-07-15 23:00 UTC (permalink / raw)
  To: Mohammed Gamal
  Cc: kvm, pbonzini, linux-kernel, vkuznets, wanpengli, jmattson, joro

On Fri, Jul 10, 2020 at 05:48:09PM +0200, Mohammed Gamal wrote:
> Check guest physical address against its maximum physical memory. If
> the guest's physical address exceeds the maximum (i.e. has reserved bits
> set), inject a guest page fault with PFERR_RSVD_MASK set.
> 
> This has to be done both in the EPT violation and page fault paths, as
> there are complications in both cases with respect to the computation
> of the correct error code.
> 
> For EPT violations, unfortunately the only possibility is to emulate,
> because the access type in the exit qualification might refer to an
> access to a paging structure, rather than to the access performed by
> the program.
> 
> Trapping page faults instead is needed in order to correct the error code,
> but the access type can be obtained from the original error code and
> passed to gva_to_gpa.  The corrections required in the error code are
> subtle. For example, imagine that a PTE for a supervisor page has a reserved
> bit set.  On a supervisor-mode access, the EPT violation path would trigger.
> However, on a user-mode access, the processor will not notice the reserved
> bit and not include PFERR_RSVD_MASK in the error code.
> 
> Co-developed-by: Mohammed Gamal <mgamal@redhat.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  arch/x86/kvm/vmx/vmx.c | 24 +++++++++++++++++++++---
>  arch/x86/kvm/vmx/vmx.h |  3 ++-
>  2 files changed, 23 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 770b090969fb..de3f436b2d32 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -4790,9 +4790,15 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu)
>  
>  	if (is_page_fault(intr_info)) {
>  		cr2 = vmx_get_exit_qual(vcpu);
> -		/* EPT won't cause page fault directly */
> -		WARN_ON_ONCE(!vcpu->arch.apf.host_apf_flags && enable_ept);
> -		return kvm_handle_page_fault(vcpu, error_code, cr2, NULL, 0);
> +		if (enable_ept && !vcpu->arch.apf.host_apf_flags) {
> +			/*
> +			 * EPT will cause page fault only if we need to
> +			 * detect illegal GPAs.
> +			 */
> +			kvm_fixup_and_inject_pf_error(vcpu, cr2, error_code);

This splats when running the PKU unit test, although the test still passed.
I haven't yet spent the brain power to determine if this is a benign warning,
i.e. simply unexpected, or if permission_fault() truly can't handle PK
faults.

  WARNING: CPU: 25 PID: 5465 at arch/x86/kvm/mmu.h:197 paging64_walk_addr_generic+0x594/0x750 [kvm]
  Hardware name: Intel Corporation WilsonCity/WilsonCity, BIOS WLYDCRB1.SYS.0014.D62.2001092233 01/09/2020
  RIP: 0010:paging64_walk_addr_generic+0x594/0x750 [kvm]
  Code: <0f> 0b e9 db fe ff ff 44 8b 43 04 4c 89 6c 24 30 8b 13 41 39 d0 89
  RSP: 0018:ff53778fc623fb60 EFLAGS: 00010202
  RAX: 0000000000000001 RBX: ff53778fc623fbf0 RCX: 0000000000000007
  RDX: 0000000000000001 RSI: 0000000000000002 RDI: ff4501efba818000
  RBP: 0000000000000020 R08: 0000000000000005 R09: 00000000004000e7
  R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000007
  R13: ff4501efba818388 R14: 10000000004000e7 R15: 0000000000000000
  FS:  00007f2dcf31a700(0000) GS:ff4501f1c8040000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000000 CR3: 0000001dea475005 CR4: 0000000000763ee0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  PKRU: 55555554
  Call Trace:
   paging64_gva_to_gpa+0x3f/0xb0 [kvm]
   kvm_fixup_and_inject_pf_error+0x48/0xa0 [kvm]
   handle_exception_nmi+0x4fc/0x5b0 [kvm_intel]
   kvm_arch_vcpu_ioctl_run+0x911/0x1c10 [kvm]
   kvm_vcpu_ioctl+0x23e/0x5d0 [kvm]
   ksys_ioctl+0x92/0xb0
   __x64_sys_ioctl+0x16/0x20
   do_syscall_64+0x3e/0xb0
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  ---[ end trace d17eb998aee991da ]---



* Re: [PATCH v3 7/9] KVM: VMX: Add guest physical address check in EPT violation and misconfig
  2020-07-15 23:00   ` Sean Christopherson
@ 2020-08-17 17:22     ` Sean Christopherson
  2020-08-17 18:01       ` Paolo Bonzini
  0 siblings, 1 reply; 28+ messages in thread
From: Sean Christopherson @ 2020-08-17 17:22 UTC (permalink / raw)
  To: Mohammed Gamal
  Cc: kvm, pbonzini, linux-kernel, vkuznets, wanpengli, jmattson, joro

On Wed, Jul 15, 2020 at 04:00:08PM -0700, Sean Christopherson wrote:
> On Fri, Jul 10, 2020 at 05:48:09PM +0200, Mohammed Gamal wrote:
> > Check guest physical address against its maximum physical memory. If
> > the guest's physical address exceeds the maximum (i.e. has reserved bits
> > set), inject a guest page fault with PFERR_RSVD_MASK set.
> > 
> > This has to be done both in the EPT violation and page fault paths, as
> > there are complications in both cases with respect to the computation
> > of the correct error code.
> > 
> > For EPT violations, unfortunately the only possibility is to emulate,
> > because the access type in the exit qualification might refer to an
> > access to a paging structure, rather than to the access performed by
> > the program.
> > 
> > Trapping page faults instead is needed in order to correct the error code,
> > but the access type can be obtained from the original error code and
> > passed to gva_to_gpa.  The corrections required in the error code are
> > subtle. For example, imagine that a PTE for a supervisor page has a reserved
> > bit set.  On a supervisor-mode access, the EPT violation path would trigger.
> > However, on a user-mode access, the processor will not notice the reserved
> > bit and not include PFERR_RSVD_MASK in the error code.
> > 
> > Co-developed-by: Mohammed Gamal <mgamal@redhat.com>
> > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> > ---
> >  arch/x86/kvm/vmx/vmx.c | 24 +++++++++++++++++++++---
> >  arch/x86/kvm/vmx/vmx.h |  3 ++-
> >  2 files changed, 23 insertions(+), 4 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index 770b090969fb..de3f436b2d32 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -4790,9 +4790,15 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu)
> >  
> >  	if (is_page_fault(intr_info)) {
> >  		cr2 = vmx_get_exit_qual(vcpu);
> > -		/* EPT won't cause page fault directly */
> > -		WARN_ON_ONCE(!vcpu->arch.apf.host_apf_flags && enable_ept);
> > -		return kvm_handle_page_fault(vcpu, error_code, cr2, NULL, 0);
> > +		if (enable_ept && !vcpu->arch.apf.host_apf_flags) {
> > +			/*
> > +			 * EPT will cause page fault only if we need to
> > +			 * detect illegal GPAs.
> > +			 */
> > +			kvm_fixup_and_inject_pf_error(vcpu, cr2, error_code);
> 
> This splats when running the PKU unit test, although the test still passed.
> I haven't yet spent the brain power to determine if this is a benign warning,
> i.e. simply unexpected, or if permission_fault() truly can't handle PK
> faults.
> 
>   WARNING: CPU: 25 PID: 5465 at arch/x86/kvm/mmu.h:197 paging64_walk_addr_generic+0x594/0x750 [kvm]
>   Hardware name: Intel Corporation WilsonCity/WilsonCity, BIOS WLYDCRB1.SYS.0014.D62.2001092233 01/09/2020
>   RIP: 0010:paging64_walk_addr_generic+0x594/0x750 [kvm]
>   Code: <0f> 0b e9 db fe ff ff 44 8b 43 04 4c 89 6c 24 30 8b 13 41 39 d0 89
>   RSP: 0018:ff53778fc623fb60 EFLAGS: 00010202
>   RAX: 0000000000000001 RBX: ff53778fc623fbf0 RCX: 0000000000000007
>   RDX: 0000000000000001 RSI: 0000000000000002 RDI: ff4501efba818000
>   RBP: 0000000000000020 R08: 0000000000000005 R09: 00000000004000e7
>   R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000007
>   R13: ff4501efba818388 R14: 10000000004000e7 R15: 0000000000000000
>   FS:  00007f2dcf31a700(0000) GS:ff4501f1c8040000(0000) knlGS:0000000000000000
>   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>   CR2: 0000000000000000 CR3: 0000001dea475005 CR4: 0000000000763ee0
>   DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>   DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>   PKRU: 55555554
>   Call Trace:
>    paging64_gva_to_gpa+0x3f/0xb0 [kvm]
>    kvm_fixup_and_inject_pf_error+0x48/0xa0 [kvm]
>    handle_exception_nmi+0x4fc/0x5b0 [kvm_intel]
>    kvm_arch_vcpu_ioctl_run+0x911/0x1c10 [kvm]
>    kvm_vcpu_ioctl+0x23e/0x5d0 [kvm]
>    ksys_ioctl+0x92/0xb0
>    __x64_sys_ioctl+0x16/0x20
>    do_syscall_64+0x3e/0xb0
>    entry_SYSCALL_64_after_hwframe+0x44/0xa9
>   ---[ end trace d17eb998aee991da ]---

Looks like this series got pulled for 5.9, has anyone looked into this?


* Re: [PATCH v3 7/9] KVM: VMX: Add guest physical address check in EPT violation and misconfig
  2020-08-17 17:22     ` Sean Christopherson
@ 2020-08-17 18:01       ` Paolo Bonzini
  0 siblings, 0 replies; 28+ messages in thread
From: Paolo Bonzini @ 2020-08-17 18:01 UTC (permalink / raw)
  To: Sean Christopherson, Mohammed Gamal
  Cc: kvm, linux-kernel, vkuznets, wanpengli, jmattson, joro

On 17/08/20 19:22, Sean Christopherson wrote:
>> This splats when running the PKU unit test, although the test still passed.
>> I haven't yet spent the brain power to determine if this is a benign warning,
>> i.e. simply unexpected, or if permission_fault() truly can't handle PK
>> faults.

It's more or less unexpected; the error is in the caller.  The argument
passed to gva_to_gpa is not an error code but an access mask, so only the
U/F/W bits are valid.  Patch sent, thanks.

Paolo
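
For readers following along, the masking described above can be sketched in
standalone C.  This is only an illustration under stated assumptions, not the
patch that was sent: the PFERR_* values mirror the architectural x86
page-fault error code bit positions, and pf_error_code_to_access() is a
hypothetical helper name that does not exist in KVM.

```c
#include <stdint.h>

/* x86 page-fault error code bits (architectural bit positions). */
#define PFERR_PRESENT_MASK (1u << 0)
#define PFERR_WRITE_MASK   (1u << 1)
#define PFERR_USER_MASK    (1u << 2)
#define PFERR_RSVD_MASK    (1u << 3)
#define PFERR_FETCH_MASK   (1u << 4)
#define PFERR_PK_MASK      (1u << 5)

/*
 * Hypothetical helper (not a KVM function): reduce a hardware page-fault
 * error code to an access mask by keeping only the bits that describe the
 * access itself (user, fetch, write).  Bits that describe the outcome of
 * the walk (present, reserved, protection key) are dropped.
 */
static inline uint32_t pf_error_code_to_access(uint32_t error_code)
{
	return error_code &
	       (PFERR_WRITE_MASK | PFERR_USER_MASK | PFERR_FETCH_MASK);
}
```

With this masking, a protection-key fault from user mode (P, U and PK set)
is reduced to just the U bit before the guest page walk is repeated, so the
walker never sees the PK bit in its access argument.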

>>   WARNING: CPU: 25 PID: 5465 at arch/x86/kvm/mmu.h:197 paging64_walk_addr_generic+0x594/0x750 [kvm]
>>   Hardware name: Intel Corporation WilsonCity/WilsonCity, BIOS WLYDCRB1.SYS.0014.D62.2001092233 01/09/2020
>>   RIP: 0010:paging64_walk_addr_generic+0x594/0x750 [kvm]
>>   Code: <0f> 0b e9 db fe ff ff 44 8b 43 04 4c 89 6c 24 30 8b 13 41 39 d0 89
>>   RSP: 0018:ff53778fc623fb60 EFLAGS: 00010202
>>   RAX: 0000000000000001 RBX: ff53778fc623fbf0 RCX: 0000000000000007
>>   RDX: 0000000000000001 RSI: 0000000000000002 RDI: ff4501efba818000
>>   RBP: 0000000000000020 R08: 0000000000000005 R09: 00000000004000e7
>>   R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000007
>>   R13: ff4501efba818388 R14: 10000000004000e7 R15: 0000000000000000
>>   FS:  00007f2dcf31a700(0000) GS:ff4501f1c8040000(0000) knlGS:0000000000000000
>>   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>   CR2: 0000000000000000 CR3: 0000001dea475005 CR4: 0000000000763ee0
>>   DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>   DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>   PKRU: 55555554
>>   Call Trace:
>>    paging64_gva_to_gpa+0x3f/0xb0 [kvm]
>>    kvm_fixup_and_inject_pf_error+0x48/0xa0 [kvm]
>>    handle_exception_nmi+0x4fc/0x5b0 [kvm_intel]
>>    kvm_arch_vcpu_ioctl_run+0x911/0x1c10 [kvm]
>>    kvm_vcpu_ioctl+0x23e/0x5d0 [kvm]
>>    ksys_ioctl+0x92/0xb0
>>    __x64_sys_ioctl+0x16/0x20
>>    do_syscall_64+0x3e/0xb0
>>    entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>   ---[ end trace d17eb998aee991da ]---
> 
> Looks like this series got pulled for 5.9, has anyone looked into this?
> 



* Re: [PATCH v3 7/9] KVM: VMX: Add guest physical address check in EPT violation and misconfig
  2020-07-10 15:48 ` [PATCH v3 7/9] KVM: VMX: Add guest physical address check in EPT violation and misconfig Mohammed Gamal
  2020-07-13 18:32   ` Sean Christopherson
  2020-07-15 23:00   ` Sean Christopherson
@ 2020-10-09 16:17   ` Jim Mattson
  2020-10-14 23:44     ` Jim Mattson
  2 siblings, 1 reply; 28+ messages in thread
From: Jim Mattson @ 2020-10-09 16:17 UTC (permalink / raw)
  To: Mohammed Gamal
  Cc: kvm list, Paolo Bonzini, LKML, Vitaly Kuznetsov,
	Sean Christopherson, Wanpeng Li, Joerg Roedel

On Fri, Jul 10, 2020 at 8:48 AM Mohammed Gamal <mgamal@redhat.com> wrote:
>
> Check guest physical address against its maximum physical memory. If
> the guest's physical address exceeds the maximum (i.e. has reserved bits
> set), inject a guest page fault with PFERR_RSVD_MASK set.
>
> This has to be done both in the EPT violation and page fault paths, as
> there are complications in both cases with respect to the computation
> of the correct error code.
>
> For EPT violations, unfortunately the only possibility is to emulate,
> because the access type in the exit qualification might refer to an
> access to a paging structure, rather than to the access performed by
> the program.
>
> Trapping page faults instead is needed in order to correct the error code,
> but the access type can be obtained from the original error code and
> passed to gva_to_gpa.  The corrections required in the error code are
> subtle. For example, imagine that a PTE for a supervisor page has a reserved
> bit set.  On a supervisor-mode access, the EPT violation path would trigger.
> However, on a user-mode access, the processor will not notice the reserved
> bit and not include PFERR_RSVD_MASK in the error code.
>
> Co-developed-by: Mohammed Gamal <mgamal@redhat.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  arch/x86/kvm/vmx/vmx.c | 24 +++++++++++++++++++++---
>  arch/x86/kvm/vmx/vmx.h |  3 ++-
>  2 files changed, 23 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 770b090969fb..de3f436b2d32 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -4790,9 +4790,15 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu)
>
>         if (is_page_fault(intr_info)) {
>                 cr2 = vmx_get_exit_qual(vcpu);
> -               /* EPT won't cause page fault directly */
> -               WARN_ON_ONCE(!vcpu->arch.apf.host_apf_flags && enable_ept);
> -               return kvm_handle_page_fault(vcpu, error_code, cr2, NULL, 0);
> +               if (enable_ept && !vcpu->arch.apf.host_apf_flags) {
> +                       /*
> +                        * EPT will cause page fault only if we need to
> +                        * detect illegal GPAs.
> +                        */
> +                       kvm_fixup_and_inject_pf_error(vcpu, cr2, error_code);
> +                       return 1;
> +               } else
> +                       return kvm_handle_page_fault(vcpu, error_code, cr2, NULL, 0);
>         }
>
>         ex_no = intr_info & INTR_INFO_VECTOR_MASK;
> @@ -5308,6 +5314,18 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
>                PFERR_GUEST_FINAL_MASK : PFERR_GUEST_PAGE_MASK;
>
>         vcpu->arch.exit_qualification = exit_qualification;
> +
> +       /*
> +        * Check that the GPA doesn't exceed physical memory limits, as that is
> +        * a guest page fault.  We have to emulate the instruction here, because
> +        * if the illegal address is that of a paging structure, then
> +        * EPT_VIOLATION_ACC_WRITE bit is set.  Alternatively, if supported we
> +        * would also use advanced VM-exit information for EPT violations to
> +        * reconstruct the page fault error code.
> +        */
> +       if (unlikely(kvm_mmu_is_illegal_gpa(vcpu, gpa)))
> +               return kvm_emulate_instruction(vcpu, 0);
> +

Is kvm's in-kernel emulator up to the task? What if the instruction in
question is AVX-512, or one of the myriad instructions that the
in-kernel emulator can't handle? Ice Lake must support the advanced
VM-exit information for EPT violations, so that would seem like a
better choice.

>         return kvm_mmu_page_fault(vcpu, gpa, error_code, NULL, 0);
>  }
>
> diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
> index b0e5e210f1c1..0d06951e607c 100644
> --- a/arch/x86/kvm/vmx/vmx.h
> +++ b/arch/x86/kvm/vmx/vmx.h
> @@ -11,6 +11,7 @@
>  #include "kvm_cache_regs.h"
>  #include "ops.h"
>  #include "vmcs.h"
> +#include "cpuid.h"
>
>  extern const u32 vmx_msr_index[];
>
> @@ -552,7 +553,7 @@ static inline bool vmx_has_waitpkg(struct vcpu_vmx *vmx)
>
>  static inline bool vmx_need_pf_intercept(struct kvm_vcpu *vcpu)
>  {
> -       return !enable_ept;
> +       return !enable_ept || cpuid_maxphyaddr(vcpu) < boot_cpu_data.x86_phys_bits;
>  }
>
>  void dump_vmcs(void);
> --
> 2.26.2
>
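
As a reading aid for the quoted patch, the two conditions it relies on can be
sketched in standalone C.  This is a simplified model and an assumption on my
part, not the kernel code: gpa_is_illegal() and need_pf_intercept() are
hypothetical names, and the MAXPHYADDR values are passed in as plain integers
instead of being read from CPUID or boot_cpu_data.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the illegal-GPA check: a guest physical address is
 * illegal if it has any bit set at or above the guest's MAXPHYADDR.
 */
static inline bool gpa_is_illegal(uint64_t gpa, unsigned int guest_maxphyaddr)
{
	/* A 64-bit shift count would be undefined; no address can exceed it. */
	if (guest_maxphyaddr >= 64)
		return false;
	return gpa >= (1ULL << guest_maxphyaddr);
}

/*
 * Hypothetical model of the intercept decision in the quoted hunk: page
 * faults are intercepted when EPT is off, or when the guest's MAXPHYADDR
 * is smaller than the host's.
 */
static inline bool need_pf_intercept(bool ept_enabled,
				     unsigned int guest_maxphyaddr,
				     unsigned int host_phys_bits)
{
	return !ept_enabled || guest_maxphyaddr < host_phys_bits;
}
```

For example, with a 46-bit guest on a 52-bit host, an address with bit 46
set is illegal for the guest, and page faults are intercepted even though
EPT is enabled.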


* Re: [PATCH v3 7/9] KVM: VMX: Add guest physical address check in EPT violation and misconfig
  2020-10-09 16:17   ` Jim Mattson
@ 2020-10-14 23:44     ` Jim Mattson
  0 siblings, 0 replies; 28+ messages in thread
From: Jim Mattson @ 2020-10-14 23:44 UTC (permalink / raw)
  To: Mohammed Gamal
  Cc: kvm list, Paolo Bonzini, LKML, Vitaly Kuznetsov,
	Sean Christopherson, Wanpeng Li, Joerg Roedel

On Fri, Oct 9, 2020 at 9:17 AM Jim Mattson <jmattson@google.com> wrote:
>
> On Fri, Jul 10, 2020 at 8:48 AM Mohammed Gamal <mgamal@redhat.com> wrote:
> >
> > Check guest physical address against its maximum physical memory. If
> > the guest's physical address exceeds the maximum (i.e. has reserved bits
> > set), inject a guest page fault with PFERR_RSVD_MASK set.
> >
> > This has to be done both in the EPT violation and page fault paths, as
> > there are complications in both cases with respect to the computation
> > of the correct error code.
> >
> > For EPT violations, unfortunately the only possibility is to emulate,
> > because the access type in the exit qualification might refer to an
> > access to a paging structure, rather than to the access performed by
> > the program.
> >
> > Trapping page faults instead is needed in order to correct the error code,
> > but the access type can be obtained from the original error code and
> > passed to gva_to_gpa.  The corrections required in the error code are
> > subtle. For example, imagine that a PTE for a supervisor page has a reserved
> > bit set.  On a supervisor-mode access, the EPT violation path would trigger.
> > However, on a user-mode access, the processor will not notice the reserved
> > bit and not include PFERR_RSVD_MASK in the error code.
> >
> > Co-developed-by: Mohammed Gamal <mgamal@redhat.com>
> > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> > ---
> >  arch/x86/kvm/vmx/vmx.c | 24 +++++++++++++++++++++---
> >  arch/x86/kvm/vmx/vmx.h |  3 ++-
> >  2 files changed, 23 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index 770b090969fb..de3f436b2d32 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -4790,9 +4790,15 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu)
> >
> >         if (is_page_fault(intr_info)) {
> >                 cr2 = vmx_get_exit_qual(vcpu);
> > -               /* EPT won't cause page fault directly */
> > -               WARN_ON_ONCE(!vcpu->arch.apf.host_apf_flags && enable_ept);
> > -               return kvm_handle_page_fault(vcpu, error_code, cr2, NULL, 0);
> > +               if (enable_ept && !vcpu->arch.apf.host_apf_flags) {
> > +                       /*
> > +                        * EPT will cause page fault only if we need to
> > +                        * detect illegal GPAs.
> > +                        */
> > +                       kvm_fixup_and_inject_pf_error(vcpu, cr2, error_code);
> > +                       return 1;
> > +               } else
> > +                       return kvm_handle_page_fault(vcpu, error_code, cr2, NULL, 0);
> >         }
> >
> >         ex_no = intr_info & INTR_INFO_VECTOR_MASK;
> > @@ -5308,6 +5314,18 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
> >                PFERR_GUEST_FINAL_MASK : PFERR_GUEST_PAGE_MASK;
> >
> >         vcpu->arch.exit_qualification = exit_qualification;
> > +
> > +       /*
> > +        * Check that the GPA doesn't exceed physical memory limits, as that is
> > +        * a guest page fault.  We have to emulate the instruction here, because
> > +        * if the illegal address is that of a paging structure, then
> > +        * EPT_VIOLATION_ACC_WRITE bit is set.  Alternatively, if supported we
> > +        * would also use advanced VM-exit information for EPT violations to
> > +        * reconstruct the page fault error code.
> > +        */
> > +       if (unlikely(kvm_mmu_is_illegal_gpa(vcpu, gpa)))
> > +               return kvm_emulate_instruction(vcpu, 0);
> > +
>
> Is kvm's in-kernel emulator up to the task? What if the instruction in
> question is AVX-512, or one of the myriad instructions that the
> in-kernel emulator can't handle? Ice Lake must support the advanced
> VM-exit information for EPT violations, so that would seem like a
> better choice.
>
Anyone?


Thread overview: 28+ messages
2020-07-10 15:48 [PATCH v3 0/9] KVM: Support guest MAXPHYADDR < host MAXPHYADDR Mohammed Gamal
2020-07-10 15:48 ` [PATCH v3 1/9] KVM: x86: Add helper functions for illegal GPA checking and page fault injection Mohammed Gamal
2020-07-10 15:48 ` [PATCH v3 2/9] KVM: x86: mmu: Move translate_gpa() to mmu.c Mohammed Gamal
2020-07-10 15:48 ` [PATCH v3 3/9] KVM: x86: mmu: Add guest physical address check in translate_gpa() Mohammed Gamal
2020-07-10 17:41   ` Paolo Bonzini
2020-07-10 15:48 ` [PATCH v3 4/9] KVM: x86: rename update_bp_intercept to update_exception_bitmap Mohammed Gamal
2020-07-10 16:15   ` Jim Mattson
2020-07-10 15:48 ` [PATCH v3 5/9] KVM: x86: update exception bitmap on CPUID changes Mohammed Gamal
2020-07-10 16:25   ` Jim Mattson
2020-07-10 15:48 ` [PATCH v3 6/9] KVM: VMX: introduce vmx_need_pf_intercept Mohammed Gamal
2020-07-10 15:48 ` [PATCH v3 7/9] KVM: VMX: Add guest physical address check in EPT violation and misconfig Mohammed Gamal
2020-07-13 18:32   ` Sean Christopherson
2020-07-15 23:00   ` Sean Christopherson
2020-08-17 17:22     ` Sean Christopherson
2020-08-17 18:01       ` Paolo Bonzini
2020-10-09 16:17   ` Jim Mattson
2020-10-14 23:44     ` Jim Mattson
2020-07-10 15:48 ` [PATCH v3 8/9] KVM: VMX: optimize #PF injection when MAXPHYADDR does not match Mohammed Gamal
2020-07-10 15:48 ` [PATCH v3 9/9] KVM: x86: SVM: VMX: Make GUEST_MAXPHYADDR < HOST_MAXPHYADDR support configurable Mohammed Gamal
2020-07-10 17:40   ` Paolo Bonzini
2020-07-10 16:30 ` [PATCH v3 0/9] KVM: Support guest MAXPHYADDR < host MAXPHYADDR Jim Mattson
2020-07-10 17:06   ` Paolo Bonzini
2020-07-10 17:13     ` Jim Mattson
2020-07-10 17:16       ` Paolo Bonzini
2020-07-10 17:26         ` Sean Christopherson
2020-07-10 17:26         ` Jim Mattson
2020-07-10 17:40           ` Paolo Bonzini
2020-07-10 17:49 ` Paolo Bonzini
