kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 00/24] KVM: x86/mmu: Introduce MMU_PRESENT and fix bugs
@ 2021-02-25 20:47 Sean Christopherson
  2021-02-25 20:47 ` [PATCH 01/24] KVM: x86/mmu: Set SPTE_AD_WRPROT_ONLY_MASK if and only if PML is enabled Sean Christopherson
                   ` (24 more replies)
  0 siblings, 25 replies; 32+ messages in thread
From: Sean Christopherson @ 2021-02-25 20:47 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon

This series adds the simple idea of tagging shadow-present SPTEs with
a single bit, instead of looking for non-zero SPTEs that aren't MMIO and
aren't REMOVED.  Doing so reduces KVM's code footprint by 2k bytes on
x86-64, and presumably adds a tiny performance boost in related paths.

But, actually adding MMU_PRESENT without breaking one flow or another is
a bit of a debacle.  The main issue is that EPT doesn't have many low
available bits, and PAE doesn't have any high available bits.  And, the
existing MMU_WRITABLE and HOST_WRITABLE flags aren't optional, i.e. are
needed for all flavors of paging.  The solution I settled on is to let
make the *_WRITABLE bit configurable so that EPT can use high available
bits.

Of course, I forgot the above PAE restriction multiple times, and
journeyed down several dead ends.  The most notable failed idea was
using the AD_* masks in bits 52 and 53 to denote shadow-present SPTEs.
That would have been quite clever as it would provide the same benefits
without burning another available bit.

Along the many failed attempts, I collected a variety of bug fixes and
cleanups, mostly things found by inspection after doing a deep dive to
figure out what I broke.

Sean Christopherson (24):
  KVM: x86/mmu: Set SPTE_AD_WRPROT_ONLY_MASK if and only if PML is
    enabled
  KVM: x86/mmu: Check for shadow-present SPTE before querying A/D status
  KVM: x86/mmu: Bail from fast_page_fault() if SPTE is not
    shadow-present
  KVM: x86/mmu: Disable MMIO caching if MMIO value collides with L1TF
  KVM: x86/mmu: Retry page faults that hit an invalid memslot
  KVM: x86/mmu: Don't install bogus MMIO SPTEs if MMIO caching is
    disabled
  KVM: x86/mmu: Handle MMIO SPTEs directly in mmu_set_spte()
  KVM: x86/mmu: Drop redundant trace_kvm_mmu_set_spte() in the TDP MMU
  KVM: x86/mmu: Rename 'mask' to 'spte' in MMIO SPTE helpers
  KVM: x86/mmu: Stop using software available bits to denote MMIO SPTEs
  KVM: x86/mmu: Add module param to disable MMIO caching (for testing)
  KVM: x86/mmu: Rename and document A/D scheme for TDP SPTEs
  KVM: x86/mmu: Use MMIO SPTE bits 53 and 52 for the MMIO generation
  KVM: x86/mmu: Document dependency bewteen TDP A/D type and saved bits
  KVM: x86/mmu: Move initial kvm_mmu_set_mask_ptes() call into MMU
    proper
  KVM: x86/mmu: Co-locate code for setting various SPTE masks
  KVM: x86/mmu: Move logic for setting SPTE masks for EPT into the MMU
    proper
  KVM: x86/mmu: Make Host-writable and MMU-writable bit locations
    dynamic
  KVM: x86/mmu: Use high bits for host/mmu writable masks for EPT SPTEs
  KVM: x86/mmu: Use a dedicated bit to track shadow/MMU-present SPTEs
  KVM: x86/mmu: Tweak auditing WARN for A/D bits to !PRESENT (was MMIO)
  KVM: x86/mmu: Use is_removed_spte() instead of open coded equivalents
  KVM: x86/mmu: Use low available bits for removed SPTEs
  KVM: x86/mmu: Dump reserved bits if they're detected on non-MMIO SPTE

 Documentation/virt/kvm/locking.rst |  49 +++++----
 arch/x86/include/asm/kvm_host.h    |   3 -
 arch/x86/kvm/mmu.h                 |  15 +--
 arch/x86/kvm/mmu/mmu.c             |  87 +++++++---------
 arch/x86/kvm/mmu/mmu_internal.h    |  16 +--
 arch/x86/kvm/mmu/paging_tmpl.h     |   2 +-
 arch/x86/kvm/mmu/spte.c            | 157 ++++++++++++++++++++---------
 arch/x86/kvm/mmu/spte.h            | 135 +++++++++++++++++--------
 arch/x86/kvm/mmu/tdp_mmu.c         |  22 ++--
 arch/x86/kvm/svm/svm.c             |   2 +-
 arch/x86/kvm/vmx/vmx.c             |  24 +----
 arch/x86/kvm/x86.c                 |   3 -
 12 files changed, 290 insertions(+), 225 deletions(-)

-- 
2.30.1.766.gb4fecdf3b7-goog


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 01/24] KVM: x86/mmu: Set SPTE_AD_WRPROT_ONLY_MASK if and only if PML is enabled
  2021-02-25 20:47 [PATCH 00/24] KVM: x86/mmu: Introduce MMU_PRESENT and fix bugs Sean Christopherson
@ 2021-02-25 20:47 ` Sean Christopherson
  2021-02-25 20:47 ` [PATCH 02/24] KVM: x86/mmu: Check for shadow-present SPTE before querying A/D status Sean Christopherson
                   ` (23 subsequent siblings)
  24 siblings, 0 replies; 32+ messages in thread
From: Sean Christopherson @ 2021-02-25 20:47 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon

Check that PML is actually enabled before setting the mask to force a
SPTE to be write-protected.  The bits used for the !AD_ENABLED case are
in the upper half of the SPTE.  With 64-bit paging and EPT, these bits
are ignored, but with 32-bit PAE paging they are reserved.  Setting them
for L2 SPTEs without checking PML breaks NPT on 32-bit KVM.

Fixes: 1f4e5fc83a42 ("KVM: x86: fix nested guest live migration with PML")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu_internal.h | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 72b0928f2b2d..ec4fc28b325a 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -81,15 +81,15 @@ static inline struct kvm_mmu_page *sptep_to_sp(u64 *sptep)
 static inline bool kvm_vcpu_ad_need_write_protect(struct kvm_vcpu *vcpu)
 {
 	/*
-	 * When using the EPT page-modification log, the GPAs in the log
-	 * would come from L2 rather than L1.  Therefore, we need to rely
-	 * on write protection to record dirty pages.  This also bypasses
-	 * PML, since writes now result in a vmexit.  Note, this helper will
-	 * tag SPTEs as needing write-protection even if PML is disabled or
-	 * unsupported, but that's ok because the tag is consumed if and only
-	 * if PML is enabled.  Omit the PML check to save a few uops.
+	 * When using the EPT page-modification log, the GPAs in the CPU dirty
+	 * log would come from L2 rather than L1.  Therefore, we need to rely
+	 * on write protection to record dirty pages, which bypasses PML, since
+	 * writes now result in a vmexit.  Note, the check on CPU dirty logging
+	 * being enabled is mandatory as the bits used to denote WP-only SPTEs
+	 * are reserved for NPT w/ PAE (32-bit KVM).
 	 */
-	return vcpu->arch.mmu == &vcpu->arch.guest_mmu;
+	return vcpu->arch.mmu == &vcpu->arch.guest_mmu &&
+	       kvm_x86_ops.cpu_dirty_log_size;
 }
 
 bool is_nx_huge_page_enabled(void);
-- 
2.30.1.766.gb4fecdf3b7-goog


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 02/24] KVM: x86/mmu: Check for shadow-present SPTE before querying A/D status
  2021-02-25 20:47 [PATCH 00/24] KVM: x86/mmu: Introduce MMU_PRESENT and fix bugs Sean Christopherson
  2021-02-25 20:47 ` [PATCH 01/24] KVM: x86/mmu: Set SPTE_AD_WRPROT_ONLY_MASK if and only if PML is enabled Sean Christopherson
@ 2021-02-25 20:47 ` Sean Christopherson
  2021-02-25 20:47 ` [PATCH 03/24] KVM: x86/mmu: Bail from fast_page_fault() if SPTE is not shadow-present Sean Christopherson
                   ` (22 subsequent siblings)
  24 siblings, 0 replies; 32+ messages in thread
From: Sean Christopherson @ 2021-02-25 20:47 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon

When updating accessed and dirty bits, check that the new SPTE is present
before attempting to query its A/D bits.  Failure to confirm the SPTE is
present can theoretically cause a false negative, e.g. if a MMIO SPTE
replaces a "real" SPTE and somehow the PFNs magically match.

Realistically, this is all but guaranteed to be a benign bug.  Fix it up
primarily so that a future patch can tweak the MMU_WARN_ON checking A/D
status to fire if the SPTE is not-present.

Fixes: f8e144971c68 ("kvm: x86/mmu: Add access tracking for tdp_mmu")
Cc: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/tdp_mmu.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index c926c6b899a1..f46972892a2d 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -210,13 +210,12 @@ static int kvm_mmu_page_as_id(struct kvm_mmu_page *sp)
 
 static void handle_changed_spte_acc_track(u64 old_spte, u64 new_spte, int level)
 {
-	bool pfn_changed = spte_to_pfn(old_spte) != spte_to_pfn(new_spte);
-
 	if (!is_shadow_present_pte(old_spte) || !is_last_spte(old_spte, level))
 		return;
 
 	if (is_accessed_spte(old_spte) &&
-	    (!is_accessed_spte(new_spte) || pfn_changed))
+	    (!is_shadow_present_pte(new_spte) || !is_accessed_spte(new_spte) ||
+	     spte_to_pfn(old_spte) != spte_to_pfn(new_spte)))
 		kvm_set_pfn_accessed(spte_to_pfn(old_spte));
 }
 
@@ -444,7 +443,7 @@ static void __handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
 
 
 	if (was_leaf && is_dirty_spte(old_spte) &&
-	    (!is_dirty_spte(new_spte) || pfn_changed))
+	    (!is_present || !is_dirty_spte(new_spte) || pfn_changed))
 		kvm_set_pfn_dirty(spte_to_pfn(old_spte));
 
 	/*
-- 
2.30.1.766.gb4fecdf3b7-goog


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 03/24] KVM: x86/mmu: Bail from fast_page_fault() if SPTE is not shadow-present
  2021-02-25 20:47 [PATCH 00/24] KVM: x86/mmu: Introduce MMU_PRESENT and fix bugs Sean Christopherson
  2021-02-25 20:47 ` [PATCH 01/24] KVM: x86/mmu: Set SPTE_AD_WRPROT_ONLY_MASK if and only if PML is enabled Sean Christopherson
  2021-02-25 20:47 ` [PATCH 02/24] KVM: x86/mmu: Check for shadow-present SPTE before querying A/D status Sean Christopherson
@ 2021-02-25 20:47 ` Sean Christopherson
  2021-02-25 20:47 ` [PATCH 04/24] KVM: x86/mmu: Disable MMIO caching if MMIO value collides with L1TF Sean Christopherson
                   ` (21 subsequent siblings)
  24 siblings, 0 replies; 32+ messages in thread
From: Sean Christopherson @ 2021-02-25 20:47 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon

Bail from fast_page_fault() if the SPTE is not a shadow-present SPTE.
Functionally, this is not strictly necessary as the !is_access_allowed()
check will eventually reject the fast path, but an early check on
shadow-present skips unnecessary checks and will allow a future patch to
tweak the A/D status auditing to warn if KVM attempts to query A/D bits
without first ensuring the SPTE is a shadow-present SPTE.

Note, is_shadow_present_pte() is quite expensive at this time, i.e. this
might be a net negative in the short term.  A future patch will optimize
is_shadow_present_pte() to a single AND operation and remedy the issue.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index d75524bc8423..93b0285e8b38 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3061,6 +3061,9 @@ static int fast_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 			if (!is_shadow_present_pte(spte))
 				break;
 
+		if (!is_shadow_present_pte(spte))
+			break;
+
 		sp = sptep_to_sp(iterator.sptep);
 		if (!is_last_spte(spte, sp->role.level))
 			break;
-- 
2.30.1.766.gb4fecdf3b7-goog


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 04/24] KVM: x86/mmu: Disable MMIO caching if MMIO value collides with L1TF
  2021-02-25 20:47 [PATCH 00/24] KVM: x86/mmu: Introduce MMU_PRESENT and fix bugs Sean Christopherson
                   ` (2 preceding siblings ...)
  2021-02-25 20:47 ` [PATCH 03/24] KVM: x86/mmu: Bail from fast_page_fault() if SPTE is not shadow-present Sean Christopherson
@ 2021-02-25 20:47 ` Sean Christopherson
  2021-02-25 20:47 ` [PATCH 05/24] KVM: x86/mmu: Retry page faults that hit an invalid memslot Sean Christopherson
                   ` (20 subsequent siblings)
  24 siblings, 0 replies; 32+ messages in thread
From: Sean Christopherson @ 2021-02-25 20:47 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon

Disable MMIO caching if the MMIO value collides with the L1TF mitigation
that usurps high PFN bits.  In practice this should never happen as only
CPUs with SME support can generate such a collision (because the MMIO
value can theoretically get adjusted into legal memory), and no CPUs
exist that support SME and are susceptible to L1TF.  But, closing the
hole is trivial.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/spte.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index ef55f0bc4ccf..9ea097bcb491 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -245,8 +245,19 @@ u64 mark_spte_for_access_track(u64 spte)
 void kvm_mmu_set_mmio_spte_mask(u64 mmio_value, u64 access_mask)
 {
 	BUG_ON((u64)(unsigned)access_mask != access_mask);
-	WARN_ON(mmio_value & (shadow_nonpresent_or_rsvd_mask << SHADOW_NONPRESENT_OR_RSVD_MASK_LEN));
 	WARN_ON(mmio_value & shadow_nonpresent_or_rsvd_lower_gfn_mask);
+
+	/*
+	 * Disable MMIO caching if the MMIO value collides with the bits that
+	 * are used to hold the relocated GFN when the L1TF mitigation is
+	 * enabled.  This should never fire as there is no known hardware that
+	 * can trigger this condition, e.g. SME/SEV CPUs that require a custom
+	 * MMIO value are not susceptible to L1TF.
+	 */
+	if (WARN_ON(mmio_value & (shadow_nonpresent_or_rsvd_mask <<
+				  SHADOW_NONPRESENT_OR_RSVD_MASK_LEN)))
+		mmio_value = 0;
+
 	shadow_mmio_value = mmio_value | SPTE_MMIO_MASK;
 	shadow_mmio_access_mask = access_mask;
 }
-- 
2.30.1.766.gb4fecdf3b7-goog


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 05/24] KVM: x86/mmu: Retry page faults that hit an invalid memslot
  2021-02-25 20:47 [PATCH 00/24] KVM: x86/mmu: Introduce MMU_PRESENT and fix bugs Sean Christopherson
                   ` (3 preceding siblings ...)
  2021-02-25 20:47 ` [PATCH 04/24] KVM: x86/mmu: Disable MMIO caching if MMIO value collides with L1TF Sean Christopherson
@ 2021-02-25 20:47 ` Sean Christopherson
  2021-02-25 20:47 ` [PATCH 06/24] KVM: x86/mmu: Don't install bogus MMIO SPTEs if MMIO caching is disabled Sean Christopherson
                   ` (19 subsequent siblings)
  24 siblings, 0 replies; 32+ messages in thread
From: Sean Christopherson @ 2021-02-25 20:47 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon

Retry page faults (re-enter the guest) that hit an invalid memslot
instead of treating the memslot as not existing, i.e. handling the
page fault as an MMIO access.  When deleting a memslot, SPTEs aren't
zapped and the TLBs aren't flushed until after the memslot has been
marked invalid.

Handling the invalid slot as MMIO means there's a small window where a
page fault could replace a valid SPTE with an MMIO SPTE.  The legacy
MMU handles such a scenario cleanly, but the TDP MMU assumes such
behavior is impossible (see the BUG() in __handle_changed_spte()).
There's really no good reason why the legacy MMU should allow such a
scenario, and closing this hole allows for additional cleanups.

Fixes: 2f2fad0897cb ("kvm: x86/mmu: Add functions to handle changed TDP SPTEs")
Cc: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 93b0285e8b38..9eb5ccb66e31 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3656,6 +3656,14 @@ static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn,
 	struct kvm_memory_slot *slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
 	bool async;
 
+	/*
+	 * Retry the page fault if the gfn hit a memslot that is being deleted
+	 * or moved.  This ensures any existing SPTEs for the old memslot will
+	 * be zapped before KVM inserts a new MMIO SPTE for the gfn.
+	 */
+	if (slot && (slot->flags & KVM_MEMSLOT_INVALID))
+		return true;
+
 	/* Don't expose private memslots to L2. */
 	if (is_guest_mode(vcpu) && !kvm_is_visible_memslot(slot)) {
 		*pfn = KVM_PFN_NOSLOT;
-- 
2.30.1.766.gb4fecdf3b7-goog


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 06/24] KVM: x86/mmu: Don't install bogus MMIO SPTEs if MMIO caching is disabled
  2021-02-25 20:47 [PATCH 00/24] KVM: x86/mmu: Introduce MMU_PRESENT and fix bugs Sean Christopherson
                   ` (4 preceding siblings ...)
  2021-02-25 20:47 ` [PATCH 05/24] KVM: x86/mmu: Retry page faults that hit an invalid memslot Sean Christopherson
@ 2021-02-25 20:47 ` Sean Christopherson
  2021-02-25 20:47 ` [PATCH 07/24] KVM: x86/mmu: Handle MMIO SPTEs directly in mmu_set_spte() Sean Christopherson
                   ` (18 subsequent siblings)
  24 siblings, 0 replies; 32+ messages in thread
From: Sean Christopherson @ 2021-02-25 20:47 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon

If MMIO caching is disabled, e.g. when using shadow paging on CPUs with
52 bits of PA space, go straight to MMIO emulation and don't install an
MMIO SPTE.  The SPTE will just generate a !PRESENT #PF, i.e. can't
actually accelerate future MMIO.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c  | 12 +++++++++++-
 arch/x86/kvm/mmu/spte.c |  7 ++++++-
 2 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 9eb5ccb66e31..37c68abc54b8 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2946,9 +2946,19 @@ static bool handle_abnormal_pfn(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn,
 		return true;
 	}
 
-	if (unlikely(is_noslot_pfn(pfn)))
+	if (unlikely(is_noslot_pfn(pfn))) {
 		vcpu_cache_mmio_info(vcpu, gva, gfn,
 				     access & shadow_mmio_access_mask);
+		/*
+		 * If MMIO caching is disabled, emulate immediately without
+		 * touching the shadow page tables as attempting to install an
+		 * MMIO SPTE will just be an expensive nop.
+		 */
+		if (unlikely(!shadow_mmio_value)) {
+			*ret_val = RET_PF_EMULATE;
+			return true;
+		}
+	}
 
 	return false;
 }
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index 9ea097bcb491..dcba9c1cbe29 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -51,6 +51,8 @@ u64 make_mmio_spte(struct kvm_vcpu *vcpu, u64 gfn, unsigned int access)
 	u64 mask = generation_mmio_spte_mask(gen);
 	u64 gpa = gfn << PAGE_SHIFT;
 
+	WARN_ON_ONCE(!shadow_mmio_value);
+
 	access &= shadow_mmio_access_mask;
 	mask |= shadow_mmio_value | access;
 	mask |= gpa | shadow_nonpresent_or_rsvd_mask;
@@ -258,7 +260,10 @@ void kvm_mmu_set_mmio_spte_mask(u64 mmio_value, u64 access_mask)
 				  SHADOW_NONPRESENT_OR_RSVD_MASK_LEN)))
 		mmio_value = 0;
 
-	shadow_mmio_value = mmio_value | SPTE_MMIO_MASK;
+	if (mmio_value)
+		shadow_mmio_value = mmio_value | SPTE_MMIO_MASK;
+	else
+		shadow_mmio_value = 0;
 	shadow_mmio_access_mask = access_mask;
 }
 EXPORT_SYMBOL_GPL(kvm_mmu_set_mmio_spte_mask);
-- 
2.30.1.766.gb4fecdf3b7-goog


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 07/24] KVM: x86/mmu: Handle MMIO SPTEs directly in mmu_set_spte()
  2021-02-25 20:47 [PATCH 00/24] KVM: x86/mmu: Introduce MMU_PRESENT and fix bugs Sean Christopherson
                   ` (5 preceding siblings ...)
  2021-02-25 20:47 ` [PATCH 06/24] KVM: x86/mmu: Don't install bogus MMIO SPTEs if MMIO caching is disabled Sean Christopherson
@ 2021-02-25 20:47 ` Sean Christopherson
  2021-02-25 20:47 ` [PATCH 08/24] KVM: x86/mmu: Drop redundant trace_kvm_mmu_set_spte() in the TDP MMU Sean Christopherson
                   ` (17 subsequent siblings)
  24 siblings, 0 replies; 32+ messages in thread
From: Sean Christopherson @ 2021-02-25 20:47 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon

Now that it should be impossible to convert a valid SPTE to an MMIO SPTE,
handle MMIO SPTEs early in mmu_set_spte() without going through
set_spte() and all the logic for removing an existing, valid SPTE.
The other caller of set_spte(), FNAME(sync_page)(), explicitly handles
MMIO SPTEs prior to calling set_spte().

This simplifies mmu_set_spte() and set_spte(), and also "fixes" an oddity
where MMIO SPTEs are traced by both trace_kvm_mmu_set_spte() and
trace_mark_mmio_spte().

Note, mmu_spte_set() will WARN if this new approach causes KVM to create
an MMIO SPTE overtop a valid SPTE.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 22 +++++-----------------
 1 file changed, 5 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 37c68abc54b8..4a24beefff94 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -236,17 +236,6 @@ static unsigned get_mmio_spte_access(u64 spte)
 	return spte & shadow_mmio_access_mask;
 }
 
-static bool set_mmio_spte(struct kvm_vcpu *vcpu, u64 *sptep, gfn_t gfn,
-			  kvm_pfn_t pfn, unsigned int access)
-{
-	if (unlikely(is_noslot_pfn(pfn))) {
-		mark_mmio_spte(vcpu, sptep, gfn, access);
-		return true;
-	}
-
-	return false;
-}
-
 static bool check_mmio_spte(struct kvm_vcpu *vcpu, u64 spte)
 {
 	u64 kvm_gen, spte_gen, gen;
@@ -2561,9 +2550,6 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 	struct kvm_mmu_page *sp;
 	int ret;
 
-	if (set_mmio_spte(vcpu, sptep, gfn, pfn, pte_access))
-		return 0;
-
 	sp = sptep_to_sp(sptep);
 
 	ret = make_spte(vcpu, pte_access, level, gfn, pfn, *sptep, speculative,
@@ -2593,6 +2579,11 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 	pgprintk("%s: spte %llx write_fault %d gfn %llx\n", __func__,
 		 *sptep, write_fault, gfn);
 
+	if (unlikely(is_noslot_pfn(pfn))) {
+		mark_mmio_spte(vcpu, sptep, gfn, pte_access);
+		return RET_PF_EMULATE;
+	}
+
 	if (is_shadow_present_pte(*sptep)) {
 		/*
 		 * If we overwrite a PTE page pointer with a 2MB PMD, unlink
@@ -2626,9 +2617,6 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 		kvm_flush_remote_tlbs_with_address(vcpu->kvm, gfn,
 				KVM_PAGES_PER_HPAGE(level));
 
-	if (unlikely(is_mmio_spte(*sptep)))
-		ret = RET_PF_EMULATE;
-
 	/*
 	 * The fault is fully spurious if and only if the new SPTE and old SPTE
 	 * are identical, and emulation is not required.
-- 
2.30.1.766.gb4fecdf3b7-goog


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 08/24] KVM: x86/mmu: Drop redundant trace_kvm_mmu_set_spte() in the TDP MMU
  2021-02-25 20:47 [PATCH 00/24] KVM: x86/mmu: Introduce MMU_PRESENT and fix bugs Sean Christopherson
                   ` (6 preceding siblings ...)
  2021-02-25 20:47 ` [PATCH 07/24] KVM: x86/mmu: Handle MMIO SPTEs directly in mmu_set_spte() Sean Christopherson
@ 2021-02-25 20:47 ` Sean Christopherson
  2021-02-25 20:47 ` [PATCH 09/24] KVM: x86/mmu: Rename 'mask' to 'spte' in MMIO SPTE helpers Sean Christopherson
                   ` (16 subsequent siblings)
  24 siblings, 0 replies; 32+ messages in thread
From: Sean Christopherson @ 2021-02-25 20:47 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon

Remove TDP MMU's call to trace_kvm_mmu_set_spte() that is done for both
shadow-present SPTEs and MMIO SPTEs.  It's fully redundant for the
former, and unnecessary for the latter.  This aligns TDP MMU tracing
behavior with that of the legacy MMU.

Fixes: 33dd3574f5fe ("kvm: x86/mmu: Add existing trace points to TDP MMU")
Cc: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/tdp_mmu.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index f46972892a2d..782cae1eb5e1 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -773,12 +773,11 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu, int write,
 		trace_mark_mmio_spte(rcu_dereference(iter->sptep), iter->gfn,
 				     new_spte);
 		ret = RET_PF_EMULATE;
-	} else
+	} else {
 		trace_kvm_mmu_set_spte(iter->level, iter->gfn,
 				       rcu_dereference(iter->sptep));
+	}
 
-	trace_kvm_mmu_set_spte(iter->level, iter->gfn,
-			       rcu_dereference(iter->sptep));
 	if (!prefault)
 		vcpu->stat.pf_fixed++;
 
-- 
2.30.1.766.gb4fecdf3b7-goog


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 09/24] KVM: x86/mmu: Rename 'mask' to 'spte' in MMIO SPTE helpers
  2021-02-25 20:47 [PATCH 00/24] KVM: x86/mmu: Introduce MMU_PRESENT and fix bugs Sean Christopherson
                   ` (7 preceding siblings ...)
  2021-02-25 20:47 ` [PATCH 08/24] KVM: x86/mmu: Drop redundant trace_kvm_mmu_set_spte() in the TDP MMU Sean Christopherson
@ 2021-02-25 20:47 ` Sean Christopherson
  2021-02-25 20:47 ` [PATCH 10/24] KVM: x86/mmu: Stop using software available bits to denote MMIO SPTEs Sean Christopherson
                   ` (15 subsequent siblings)
  24 siblings, 0 replies; 32+ messages in thread
From: Sean Christopherson @ 2021-02-25 20:47 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon

The value returned by make_mmio_spte() is a SPTE, it is not a mask.
Name it accordingly.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c  |  6 +++---
 arch/x86/kvm/mmu/spte.c | 10 +++++-----
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 4a24beefff94..ced412f90b7d 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -215,10 +215,10 @@ bool is_nx_huge_page_enabled(void)
 static void mark_mmio_spte(struct kvm_vcpu *vcpu, u64 *sptep, u64 gfn,
 			   unsigned int access)
 {
-	u64 mask = make_mmio_spte(vcpu, gfn, access);
+	u64 spte = make_mmio_spte(vcpu, gfn, access);
 
-	trace_mark_mmio_spte(sptep, gfn, mask);
-	mmu_spte_set(sptep, mask);
+	trace_mark_mmio_spte(sptep, gfn, spte);
+	mmu_spte_set(sptep, spte);
 }
 
 static gfn_t get_mmio_spte_gfn(u64 spte)
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index dcba9c1cbe29..e4ef3267f9ac 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -48,18 +48,18 @@ static u64 generation_mmio_spte_mask(u64 gen)
 u64 make_mmio_spte(struct kvm_vcpu *vcpu, u64 gfn, unsigned int access)
 {
 	u64 gen = kvm_vcpu_memslots(vcpu)->generation & MMIO_SPTE_GEN_MASK;
-	u64 mask = generation_mmio_spte_mask(gen);
+	u64 spte = generation_mmio_spte_mask(gen);
 	u64 gpa = gfn << PAGE_SHIFT;
 
 	WARN_ON_ONCE(!shadow_mmio_value);
 
 	access &= shadow_mmio_access_mask;
-	mask |= shadow_mmio_value | access;
-	mask |= gpa | shadow_nonpresent_or_rsvd_mask;
-	mask |= (gpa & shadow_nonpresent_or_rsvd_mask)
+	spte |= shadow_mmio_value | access;
+	spte |= gpa | shadow_nonpresent_or_rsvd_mask;
+	spte |= (gpa & shadow_nonpresent_or_rsvd_mask)
 		<< SHADOW_NONPRESENT_OR_RSVD_MASK_LEN;
 
-	return mask;
+	return spte;
 }
 
 static bool kvm_is_mmio_pfn(kvm_pfn_t pfn)
-- 
2.30.1.766.gb4fecdf3b7-goog


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 10/24] KVM: x86/mmu: Stop using software available bits to denote MMIO SPTEs
  2021-02-25 20:47 [PATCH 00/24] KVM: x86/mmu: Introduce MMU_PRESENT and fix bugs Sean Christopherson
                   ` (8 preceding siblings ...)
  2021-02-25 20:47 ` [PATCH 09/24] KVM: x86/mmu: Rename 'mask' to 'spte' in MMIO SPTE helpers Sean Christopherson
@ 2021-02-25 20:47 ` Sean Christopherson
  2021-02-25 20:47 ` [PATCH 11/24] KVM: x86/mmu: Add module param to disable MMIO caching (for testing) Sean Christopherson
                   ` (14 subsequent siblings)
  24 siblings, 0 replies; 32+ messages in thread
From: Sean Christopherson @ 2021-02-25 20:47 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon

Stop tagging MMIO SPTEs with specific available bits and instead detect
MMIO SPTEs by checking for their unique SPTE value.  The value is
guaranteed to be unique on shadow paging and NPT as setting reserved
physical address bits on any other type of SPTE would consistute a KVM
bug.  Ditto for EPT, as creating a WX non-MMIO would also be a bug.

Note, this approach is also future-compatibile with TDX, which will need
to reflect MMIO EPT violations as #VEs into the guest.  To create an EPT
violation instead of a misconfig, TDX EPTs will need to have RWX=0,  But,
MMIO SPTEs will also be the only case where KVM clears SUPPRESS_VE, so
MMIO SPTEs will still be guaranteed to have a unique value within a given
MMU context.

The main motivation is to make it easier to reason about which types of
SPTEs use which available bits.  As a happy side effect, this frees up
two more bits for storing the MMIO generation.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu.h      |  2 +-
 arch/x86/kvm/mmu/mmu.c  |  2 +-
 arch/x86/kvm/mmu/spte.c | 11 ++++++-----
 arch/x86/kvm/mmu/spte.h | 10 ++++------
 arch/x86/kvm/svm/svm.c  |  2 +-
 arch/x86/kvm/vmx/vmx.c  |  3 ++-
 6 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index c68bfc3e2402..00f4a541e04d 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -59,7 +59,7 @@ static __always_inline u64 rsvd_bits(int s, int e)
 	return ((2ULL << (e - s)) - 1) << s;
 }
 
-void kvm_mmu_set_mmio_spte_mask(u64 mmio_value, u64 access_mask);
+void kvm_mmu_set_mmio_spte_mask(u64 mmio_value, u64 mmio_mask, u64 access_mask);
 
 void
 reset_shadow_zero_bits_mask(struct kvm_vcpu *vcpu, struct kvm_mmu *context);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index ced412f90b7d..f92571b786a2 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5726,7 +5726,7 @@ static void kvm_set_mmio_spte_mask(void)
 	else
 		mask = 0;
 
-	kvm_mmu_set_mmio_spte_mask(mask, ACC_WRITE_MASK | ACC_USER_MASK);
+	kvm_mmu_set_mmio_spte_mask(mask, mask, ACC_WRITE_MASK | ACC_USER_MASK);
 }
 
 static bool get_nx_auto_mode(void)
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index e4ef3267f9ac..b2379094a8c1 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -23,6 +23,7 @@ u64 __read_mostly shadow_user_mask;
 u64 __read_mostly shadow_accessed_mask;
 u64 __read_mostly shadow_dirty_mask;
 u64 __read_mostly shadow_mmio_value;
+u64 __read_mostly shadow_mmio_mask;
 u64 __read_mostly shadow_mmio_access_mask;
 u64 __read_mostly shadow_present_mask;
 u64 __read_mostly shadow_me_mask;
@@ -163,6 +164,7 @@ int make_spte(struct kvm_vcpu *vcpu, unsigned int pte_access, int level,
 		spte = mark_spte_for_access_track(spte);
 
 out:
+	WARN_ON(is_mmio_spte(spte));
 	*new_spte = spte;
 	return ret;
 }
@@ -244,7 +246,7 @@ u64 mark_spte_for_access_track(u64 spte)
 	return spte;
 }
 
-void kvm_mmu_set_mmio_spte_mask(u64 mmio_value, u64 access_mask)
+void kvm_mmu_set_mmio_spte_mask(u64 mmio_value, u64 mmio_mask, u64 access_mask)
 {
 	BUG_ON((u64)(unsigned)access_mask != access_mask);
 	WARN_ON(mmio_value & shadow_nonpresent_or_rsvd_lower_gfn_mask);
@@ -260,10 +262,9 @@ void kvm_mmu_set_mmio_spte_mask(u64 mmio_value, u64 access_mask)
 				  SHADOW_NONPRESENT_OR_RSVD_MASK_LEN)))
 		mmio_value = 0;
 
-	if (mmio_value)
-		shadow_mmio_value = mmio_value | SPTE_MMIO_MASK;
-	else
-		shadow_mmio_value = 0;
+	WARN_ON((mmio_value & mmio_mask) != mmio_value);
+	shadow_mmio_value = mmio_value;
+	shadow_mmio_mask  = mmio_mask;
 	shadow_mmio_access_mask = access_mask;
 }
 EXPORT_SYMBOL_GPL(kvm_mmu_set_mmio_spte_mask);
diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
index 6de3950fd704..642a17b9964c 100644
--- a/arch/x86/kvm/mmu/spte.h
+++ b/arch/x86/kvm/mmu/spte.h
@@ -8,15 +8,11 @@
 #define PT_FIRST_AVAIL_BITS_SHIFT 10
 #define PT64_SECOND_AVAIL_BITS_SHIFT 54
 
-/*
- * The mask used to denote special SPTEs, which can be either MMIO SPTEs or
- * Access Tracking SPTEs.
- */
+/* The mask used to denote Access Tracking SPTEs.  Note, val=3 is available. */
 #define SPTE_SPECIAL_MASK (3ULL << 52)
 #define SPTE_AD_ENABLED_MASK (0ULL << 52)
 #define SPTE_AD_DISABLED_MASK (1ULL << 52)
 #define SPTE_AD_WRPROT_ONLY_MASK (2ULL << 52)
-#define SPTE_MMIO_MASK (3ULL << 52)
 
 #ifdef CONFIG_DYNAMIC_PHYSICAL_MASK
 #define PT64_BASE_ADDR_MASK (physical_mask & ~(u64)(PAGE_SIZE-1))
@@ -98,6 +94,7 @@ extern u64 __read_mostly shadow_user_mask;
 extern u64 __read_mostly shadow_accessed_mask;
 extern u64 __read_mostly shadow_dirty_mask;
 extern u64 __read_mostly shadow_mmio_value;
+extern u64 __read_mostly shadow_mmio_mask;
 extern u64 __read_mostly shadow_mmio_access_mask;
 extern u64 __read_mostly shadow_present_mask;
 extern u64 __read_mostly shadow_me_mask;
@@ -167,7 +164,8 @@ extern u8 __read_mostly shadow_phys_bits;
 
 static inline bool is_mmio_spte(u64 spte)
 {
-	return (spte & SPTE_SPECIAL_MASK) == SPTE_MMIO_MASK;
+	return (spte & shadow_mmio_mask) == shadow_mmio_value &&
+	       likely(shadow_mmio_value);
 }
 
 static inline bool sp_ad_disabled(struct kvm_mmu_page *sp)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index c4f2f2f6b945..54610270f66a 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -885,7 +885,7 @@ static __init void svm_adjust_mmio_mask(void)
 	 */
 	mask = (mask_bit < 52) ? rsvd_bits(mask_bit, 51) | PT_PRESENT_MASK : 0;
 
-	kvm_mmu_set_mmio_spte_mask(mask, PT_WRITABLE_MASK | PT_USER_MASK);
+	kvm_mmu_set_mmio_spte_mask(mask, mask, PT_WRITABLE_MASK | PT_USER_MASK);
 }
 
 static void svm_hardware_teardown(void)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 908f7a8af064..8a8423a97f13 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4320,7 +4320,8 @@ static void ept_set_mmio_spte_mask(void)
 	 * EPT Misconfigurations can be generated if the value of bits 2:0
 	 * of an EPT paging-structure entry is 110b (write/execute).
 	 */
-	kvm_mmu_set_mmio_spte_mask(VMX_EPT_MISCONFIG_WX_VALUE, 0);
+	kvm_mmu_set_mmio_spte_mask(VMX_EPT_MISCONFIG_WX_VALUE,
+				   VMX_EPT_RWX_MASK, 0);
 }
 
 #define VMX_XSS_EXIT_BITMAP 0
-- 
2.30.1.766.gb4fecdf3b7-goog


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 11/24] KVM: x86/mmu: Add module param to disable MMIO caching (for testing)
  2021-02-25 20:47 [PATCH 00/24] KVM: x86/mmu: Introduce MMU_PRESENT and fix bugs Sean Christopherson
                   ` (9 preceding siblings ...)
  2021-02-25 20:47 ` [PATCH 10/24] KVM: x86/mmu: Stop using software available bits to denote MMIO SPTEs Sean Christopherson
@ 2021-02-25 20:47 ` Sean Christopherson
  2021-02-25 20:47 ` [PATCH 12/24] KVM: x86/mmu: Rename and document A/D scheme for TDP SPTEs Sean Christopherson
                   ` (13 subsequent siblings)
  24 siblings, 0 replies; 32+ messages in thread
From: Sean Christopherson @ 2021-02-25 20:47 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon

Add a module param to disable MMIO caching so that it's possible to test
the related flows without access to the necessary hardware.  Using shadow
paging with 64-bit KVM and 52 bits of physical address space must disable
MMIO caching as there are no reserved bits to be had.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/spte.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index b2379094a8c1..503dec3f8c7a 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -17,6 +17,9 @@
 
 #include <asm/e820/api.h>
 
+static bool __read_mostly enable_mmio_caching = true;
+module_param_named(mmio_caching, enable_mmio_caching, bool, 0444);
+
 u64 __read_mostly shadow_nx_mask;
 u64 __read_mostly shadow_x_mask; /* mutual exclusive with nx_mask */
 u64 __read_mostly shadow_user_mask;
@@ -251,6 +254,9 @@ void kvm_mmu_set_mmio_spte_mask(u64 mmio_value, u64 mmio_mask, u64 access_mask)
 	BUG_ON((u64)(unsigned)access_mask != access_mask);
 	WARN_ON(mmio_value & shadow_nonpresent_or_rsvd_lower_gfn_mask);
 
+	if (!enable_mmio_caching)
+		mmio_value = 0;
+
 	/*
 	 * Disable MMIO caching if the MMIO value collides with the bits that
 	 * are used to hold the relocated GFN when the L1TF mitigation is
-- 
2.30.1.766.gb4fecdf3b7-goog


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 12/24] KVM: x86/mmu: Rename and document A/D scheme for TDP SPTEs
  2021-02-25 20:47 [PATCH 00/24] KVM: x86/mmu: Introduce MMU_PRESENT and fix bugs Sean Christopherson
                   ` (10 preceding siblings ...)
  2021-02-25 20:47 ` [PATCH 11/24] KVM: x86/mmu: Add module param to disable MMIO caching (for testing) Sean Christopherson
@ 2021-02-25 20:47 ` Sean Christopherson
  2021-02-25 20:47 ` [PATCH 13/24] KVM: x86/mmu: Use MMIO SPTE bits 53 and 52 for the MMIO generation Sean Christopherson
                   ` (12 subsequent siblings)
  24 siblings, 0 replies; 32+ messages in thread
From: Sean Christopherson @ 2021-02-25 20:47 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon

Rename the various A/D status defines to explicitly associated them with
TDP.  There is a subtle dependency on the bits in question never being
set when using PAE paging, as those bits are reserved, not available.
I.e. using these bits outside of TDP (technically EPT) would cause
explosions.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 Documentation/virt/kvm/locking.rst | 37 +++++++++++++++---------------
 arch/x86/kvm/mmu/spte.c            | 17 ++++++++++----
 arch/x86/kvm/mmu/spte.h            | 34 ++++++++++++++++++++-------
 3 files changed, 56 insertions(+), 32 deletions(-)

diff --git a/Documentation/virt/kvm/locking.rst b/Documentation/virt/kvm/locking.rst
index 0aa4817b466d..85876afe0441 100644
--- a/Documentation/virt/kvm/locking.rst
+++ b/Documentation/virt/kvm/locking.rst
@@ -38,12 +38,11 @@ the mmu-lock on x86. Currently, the page fault can be fast in one of the
 following two cases:
 
 1. Access Tracking: The SPTE is not present, but it is marked for access
-   tracking i.e. the SPTE_SPECIAL_MASK is set. That means we need to
-   restore the saved R/X bits. This is described in more detail later below.
+   tracking. That means we need to restore the saved R/X bits. This is
+   described in more detail later below.
 
-2. Write-Protection: The SPTE is present and the fault is
-   caused by write-protect. That means we just need to change the W bit of
-   the spte.
+2. Write-Protection: The SPTE is present and the fault is caused by
+   write-protect. That means we just need to change the W bit of the spte.
 
 What we use to avoid all the race is the SPTE_HOST_WRITEABLE bit and
 SPTE_MMU_WRITEABLE bit on the spte:
@@ -54,9 +53,9 @@ SPTE_MMU_WRITEABLE bit on the spte:
   page write-protection.
 
 On fast page fault path, we will use cmpxchg to atomically set the spte W
-bit if spte.SPTE_HOST_WRITEABLE = 1 and spte.SPTE_WRITE_PROTECT = 1, or
-restore the saved R/X bits if VMX_EPT_TRACK_ACCESS mask is set, or both. This
-is safe because whenever changing these bits can be detected by cmpxchg.
+bit if spte.SPTE_HOST_WRITEABLE = 1 and spte.SPTE_WRITE_PROTECT = 1, to
+restore the saved R/X bits if for an access-traced spte, or both. This is
+safe because whenever changing these bits can be detected by cmpxchg.
 
 But we need carefully check these cases:
 
@@ -185,17 +184,17 @@ See the comments in spte_has_volatile_bits() and mmu_spte_update().
 Lockless Access Tracking:
 
 This is used for Intel CPUs that are using EPT but do not support the EPT A/D
-bits. In this case, when the KVM MMU notifier is called to track accesses to a
-page (via kvm_mmu_notifier_clear_flush_young), it marks the PTE as not-present
-by clearing the RWX bits in the PTE and storing the original R & X bits in
-some unused/ignored bits. In addition, the SPTE_SPECIAL_MASK is also set on the
-PTE (using the ignored bit 62). When the VM tries to access the page later on,
-a fault is generated and the fast page fault mechanism described above is used
-to atomically restore the PTE to a Present state. The W bit is not saved when
-the PTE is marked for access tracking and during restoration to the Present
-state, the W bit is set depending on whether or not it was a write access. If
-it wasn't, then the W bit will remain clear until a write access happens, at
-which time it will be set using the Dirty tracking mechanism described above.
+bits. In this case, PTEs are tagged as A/D disabled (using ignored bits), and
+when the KVM MMU notifier is called to track accesses to a page (via
+kvm_mmu_notifier_clear_flush_young), it marks the PTE not-present in hardware
+by clearing the RWX bits in the PTE and storing the original R & X bits in more
+unused/ignored bits. When the VM tries to access the page later on, a fault is
+generated and the fast page fault mechanism described above is used to
+atomically restore the PTE to a Present state. The W bit is not saved when the
+PTE is marked for access tracking and during restoration to the Present state,
+the W bit is set depending on whether or not it was a write access. If it
+wasn't, then the W bit will remain clear until a write access happens, at which
+time it will be set using the Dirty tracking mechanism described above.
 
 3. Reference
 ------------
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index 503dec3f8c7a..3eaf143b7d12 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -42,7 +42,7 @@ static u64 generation_mmio_spte_mask(u64 gen)
 	u64 mask;
 
 	WARN_ON(gen & ~MMIO_SPTE_GEN_MASK);
-	BUILD_BUG_ON((MMIO_SPTE_GEN_HIGH_MASK | MMIO_SPTE_GEN_LOW_MASK) & SPTE_SPECIAL_MASK);
+	BUILD_BUG_ON((MMIO_SPTE_GEN_HIGH_MASK | MMIO_SPTE_GEN_LOW_MASK) & SPTE_TDP_AD_MASK);
 
 	mask = (gen << MMIO_SPTE_GEN_LOW_SHIFT) & MMIO_SPTE_GEN_LOW_MASK;
 	mask |= (gen << MMIO_SPTE_GEN_HIGH_SHIFT) & MMIO_SPTE_GEN_HIGH_MASK;
@@ -96,9 +96,16 @@ int make_spte(struct kvm_vcpu *vcpu, unsigned int pte_access, int level,
 	int ret = 0;
 
 	if (ad_disabled)
-		spte |= SPTE_AD_DISABLED_MASK;
+		spte |= SPTE_TDP_AD_DISABLED_MASK;
 	else if (kvm_vcpu_ad_need_write_protect(vcpu))
-		spte |= SPTE_AD_WRPROT_ONLY_MASK;
+		spte |= SPTE_TDP_AD_WRPROT_ONLY_MASK;
+
+	/*
+	 * Bits 62:52 of PAE SPTEs are reserved.  WARN if said bits are set
+	 * if PAE paging may be employed (shadow paging or any 32-bit KVM).
+	 */
+	WARN_ON_ONCE((!tdp_enabled || !IS_ENABLED(CONFIG_X86_64)) &&
+		     (spte & SPTE_TDP_AD_MASK));
 
 	/*
 	 * For the EPT case, shadow_present_mask is 0 if hardware
@@ -180,7 +187,7 @@ u64 make_nonleaf_spte(u64 *child_pt, bool ad_disabled)
 	       shadow_user_mask | shadow_x_mask | shadow_me_mask;
 
 	if (ad_disabled)
-		spte |= SPTE_AD_DISABLED_MASK;
+		spte |= SPTE_TDP_AD_DISABLED_MASK;
 	else
 		spte |= shadow_accessed_mask;
 
@@ -288,7 +295,7 @@ void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
 {
 	BUG_ON(!dirty_mask != !accessed_mask);
 	BUG_ON(!accessed_mask && !acc_track_mask);
-	BUG_ON(acc_track_mask & SPTE_SPECIAL_MASK);
+	BUG_ON(acc_track_mask & SPTE_TDP_AD_MASK);
 
 	shadow_user_mask = user_mask;
 	shadow_accessed_mask = accessed_mask;
diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
index 642a17b9964c..fd0a7911f098 100644
--- a/arch/x86/kvm/mmu/spte.h
+++ b/arch/x86/kvm/mmu/spte.h
@@ -8,11 +8,24 @@
 #define PT_FIRST_AVAIL_BITS_SHIFT 10
 #define PT64_SECOND_AVAIL_BITS_SHIFT 54
 
-/* The mask used to denote Access Tracking SPTEs.  Note, val=3 is available. */
-#define SPTE_SPECIAL_MASK (3ULL << 52)
-#define SPTE_AD_ENABLED_MASK (0ULL << 52)
-#define SPTE_AD_DISABLED_MASK (1ULL << 52)
-#define SPTE_AD_WRPROT_ONLY_MASK (2ULL << 52)
+/*
+ * TDP SPTES (more specifically, EPT SPTEs) may not have A/D bits, and may also
+ * be restricted to using write-protection (for L2 when CPU dirty logging, i.e.
+ * PML, is enabled).  Use bits 52 and 53 to hold the type of A/D tracking that
+ * is must be employed for a given TDP SPTE.
+ *
+ * Note, the "enabled" mask must be '0', as bits 62:52 are _reserved_ for PAE
+ * paging, including NPT PAE.  This scheme works because legacy shadow paging
+ * is guaranteed to have A/D bits and write-protection is forced only for
+ * TDP with CPU dirty logging (PML).  If NPT ever gains PML-like support, it
+ * must be restricted to 64-bit KVM.
+ */
+#define SPTE_TDP_AD_SHIFT		52
+#define SPTE_TDP_AD_MASK		(3ULL << SPTE_TDP_AD_SHIFT)
+#define SPTE_TDP_AD_ENABLED_MASK	(0ULL << SPTE_TDP_AD_SHIFT)
+#define SPTE_TDP_AD_DISABLED_MASK	(1ULL << SPTE_TDP_AD_SHIFT)
+#define SPTE_TDP_AD_WRPROT_ONLY_MASK	(2ULL << SPTE_TDP_AD_SHIFT)
+static_assert(SPTE_TDP_AD_ENABLED_MASK == 0);
 
 #ifdef CONFIG_DYNAMIC_PHYSICAL_MASK
 #define PT64_BASE_ADDR_MASK (physical_mask & ~(u64)(PAGE_SIZE-1))
@@ -100,7 +113,7 @@ extern u64 __read_mostly shadow_present_mask;
 extern u64 __read_mostly shadow_me_mask;
 
 /*
- * SPTEs used by MMUs without A/D bits are marked with SPTE_AD_DISABLED_MASK;
+ * SPTEs in MMUs without A/D bits are marked with SPTE_TDP_AD_DISABLED_MASK;
  * shadow_acc_track_mask is the set of bits to be cleared in non-accessed
  * pages.
  */
@@ -176,13 +189,18 @@ static inline bool sp_ad_disabled(struct kvm_mmu_page *sp)
 static inline bool spte_ad_enabled(u64 spte)
 {
 	MMU_WARN_ON(is_mmio_spte(spte));
-	return (spte & SPTE_SPECIAL_MASK) != SPTE_AD_DISABLED_MASK;
+	return (spte & SPTE_TDP_AD_MASK) != SPTE_TDP_AD_DISABLED_MASK;
 }
 
 static inline bool spte_ad_need_write_protect(u64 spte)
 {
 	MMU_WARN_ON(is_mmio_spte(spte));
-	return (spte & SPTE_SPECIAL_MASK) != SPTE_AD_ENABLED_MASK;
+	/*
+	 * This is benign for non-TDP SPTEs as SPTE_TDP_AD_ENABLED_MASK is '0',
+	 * and non-TDP SPTEs will never set these bits.  Optimize for 64-bit
+	 * TDP and do the A/D type check unconditionally.
+	 */
+	return (spte & SPTE_TDP_AD_MASK) != SPTE_TDP_AD_ENABLED_MASK;
 }
 
 static inline u64 spte_shadow_accessed_mask(u64 spte)
-- 
2.30.1.766.gb4fecdf3b7-goog


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 13/24] KVM: x86/mmu: Use MMIO SPTE bits 53 and 52 for the MMIO generation
  2021-02-25 20:47 [PATCH 00/24] KVM: x86/mmu: Introduce MMU_PRESENT and fix bugs Sean Christopherson
                   ` (11 preceding siblings ...)
  2021-02-25 20:47 ` [PATCH 12/24] KVM: x86/mmu: Rename and document A/D scheme for TDP SPTEs Sean Christopherson
@ 2021-02-25 20:47 ` Sean Christopherson
  2021-02-25 20:47 ` [PATCH 14/24] KVM: x86/mmu: Document dependency bewteen TDP A/D type and saved bits Sean Christopherson
                   ` (11 subsequent siblings)
  24 siblings, 0 replies; 32+ messages in thread
From: Sean Christopherson @ 2021-02-25 20:47 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon

Use bits 53 and 52 for the MMIO generation now that they're not used to
identify MMIO SPTEs.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/spte.c | 1 -
 arch/x86/kvm/mmu/spte.h | 8 ++++----
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index 3eaf143b7d12..cf0e20b34cd3 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -42,7 +42,6 @@ static u64 generation_mmio_spte_mask(u64 gen)
 	u64 mask;
 
 	WARN_ON(gen & ~MMIO_SPTE_GEN_MASK);
-	BUILD_BUG_ON((MMIO_SPTE_GEN_HIGH_MASK | MMIO_SPTE_GEN_LOW_MASK) & SPTE_TDP_AD_MASK);
 
 	mask = (gen << MMIO_SPTE_GEN_LOW_SHIFT) & MMIO_SPTE_GEN_LOW_MASK;
 	mask |= (gen << MMIO_SPTE_GEN_HIGH_SHIFT) & MMIO_SPTE_GEN_HIGH_MASK;
diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
index fd0a7911f098..bf4f49890606 100644
--- a/arch/x86/kvm/mmu/spte.h
+++ b/arch/x86/kvm/mmu/spte.h
@@ -65,11 +65,11 @@ static_assert(SPTE_TDP_AD_ENABLED_MASK == 0);
 #define SPTE_MMU_WRITEABLE	(1ULL << (PT_FIRST_AVAIL_BITS_SHIFT + 1))
 
 /*
- * Due to limited space in PTEs, the MMIO generation is a 18 bit subset of
+ * Due to limited space in PTEs, the MMIO generation is a 20 bit subset of
  * the memslots generation and is derived as follows:
  *
  * Bits 0-8 of the MMIO generation are propagated to spte bits 3-11
- * Bits 9-17 of the MMIO generation are propagated to spte bits 54-62
+ * Bits 9-19 of the MMIO generation are propagated to spte bits 52-62
  *
  * The KVM_MEMSLOT_GEN_UPDATE_IN_PROGRESS flag is intentionally not included in
  * the MMIO generation number, as doing so would require stealing a bit from
@@ -82,7 +82,7 @@ static_assert(SPTE_TDP_AD_ENABLED_MASK == 0);
 #define MMIO_SPTE_GEN_LOW_START		3
 #define MMIO_SPTE_GEN_LOW_END		11
 
-#define MMIO_SPTE_GEN_HIGH_START	PT64_SECOND_AVAIL_BITS_SHIFT
+#define MMIO_SPTE_GEN_HIGH_START	52
 #define MMIO_SPTE_GEN_HIGH_END		62
 
 #define MMIO_SPTE_GEN_LOW_MASK		GENMASK_ULL(MMIO_SPTE_GEN_LOW_END, \
@@ -94,7 +94,7 @@ static_assert(SPTE_TDP_AD_ENABLED_MASK == 0);
 #define MMIO_SPTE_GEN_HIGH_BITS		(MMIO_SPTE_GEN_HIGH_END - MMIO_SPTE_GEN_HIGH_START + 1)
 
 /* remember to adjust the comment above as well if you change these */
-static_assert(MMIO_SPTE_GEN_LOW_BITS == 9 && MMIO_SPTE_GEN_HIGH_BITS == 9);
+static_assert(MMIO_SPTE_GEN_LOW_BITS == 9 && MMIO_SPTE_GEN_HIGH_BITS == 11);
 
 #define MMIO_SPTE_GEN_LOW_SHIFT		(MMIO_SPTE_GEN_LOW_START - 0)
 #define MMIO_SPTE_GEN_HIGH_SHIFT	(MMIO_SPTE_GEN_HIGH_START - MMIO_SPTE_GEN_LOW_BITS)
-- 
2.30.1.766.gb4fecdf3b7-goog


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 14/24] KVM: x86/mmu: Document dependency bewteen TDP A/D type and saved bits
  2021-02-25 20:47 [PATCH 00/24] KVM: x86/mmu: Introduce MMU_PRESENT and fix bugs Sean Christopherson
                   ` (12 preceding siblings ...)
  2021-02-25 20:47 ` [PATCH 13/24] KVM: x86/mmu: Use MMIO SPTE bits 53 and 52 for the MMIO generation Sean Christopherson
@ 2021-02-25 20:47 ` Sean Christopherson
  2021-02-25 20:47 ` [PATCH 15/24] KVM: x86/mmu: Move initial kvm_mmu_set_mask_ptes() call into MMU proper Sean Christopherson
                   ` (10 subsequent siblings)
  24 siblings, 0 replies; 32+ messages in thread
From: Sean Christopherson @ 2021-02-25 20:47 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon

Document that SHADOW_ACC_TRACK_SAVED_BITS_SHIFT is directly dependent on
bits 53:52 being used to track the A/D type.

Remove PT64_SECOND_AVAIL_BITS_SHIFT as it is at best misleading, and at
worst wrong.  For PAE paging, which arguably is a variant of PT64, the
bits are reserved.  For MMIO SPTEs the bits are not available as they're
used for the MMIO generation.  For access tracked SPTEs, they are also
not available as bits 56:54 are used to store the original RX bits.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/spte.h | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
index bf4f49890606..e918b8f0b21d 100644
--- a/arch/x86/kvm/mmu/spte.h
+++ b/arch/x86/kvm/mmu/spte.h
@@ -6,7 +6,6 @@
 #include "mmu_internal.h"
 
 #define PT_FIRST_AVAIL_BITS_SHIFT 10
-#define PT64_SECOND_AVAIL_BITS_SHIFT 54
 
 /*
  * TDP SPTES (more specifically, EPT SPTEs) may not have A/D bits, and may also
@@ -134,11 +133,14 @@ extern u64 __read_mostly shadow_nonpresent_or_rsvd_mask;
  * The mask/shift to use for saving the original R/X bits when marking the PTE
  * as not-present for access tracking purposes. We do not save the W bit as the
  * PTEs being access tracked also need to be dirty tracked, so the W bit will be
- * restored only when a write is attempted to the page.
+ * restored only when a write is attempted to the page.  This mask obviously
+ * must not overlap the A/D type mask.
  */
 #define SHADOW_ACC_TRACK_SAVED_BITS_MASK (PT64_EPT_READABLE_MASK | \
 					  PT64_EPT_EXECUTABLE_MASK)
-#define SHADOW_ACC_TRACK_SAVED_BITS_SHIFT PT64_SECOND_AVAIL_BITS_SHIFT
+#define SHADOW_ACC_TRACK_SAVED_BITS_SHIFT 54
+static_assert(!(SPTE_TDP_AD_MASK & (SHADOW_ACC_TRACK_SAVED_BITS_MASK <<
+				    SHADOW_ACC_TRACK_SAVED_BITS_SHIFT)));
 
 /*
  * If a thread running without exclusive control of the MMU lock must perform a
-- 
2.30.1.766.gb4fecdf3b7-goog


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 15/24] KVM: x86/mmu: Move initial kvm_mmu_set_mask_ptes() call into MMU proper
  2021-02-25 20:47 [PATCH 00/24] KVM: x86/mmu: Introduce MMU_PRESENT and fix bugs Sean Christopherson
                   ` (13 preceding siblings ...)
  2021-02-25 20:47 ` [PATCH 14/24] KVM: x86/mmu: Document dependency bewteen TDP A/D type and saved bits Sean Christopherson
@ 2021-02-25 20:47 ` Sean Christopherson
  2021-02-25 20:47 ` [PATCH 16/24] KVM: x86/mmu: Co-locate code for setting various SPTE masks Sean Christopherson
                   ` (9 subsequent siblings)
  24 siblings, 0 replies; 32+ messages in thread
From: Sean Christopherson @ 2021-02-25 20:47 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon

Move kvm_mmu_set_mask_ptes() into mmu.c as prep for future cleanup of the
mask initialization code.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 4 ++++
 arch/x86/kvm/x86.c     | 3 ---
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index f92571b786a2..99d9c85a1820 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5796,6 +5796,10 @@ int kvm_mmu_module_init(void)
 
 	kvm_set_mmio_spte_mask();
 
+	kvm_mmu_set_mask_ptes(PT_USER_MASK, PT_ACCESSED_MASK,
+			PT_DIRTY_MASK, PT64_NX_MASK, 0,
+			PT_PRESENT_MASK, 0, sme_me_mask);
+
 	pte_list_desc_cache = kmem_cache_create("pte_list_desc",
 					    sizeof(struct pte_list_desc),
 					    0, SLAB_ACCOUNT, NULL);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c1b7bdf47e7e..5a27468c6afa 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8024,9 +8024,6 @@ int kvm_arch_init(void *opaque)
 	if (r)
 		goto out_free_percpu;
 
-	kvm_mmu_set_mask_ptes(PT_USER_MASK, PT_ACCESSED_MASK,
-			PT_DIRTY_MASK, PT64_NX_MASK, 0,
-			PT_PRESENT_MASK, 0, sme_me_mask);
 	kvm_timer_init();
 
 	perf_register_guest_info_callbacks(&kvm_guest_cbs);
-- 
2.30.1.766.gb4fecdf3b7-goog


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 16/24] KVM: x86/mmu: Co-locate code for setting various SPTE masks
  2021-02-25 20:47 [PATCH 00/24] KVM: x86/mmu: Introduce MMU_PRESENT and fix bugs Sean Christopherson
                   ` (14 preceding siblings ...)
  2021-02-25 20:47 ` [PATCH 15/24] KVM: x86/mmu: Move initial kvm_mmu_set_mask_ptes() call into MMU proper Sean Christopherson
@ 2021-02-25 20:47 ` Sean Christopherson
  2021-02-25 20:47 ` [PATCH 17/24] KVM: x86/mmu: Move logic for setting SPTE masks for EPT into the MMU proper Sean Christopherson
                   ` (8 subsequent siblings)
  24 siblings, 0 replies; 32+ messages in thread
From: Sean Christopherson @ 2021-02-25 20:47 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon

Squish all the code for (re)setting the various SPTE masks into one
location.  With the split code, it's not at all clear that the masks are
set once during module initialization.  This will allow a future patch to
clean up initialization of the masks without shuffling code all over
tarnation.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c  | 25 -------------------------
 arch/x86/kvm/mmu/spte.c | 19 +++++++++++++++++++
 arch/x86/kvm/vmx/vmx.c  | 17 ++++++-----------
 3 files changed, 25 insertions(+), 36 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 99d9c85a1820..1fb500db46e0 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5710,25 +5710,6 @@ static void mmu_destroy_caches(void)
 	kmem_cache_destroy(mmu_page_header_cache);
 }
 
-static void kvm_set_mmio_spte_mask(void)
-{
-	u64 mask;
-
-	/*
-	 * Set a reserved PA bit in MMIO SPTEs to generate page faults with
-	 * PFEC.RSVD=1 on MMIO accesses.  64-bit PTEs (PAE, x86-64, and EPT
-	 * paging) support a maximum of 52 bits of PA, i.e. if the CPU supports
-	 * 52-bit physical addresses then there are no reserved PA bits in the
-	 * PTEs and so the reserved PA approach must be disabled.
-	 */
-	if (shadow_phys_bits < 52)
-		mask = BIT_ULL(51) | PT_PRESENT_MASK;
-	else
-		mask = 0;
-
-	kvm_mmu_set_mmio_spte_mask(mask, mask, ACC_WRITE_MASK | ACC_USER_MASK);
-}
-
 static bool get_nx_auto_mode(void)
 {
 	/* Return true when CPU has the bug, and mitigations are ON */
@@ -5794,12 +5775,6 @@ int kvm_mmu_module_init(void)
 
 	kvm_mmu_reset_all_pte_masks();
 
-	kvm_set_mmio_spte_mask();
-
-	kvm_mmu_set_mask_ptes(PT_USER_MASK, PT_ACCESSED_MASK,
-			PT_DIRTY_MASK, PT64_NX_MASK, 0,
-			PT_PRESENT_MASK, 0, sme_me_mask);
-
 	pte_list_desc_cache = kmem_cache_create("pte_list_desc",
 					    sizeof(struct pte_list_desc),
 					    0, SLAB_ACCOUNT, NULL);
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index cf0e20b34cd3..b15d6006dbee 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -310,6 +310,7 @@ EXPORT_SYMBOL_GPL(kvm_mmu_set_mask_ptes);
 void kvm_mmu_reset_all_pte_masks(void)
 {
 	u8 low_phys_bits;
+	u64 mask;
 
 	shadow_user_mask = 0;
 	shadow_accessed_mask = 0;
@@ -344,4 +345,22 @@ void kvm_mmu_reset_all_pte_masks(void)
 
 	shadow_nonpresent_or_rsvd_lower_gfn_mask =
 		GENMASK_ULL(low_phys_bits - 1, PAGE_SHIFT);
+
+	/*
+	 * Set a reserved PA bit in MMIO SPTEs to generate page faults with
+	 * PFEC.RSVD=1 on MMIO accesses.  64-bit PTEs (PAE, x86-64, and EPT
+	 * paging) support a maximum of 52 bits of PA, i.e. if the CPU supports
+	 * 52-bit physical addresses then there are no reserved PA bits in the
+	 * PTEs and so the reserved PA approach must be disabled.
+	 */
+	if (shadow_phys_bits < 52)
+		mask = BIT_ULL(51) | PT_PRESENT_MASK;
+	else
+		mask = 0;
+
+	kvm_mmu_set_mmio_spte_mask(mask, mask, ACC_WRITE_MASK | ACC_USER_MASK);
+
+	kvm_mmu_set_mask_ptes(PT_USER_MASK, PT_ACCESSED_MASK,
+			PT_DIRTY_MASK, PT64_NX_MASK, 0,
+			PT_PRESENT_MASK, 0, sme_me_mask);
 }
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 8a8423a97f13..730076b3832f 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4314,16 +4314,6 @@ static void vmx_compute_secondary_exec_control(struct vcpu_vmx *vmx)
 	vmx->secondary_exec_control = exec_control;
 }
 
-static void ept_set_mmio_spte_mask(void)
-{
-	/*
-	 * EPT Misconfigurations can be generated if the value of bits 2:0
-	 * of an EPT paging-structure entry is 110b (write/execute).
-	 */
-	kvm_mmu_set_mmio_spte_mask(VMX_EPT_MISCONFIG_WX_VALUE,
-				   VMX_EPT_RWX_MASK, 0);
-}
-
 #define VMX_XSS_EXIT_BITMAP 0
 
 /*
@@ -5462,7 +5452,12 @@ static void vmx_enable_tdp(void)
 		cpu_has_vmx_ept_execute_only() ? 0ull : VMX_EPT_READABLE_MASK,
 		VMX_EPT_RWX_MASK, 0ull);
 
-	ept_set_mmio_spte_mask();
+	/*
+	 * EPT Misconfigurations can be generated if the value of bits 2:0
+	 * of an EPT paging-structure entry is 110b (write/execute).
+	 */
+	kvm_mmu_set_mmio_spte_mask(VMX_EPT_MISCONFIG_WX_VALUE,
+				   VMX_EPT_RWX_MASK, 0);
 }
 
 /*
-- 
2.30.1.766.gb4fecdf3b7-goog


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 17/24] KVM: x86/mmu: Move logic for setting SPTE masks for EPT into the MMU proper
  2021-02-25 20:47 [PATCH 00/24] KVM: x86/mmu: Introduce MMU_PRESENT and fix bugs Sean Christopherson
                   ` (15 preceding siblings ...)
  2021-02-25 20:47 ` [PATCH 16/24] KVM: x86/mmu: Co-locate code for setting various SPTE masks Sean Christopherson
@ 2021-02-25 20:47 ` Sean Christopherson
  2021-02-25 20:47 ` [PATCH 18/24] KVM: x86/mmu: Make Host-writable and MMU-writable bit locations dynamic Sean Christopherson
                   ` (7 subsequent siblings)
  24 siblings, 0 replies; 32+ messages in thread
From: Sean Christopherson @ 2021-02-25 20:47 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon

Let the MMU deal with the SPTE masks to avoid splitting the logic and
knowledge across the MMU and VMX.

The SPTE masks that are used for EPT are very, very tightly coupled to
the MMU implementation.  The use of available bits, the existence of A/D
types, the fact that shadow_x_mask even exists, and so on and so forth
are all baked into the MMU implementation.  Cross referencing the params
to the masks is also a nightmare, as pretty much every param is a u64.

A future patch will make the location of the MMU_WRITABLE and
HOST_WRITABLE bits MMU specific, to free up bit 11 for a MMU_PRESENT bit.
Doing that change with the current kvm_mmu_set_mask_ptes() would be an
absolute mess.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_host.h |  3 --
 arch/x86/kvm/mmu.h              |  1 +
 arch/x86/kvm/mmu/spte.c         | 60 ++++++++++++++-------------------
 arch/x86/kvm/vmx/vmx.c          | 20 ++---------
 4 files changed, 29 insertions(+), 55 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index cc376327a168..629f74f2a00a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1407,9 +1407,6 @@ void kvm_mmu_destroy(struct kvm_vcpu *vcpu);
 int kvm_mmu_create(struct kvm_vcpu *vcpu);
 void kvm_mmu_init_vm(struct kvm *kvm);
 void kvm_mmu_uninit_vm(struct kvm *kvm);
-void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
-		u64 dirty_mask, u64 nx_mask, u64 x_mask, u64 p_mask,
-		u64 acc_track_mask, u64 me_mask);
 
 void kvm_mmu_reset_context(struct kvm_vcpu *vcpu);
 void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 00f4a541e04d..11cf7793cfee 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -60,6 +60,7 @@ static __always_inline u64 rsvd_bits(int s, int e)
 }
 
 void kvm_mmu_set_mmio_spte_mask(u64 mmio_value, u64 mmio_mask, u64 access_mask);
+void kvm_mmu_set_ept_masks(bool has_ad_bits, bool has_exec_only);
 
 void
 reset_shadow_zero_bits_mask(struct kvm_vcpu *vcpu, struct kvm_mmu *context);
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index b15d6006dbee..ac5ea6fda969 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -16,6 +16,7 @@
 #include "spte.h"
 
 #include <asm/e820/api.h>
+#include <asm/vmx.h>
 
 static bool __read_mostly enable_mmio_caching = true;
 module_param_named(mmio_caching, enable_mmio_caching, bool, 0444);
@@ -281,45 +282,31 @@ void kvm_mmu_set_mmio_spte_mask(u64 mmio_value, u64 mmio_mask, u64 access_mask)
 }
 EXPORT_SYMBOL_GPL(kvm_mmu_set_mmio_spte_mask);
 
-/*
- * Sets the shadow PTE masks used by the MMU.
- *
- * Assumptions:
- *  - Setting either @accessed_mask or @dirty_mask requires setting both
- *  - At least one of @accessed_mask or @acc_track_mask must be set
- */
-void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
-		u64 dirty_mask, u64 nx_mask, u64 x_mask, u64 p_mask,
-		u64 acc_track_mask, u64 me_mask)
+void kvm_mmu_set_ept_masks(bool has_ad_bits, bool has_exec_only)
 {
-	BUG_ON(!dirty_mask != !accessed_mask);
-	BUG_ON(!accessed_mask && !acc_track_mask);
-	BUG_ON(acc_track_mask & SPTE_TDP_AD_MASK);
+	shadow_user_mask	= VMX_EPT_READABLE_MASK;
+	shadow_accessed_mask	= has_ad_bits ? VMX_EPT_ACCESS_BIT : 0ull;
+	shadow_dirty_mask	= has_ad_bits ? VMX_EPT_DIRTY_BIT : 0ull;
+	shadow_nx_mask		= 0ull;
+	shadow_x_mask		= VMX_EPT_EXECUTABLE_MASK;
+	shadow_present_mask	= has_exec_only ? 0ull : VMX_EPT_READABLE_MASK;
+	shadow_acc_track_mask	= VMX_EPT_RWX_MASK;
+	shadow_me_mask		= 0ull;
 
-	shadow_user_mask = user_mask;
-	shadow_accessed_mask = accessed_mask;
-	shadow_dirty_mask = dirty_mask;
-	shadow_nx_mask = nx_mask;
-	shadow_x_mask = x_mask;
-	shadow_present_mask = p_mask;
-	shadow_acc_track_mask = acc_track_mask;
-	shadow_me_mask = me_mask;
+	/*
+	 * EPT Misconfigurations are generated if the value of bits 2:0
+	 * of an EPT paging-structure entry is 110b (write/execute).
+	 */
+	kvm_mmu_set_mmio_spte_mask(VMX_EPT_MISCONFIG_WX_VALUE,
+				   VMX_EPT_RWX_MASK, 0);
 }
-EXPORT_SYMBOL_GPL(kvm_mmu_set_mask_ptes);
+EXPORT_SYMBOL_GPL(kvm_mmu_set_ept_masks);
 
 void kvm_mmu_reset_all_pte_masks(void)
 {
 	u8 low_phys_bits;
 	u64 mask;
 
-	shadow_user_mask = 0;
-	shadow_accessed_mask = 0;
-	shadow_dirty_mask = 0;
-	shadow_nx_mask = 0;
-	shadow_x_mask = 0;
-	shadow_present_mask = 0;
-	shadow_acc_track_mask = 0;
-
 	shadow_phys_bits = kvm_get_shadow_phys_bits();
 
 	/*
@@ -346,6 +333,15 @@ void kvm_mmu_reset_all_pte_masks(void)
 	shadow_nonpresent_or_rsvd_lower_gfn_mask =
 		GENMASK_ULL(low_phys_bits - 1, PAGE_SHIFT);
 
+	shadow_user_mask	= PT_USER_MASK;
+	shadow_accessed_mask	= PT_ACCESSED_MASK;
+	shadow_dirty_mask	= PT_DIRTY_MASK;
+	shadow_nx_mask		= PT64_NX_MASK;
+	shadow_x_mask		= 0;
+	shadow_present_mask	= PT_PRESENT_MASK;
+	shadow_acc_track_mask	= 0;
+	shadow_me_mask		= sme_me_mask;
+
 	/*
 	 * Set a reserved PA bit in MMIO SPTEs to generate page faults with
 	 * PFEC.RSVD=1 on MMIO accesses.  64-bit PTEs (PAE, x86-64, and EPT
@@ -359,8 +355,4 @@ void kvm_mmu_reset_all_pte_masks(void)
 		mask = 0;
 
 	kvm_mmu_set_mmio_spte_mask(mask, mask, ACC_WRITE_MASK | ACC_USER_MASK);
-
-	kvm_mmu_set_mask_ptes(PT_USER_MASK, PT_ACCESSED_MASK,
-			PT_DIRTY_MASK, PT64_NX_MASK, 0,
-			PT_PRESENT_MASK, 0, sme_me_mask);
 }
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 730076b3832f..6d7e760fdfa0 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -5443,23 +5443,6 @@ static void shrink_ple_window(struct kvm_vcpu *vcpu)
 	}
 }
 
-static void vmx_enable_tdp(void)
-{
-	kvm_mmu_set_mask_ptes(VMX_EPT_READABLE_MASK,
-		enable_ept_ad_bits ? VMX_EPT_ACCESS_BIT : 0ull,
-		enable_ept_ad_bits ? VMX_EPT_DIRTY_BIT : 0ull,
-		0ull, VMX_EPT_EXECUTABLE_MASK,
-		cpu_has_vmx_ept_execute_only() ? 0ull : VMX_EPT_READABLE_MASK,
-		VMX_EPT_RWX_MASK, 0ull);
-
-	/*
-	 * EPT Misconfigurations can be generated if the value of bits 2:0
-	 * of an EPT paging-structure entry is 110b (write/execute).
-	 */
-	kvm_mmu_set_mmio_spte_mask(VMX_EPT_MISCONFIG_WX_VALUE,
-				   VMX_EPT_RWX_MASK, 0);
-}
-
 /*
  * Indicate a busy-waiting vcpu in spinlock. We do not enable the PAUSE
  * exiting, so only get here on cpu with PAUSE-Loop-Exiting.
@@ -7788,7 +7771,8 @@ static __init int hardware_setup(void)
 	set_bit(0, vmx_vpid_bitmap); /* 0 is reserved for host */
 
 	if (enable_ept)
-		vmx_enable_tdp();
+		kvm_mmu_set_ept_masks(enable_ept_ad_bits,
+				      cpu_has_vmx_ept_execute_only());
 
 	if (!enable_ept)
 		ept_lpage_level = 0;
-- 
2.30.1.766.gb4fecdf3b7-goog


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 18/24] KVM: x86/mmu: Make Host-writable and MMU-writable bit locations dynamic
  2021-02-25 20:47 [PATCH 00/24] KVM: x86/mmu: Introduce MMU_PRESENT and fix bugs Sean Christopherson
                   ` (16 preceding siblings ...)
  2021-02-25 20:47 ` [PATCH 17/24] KVM: x86/mmu: Move logic for setting SPTE masks for EPT into the MMU proper Sean Christopherson
@ 2021-02-25 20:47 ` Sean Christopherson
  2021-02-25 20:47 ` [PATCH 19/24] KVM: x86/mmu: Use high bits for host/mmu writable masks for EPT SPTEs Sean Christopherson
                   ` (6 subsequent siblings)
  24 siblings, 0 replies; 32+ messages in thread
From: Sean Christopherson @ 2021-02-25 20:47 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon

Make the location of the HOST_WRITABLE and MMU_WRITABLE configurable for
a given KVM instance.  This will allow EPT to use high available bits,
which in turn will free up bit 11 for a constant MMU_PRESENT bit.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 Documentation/virt/kvm/locking.rst | 18 +++++++++---------
 arch/x86/kvm/mmu.h                 | 12 ++++++------
 arch/x86/kvm/mmu/mmu.c             |  8 ++++----
 arch/x86/kvm/mmu/paging_tmpl.h     |  2 +-
 arch/x86/kvm/mmu/spte.c            | 13 +++++++++----
 arch/x86/kvm/mmu/spte.h            | 13 ++++++-------
 arch/x86/kvm/mmu/tdp_mmu.c         |  6 +++---
 7 files changed, 38 insertions(+), 34 deletions(-)

diff --git a/Documentation/virt/kvm/locking.rst b/Documentation/virt/kvm/locking.rst
index 85876afe0441..1fc860c007a3 100644
--- a/Documentation/virt/kvm/locking.rst
+++ b/Documentation/virt/kvm/locking.rst
@@ -44,18 +44,18 @@ following two cases:
 2. Write-Protection: The SPTE is present and the fault is caused by
    write-protect. That means we just need to change the W bit of the spte.
 
-What we use to avoid all the race is the SPTE_HOST_WRITEABLE bit and
-SPTE_MMU_WRITEABLE bit on the spte:
+What we use to avoid all the race is the Host-writable bit and MMU-writable bit
+on the spte:
 
-- SPTE_HOST_WRITEABLE means the gfn is writable on host.
-- SPTE_MMU_WRITEABLE means the gfn is writable on mmu. The bit is set when
-  the gfn is writable on guest mmu and it is not write-protected by shadow
-  page write-protection.
+- Host-writable means the gfn is writable in the host kernel page tables and in
+  its KVM memslot.
+- MMU-writable means the gfn is writable in the guest's mmu and it is not
+  write-protected by shadow page write-protection.
 
 On fast page fault path, we will use cmpxchg to atomically set the spte W
-bit if spte.SPTE_HOST_WRITEABLE = 1 and spte.SPTE_WRITE_PROTECT = 1, to
-restore the saved R/X bits if for an access-traced spte, or both. This is
-safe because whenever changing these bits can be detected by cmpxchg.
+bit if spte.HOST_WRITEABLE = 1 and spte.WRITE_PROTECT = 1, to restore the saved
+R/X bits if for an access-traced spte, or both. This is safe because whenever
+changing these bits can be detected by cmpxchg.
 
 But we need carefully check these cases:
 
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 11cf7793cfee..72b0f66073dc 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -125,7 +125,7 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
  * write-protects guest page to sync the guest modification, b) another one is
  * used to sync dirty bitmap when we do KVM_GET_DIRTY_LOG. The differences
  * between these two sorts are:
- * 1) the first case clears SPTE_MMU_WRITEABLE bit.
+ * 1) the first case clears MMU-writable bit.
  * 2) the first case requires flushing tlb immediately avoiding corrupting
  *    shadow page table between all vcpus so it should be in the protection of
  *    mmu-lock. And the another case does not need to flush tlb until returning
@@ -136,17 +136,17 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
  * So, there is the problem: the first case can meet the corrupted tlb caused
  * by another case which write-protects pages but without flush tlb
  * immediately. In order to making the first case be aware this problem we let
- * it flush tlb if we try to write-protect a spte whose SPTE_MMU_WRITEABLE bit
- * is set, it works since another case never touches SPTE_MMU_WRITEABLE bit.
+ * it flush tlb if we try to write-protect a spte whose MMU-writable bit
+ * is set, it works since another case never touches MMU-writable bit.
  *
  * Anyway, whenever a spte is updated (only permission and status bits are
- * changed) we need to check whether the spte with SPTE_MMU_WRITEABLE becomes
+ * changed) we need to check whether the spte with MMU-writable becomes
  * readonly, if that happens, we need to flush tlb. Fortunately,
  * mmu_spte_update() has already handled it perfectly.
  *
- * The rules to use SPTE_MMU_WRITEABLE and PT_WRITABLE_MASK:
+ * The rules to use MMU-writable and PT_WRITABLE_MASK:
  * - if we want to see if it has writable tlb entry or if the spte can be
- *   writable on the mmu mapping, check SPTE_MMU_WRITEABLE, this is the most
+ *   writable on the mmu mapping, check MMU-writable, this is the most
  *   case, otherwise
  * - if we fix page fault on the spte or do write-protection by dirty logging,
  *   check PT_WRITABLE_MASK.
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 1fb500db46e0..e636fcd529d2 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1107,7 +1107,7 @@ static bool spte_write_protect(u64 *sptep, bool pt_protect)
 	rmap_printk("spte %p %llx\n", sptep, *sptep);
 
 	if (pt_protect)
-		spte &= ~SPTE_MMU_WRITEABLE;
+		spte &= ~shadow_mmu_writable_mask;
 	spte = spte & ~PT_WRITABLE_MASK;
 
 	return mmu_spte_update(sptep, spte);
@@ -5485,9 +5485,9 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
 	 * spte from present to present (changing the spte from present
 	 * to nonpresent will flush all the TLBs immediately), in other
 	 * words, the only case we care is mmu_spte_update() where we
-	 * have checked SPTE_HOST_WRITEABLE | SPTE_MMU_WRITEABLE
-	 * instead of PT_WRITABLE_MASK, that means it does not depend
-	 * on PT_WRITABLE_MASK anymore.
+	 * have checked Host-writable | MMU-writable instead of
+	 * PT_WRITABLE_MASK, that means it does not depend on PT_WRITABLE_MASK
+	 * anymore.
 	 */
 	if (flush)
 		kvm_arch_flush_remote_tlbs_memslot(kvm, memslot);
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 55d7b473ac44..8b9987d5fe02 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -1084,7 +1084,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 
 		nr_present++;
 
-		host_writable = sp->spt[i] & SPTE_HOST_WRITEABLE;
+		host_writable = sp->spt[i] & shadow_host_writable_mask;
 
 		set_spte_ret |= set_spte(vcpu, &sp->spt[i],
 					 pte_access, PG_LEVEL_4K,
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index ac5ea6fda969..2329ba60c67a 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -21,6 +21,8 @@
 static bool __read_mostly enable_mmio_caching = true;
 module_param_named(mmio_caching, enable_mmio_caching, bool, 0444);
 
+u64 __read_mostly shadow_host_writable_mask;
+u64 __read_mostly shadow_mmu_writable_mask;
 u64 __read_mostly shadow_nx_mask;
 u64 __read_mostly shadow_x_mask; /* mutual exclusive with nx_mask */
 u64 __read_mostly shadow_user_mask;
@@ -137,7 +139,7 @@ int make_spte(struct kvm_vcpu *vcpu, unsigned int pte_access, int level,
 			kvm_is_mmio_pfn(pfn));
 
 	if (host_writable)
-		spte |= SPTE_HOST_WRITEABLE;
+		spte |= shadow_host_writable_mask;
 	else
 		pte_access &= ~ACC_WRITE_MASK;
 
@@ -147,7 +149,7 @@ int make_spte(struct kvm_vcpu *vcpu, unsigned int pte_access, int level,
 	spte |= (u64)pfn << PAGE_SHIFT;
 
 	if (pte_access & ACC_WRITE_MASK) {
-		spte |= PT_WRITABLE_MASK | SPTE_MMU_WRITEABLE;
+		spte |= PT_WRITABLE_MASK | shadow_mmu_writable_mask;
 
 		/*
 		 * Optimization: for pte sync, if spte was writable the hash
@@ -163,7 +165,7 @@ int make_spte(struct kvm_vcpu *vcpu, unsigned int pte_access, int level,
 				 __func__, gfn);
 			ret |= SET_SPTE_WRITE_PROTECTED_PT;
 			pte_access &= ~ACC_WRITE_MASK;
-			spte &= ~(PT_WRITABLE_MASK | SPTE_MMU_WRITEABLE);
+			spte &= ~(PT_WRITABLE_MASK | shadow_mmu_writable_mask);
 		}
 	}
 
@@ -202,7 +204,7 @@ u64 kvm_mmu_changed_pte_notifier_make_spte(u64 old_spte, kvm_pfn_t new_pfn)
 	new_spte |= (u64)new_pfn << PAGE_SHIFT;
 
 	new_spte &= ~PT_WRITABLE_MASK;
-	new_spte &= ~SPTE_HOST_WRITEABLE;
+	new_spte &= ~shadow_host_writable_mask;
 
 	new_spte = mark_spte_for_access_track(new_spte);
 
@@ -342,6 +344,9 @@ void kvm_mmu_reset_all_pte_masks(void)
 	shadow_acc_track_mask	= 0;
 	shadow_me_mask		= sme_me_mask;
 
+	shadow_host_writable_mask = DEFAULT_SPTE_HOST_WRITEABLE;
+	shadow_mmu_writable_mask  = DEFAULT_SPTE_MMU_WRITEABLE;
+
 	/*
 	 * Set a reserved PA bit in MMIO SPTEs to generate page faults with
 	 * PFEC.RSVD=1 on MMIO accesses.  64-bit PTEs (PAE, x86-64, and EPT
diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
index e918b8f0b21d..287540d211a9 100644
--- a/arch/x86/kvm/mmu/spte.h
+++ b/arch/x86/kvm/mmu/spte.h
@@ -5,8 +5,6 @@
 
 #include "mmu_internal.h"
 
-#define PT_FIRST_AVAIL_BITS_SHIFT 10
-
 /*
  * TDP SPTES (more specifically, EPT SPTEs) may not have A/D bits, and may also
  * be restricted to using write-protection (for L2 when CPU dirty logging, i.e.
@@ -59,9 +57,8 @@ static_assert(SPTE_TDP_AD_ENABLED_MASK == 0);
 	(((address) >> PT64_LEVEL_SHIFT(level)) & ((1 << PT64_LEVEL_BITS) - 1))
 #define SHADOW_PT_INDEX(addr, level) PT64_INDEX(addr, level)
 
-
-#define SPTE_HOST_WRITEABLE	(1ULL << PT_FIRST_AVAIL_BITS_SHIFT)
-#define SPTE_MMU_WRITEABLE	(1ULL << (PT_FIRST_AVAIL_BITS_SHIFT + 1))
+#define DEFAULT_SPTE_HOST_WRITEABLE	BIT_ULL(10)
+#define DEFAULT_SPTE_MMU_WRITEABLE	BIT_ULL(11)
 
 /*
  * Due to limited space in PTEs, the MMIO generation is a 20 bit subset of
@@ -100,6 +97,8 @@ static_assert(MMIO_SPTE_GEN_LOW_BITS == 9 && MMIO_SPTE_GEN_HIGH_BITS == 11);
 
 #define MMIO_SPTE_GEN_MASK		GENMASK_ULL(MMIO_SPTE_GEN_LOW_BITS + MMIO_SPTE_GEN_HIGH_BITS - 1, 0)
 
+extern u64 __read_mostly shadow_host_writable_mask;
+extern u64 __read_mostly shadow_mmu_writable_mask;
 extern u64 __read_mostly shadow_nx_mask;
 extern u64 __read_mostly shadow_x_mask; /* mutual exclusive with nx_mask */
 extern u64 __read_mostly shadow_user_mask;
@@ -264,8 +263,8 @@ static inline bool is_dirty_spte(u64 spte)
 
 static inline bool spte_can_locklessly_be_made_writable(u64 spte)
 {
-	return (spte & (SPTE_HOST_WRITEABLE | SPTE_MMU_WRITEABLE)) ==
-		(SPTE_HOST_WRITEABLE | SPTE_MMU_WRITEABLE);
+	return (spte & shadow_host_writable_mask) &&
+	       (spte & shadow_mmu_writable_mask);
 }
 
 static inline u64 get_mmio_spte_generation(u64 spte)
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 782cae1eb5e1..bef0e1908e82 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1329,7 +1329,7 @@ void kvm_tdp_mmu_zap_collapsible_sptes(struct kvm *kvm,
 
 /*
  * Removes write access on the last level SPTE mapping this GFN and unsets the
- * SPTE_MMU_WRITABLE bit to ensure future writes continue to be intercepted.
+ * MMU-writable bit to ensure future writes continue to be intercepted.
  * Returns true if an SPTE was set and a TLB flush is needed.
  */
 static bool write_protect_gfn(struct kvm *kvm, struct kvm_mmu_page *root,
@@ -1346,7 +1346,7 @@ static bool write_protect_gfn(struct kvm *kvm, struct kvm_mmu_page *root,
 			break;
 
 		new_spte = iter.old_spte &
-			~(PT_WRITABLE_MASK | SPTE_MMU_WRITEABLE);
+			~(PT_WRITABLE_MASK | shadow_mmu_writable_mask);
 
 		tdp_mmu_set_spte(kvm, &iter, new_spte);
 		spte_set = true;
@@ -1359,7 +1359,7 @@ static bool write_protect_gfn(struct kvm *kvm, struct kvm_mmu_page *root,
 
 /*
  * Removes write access on the last level SPTE mapping this GFN and unsets the
- * SPTE_MMU_WRITABLE bit to ensure future writes continue to be intercepted.
+ * MMU-writable bit to ensure future writes continue to be intercepted.
  * Returns true if an SPTE was set and a TLB flush is needed.
  */
 bool kvm_tdp_mmu_write_protect_gfn(struct kvm *kvm,
-- 
2.30.1.766.gb4fecdf3b7-goog


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 19/24] KVM: x86/mmu: Use high bits for host/mmu writable masks for EPT SPTEs
  2021-02-25 20:47 [PATCH 00/24] KVM: x86/mmu: Introduce MMU_PRESENT and fix bugs Sean Christopherson
                   ` (17 preceding siblings ...)
  2021-02-25 20:47 ` [PATCH 18/24] KVM: x86/mmu: Make Host-writable and MMU-writable bit locations dynamic Sean Christopherson
@ 2021-02-25 20:47 ` Sean Christopherson
  2021-02-25 20:47 ` [PATCH 20/24] KVM: x86/mmu: Use a dedicated bit to track shadow/MMU-present SPTEs Sean Christopherson
                   ` (5 subsequent siblings)
  24 siblings, 0 replies; 32+ messages in thread
From: Sean Christopherson @ 2021-02-25 20:47 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon

Use bits 57 and 58 for HOST_WRITABLE and MMU_WRITABLE when using EPT.
This will allow using bit 11 as a constant MMU_PRESENT, which is
desirable as checking for a shadow-present SPTE is one of the most
common SPTE operations in KVM, particular in hot paths such as page
faults.

EPT is short on low available bits; currently only bit 11 is the only
always-available bit.  Bit 10 is also available, but only while KVM
doesn't support mode-based execution.  On the other hand, PAE paging
doesn't have _any_ high available bits.  Thus, using bit 11 is the only
feasible option for MMU_PRESENT.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/spte.c |  3 +++
 arch/x86/kvm/mmu/spte.h | 48 ++++++++++++++++++++++++++++-------------
 2 files changed, 36 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index 2329ba60c67a..d12acf5eb871 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -295,6 +295,9 @@ void kvm_mmu_set_ept_masks(bool has_ad_bits, bool has_exec_only)
 	shadow_acc_track_mask	= VMX_EPT_RWX_MASK;
 	shadow_me_mask		= 0ull;
 
+	shadow_host_writable_mask = EPT_SPTE_HOST_WRITABLE;
+	shadow_mmu_writable_mask  = EPT_SPTE_MMU_WRITABLE;
+
 	/*
 	 * EPT Misconfigurations are generated if the value of bits 2:0
 	 * of an EPT paging-structure entry is 110b (write/execute).
diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
index 287540d211a9..8996baa8da15 100644
--- a/arch/x86/kvm/mmu/spte.h
+++ b/arch/x86/kvm/mmu/spte.h
@@ -57,8 +57,39 @@ static_assert(SPTE_TDP_AD_ENABLED_MASK == 0);
 	(((address) >> PT64_LEVEL_SHIFT(level)) & ((1 << PT64_LEVEL_BITS) - 1))
 #define SHADOW_PT_INDEX(addr, level) PT64_INDEX(addr, level)
 
-#define DEFAULT_SPTE_HOST_WRITEABLE	BIT_ULL(10)
-#define DEFAULT_SPTE_MMU_WRITEABLE	BIT_ULL(11)
+/* Bits 9 and 10 are ignored by all non-EPT PTEs. */
+#define DEFAULT_SPTE_HOST_WRITEABLE	BIT_ULL(9)
+#define DEFAULT_SPTE_MMU_WRITEABLE	BIT_ULL(10)
+
+/*
+ * The mask/shift to use for saving the original R/X bits when marking the PTE
+ * as not-present for access tracking purposes. We do not save the W bit as the
+ * PTEs being access tracked also need to be dirty tracked, so the W bit will be
+ * restored only when a write is attempted to the page.  This mask obviously
+ * must not overlap the A/D type mask.
+ */
+#define SHADOW_ACC_TRACK_SAVED_BITS_MASK (PT64_EPT_READABLE_MASK | \
+					  PT64_EPT_EXECUTABLE_MASK)
+#define SHADOW_ACC_TRACK_SAVED_BITS_SHIFT 54
+#define SHADOW_ACC_TRACK_SAVED_MASK	(SHADOW_ACC_TRACK_SAVED_BITS_MASK << \
+					 SHADOW_ACC_TRACK_SAVED_BITS_SHIFT)
+static_assert(!(SPTE_TDP_AD_MASK & SHADOW_ACC_TRACK_SAVED_MASK));
+
+/*
+ * Low ignored bits are at a premium for EPT, use high ignored bits, taking care
+ * to not overlap the A/D type mask or the saved access bits of access-tracked
+ * SPTEs when A/D bits are disabled.
+ */
+#define EPT_SPTE_HOST_WRITABLE		BIT_ULL(57)
+#define EPT_SPTE_MMU_WRITABLE		BIT_ULL(58)
+
+static_assert(!(EPT_SPTE_HOST_WRITABLE & SPTE_TDP_AD_MASK));
+static_assert(!(EPT_SPTE_MMU_WRITABLE & SPTE_TDP_AD_MASK));
+static_assert(!(EPT_SPTE_HOST_WRITABLE & SHADOW_ACC_TRACK_SAVED_MASK));
+static_assert(!(EPT_SPTE_MMU_WRITABLE & SHADOW_ACC_TRACK_SAVED_MASK));
+
+/* Defined only to keep the above static asserts readable. */
+#undef SHADOW_ACC_TRACK_SAVED_MASK
 
 /*
  * Due to limited space in PTEs, the MMIO generation is a 20 bit subset of
@@ -128,19 +159,6 @@ extern u64 __read_mostly shadow_nonpresent_or_rsvd_mask;
  */
 #define SHADOW_NONPRESENT_OR_RSVD_MASK_LEN 5
 
-/*
- * The mask/shift to use for saving the original R/X bits when marking the PTE
- * as not-present for access tracking purposes. We do not save the W bit as the
- * PTEs being access tracked also need to be dirty tracked, so the W bit will be
- * restored only when a write is attempted to the page.  This mask obviously
- * must not overlap the A/D type mask.
- */
-#define SHADOW_ACC_TRACK_SAVED_BITS_MASK (PT64_EPT_READABLE_MASK | \
-					  PT64_EPT_EXECUTABLE_MASK)
-#define SHADOW_ACC_TRACK_SAVED_BITS_SHIFT 54
-static_assert(!(SPTE_TDP_AD_MASK & (SHADOW_ACC_TRACK_SAVED_BITS_MASK <<
-				    SHADOW_ACC_TRACK_SAVED_BITS_SHIFT)));
-
 /*
  * If a thread running without exclusive control of the MMU lock must perform a
  * multi-part operation on an SPTE, it can set the SPTE to REMOVED_SPTE as a
-- 
2.30.1.766.gb4fecdf3b7-goog


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 20/24] KVM: x86/mmu: Use a dedicated bit to track shadow/MMU-present SPTEs
  2021-02-25 20:47 [PATCH 00/24] KVM: x86/mmu: Introduce MMU_PRESENT and fix bugs Sean Christopherson
                   ` (18 preceding siblings ...)
  2021-02-25 20:47 ` [PATCH 19/24] KVM: x86/mmu: Use high bits for host/mmu writable masks for EPT SPTEs Sean Christopherson
@ 2021-02-25 20:47 ` Sean Christopherson
  2021-03-08 18:52   ` Tom Lendacky
  2021-02-25 20:47 ` [PATCH 21/24] KVM: x86/mmu: Tweak auditing WARN for A/D bits to !PRESENT (was MMIO) Sean Christopherson
                   ` (4 subsequent siblings)
  24 siblings, 1 reply; 32+ messages in thread
From: Sean Christopherson @ 2021-02-25 20:47 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon

Introduce MMU_PRESENT to explicitly track which SPTEs are "present" from
the MMU's perspective.  Checking for shadow-present SPTEs is a very
common operation for the MMU, particularly in hot paths such as page
faults.  With the addition of "removed" SPTEs for the TDP MMU,
identifying shadow-present SPTEs is quite costly especially since it
requires checking multiple 64-bit values.

On 64-bit KVM, this reduces the footprint of kvm.ko's .text by ~2k bytes.
On 32-bit KVM, this increases the footprint by ~200 bytes, but only
because gcc now inlines several more MMU helpers, e.g. drop_parent_pte().

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/spte.c |  8 ++++----
 arch/x86/kvm/mmu/spte.h | 11 ++++++++++-
 2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index d12acf5eb871..e07aabb23b8a 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -94,7 +94,7 @@ int make_spte(struct kvm_vcpu *vcpu, unsigned int pte_access, int level,
 		     bool can_unsync, bool host_writable, bool ad_disabled,
 		     u64 *new_spte)
 {
-	u64 spte = 0;
+	u64 spte = SPTE_MMU_PRESENT_MASK;
 	int ret = 0;
 
 	if (ad_disabled)
@@ -183,10 +183,10 @@ int make_spte(struct kvm_vcpu *vcpu, unsigned int pte_access, int level,
 
 u64 make_nonleaf_spte(u64 *child_pt, bool ad_disabled)
 {
-	u64 spte;
+	u64 spte = SPTE_MMU_PRESENT_MASK;
 
-	spte = __pa(child_pt) | shadow_present_mask | PT_WRITABLE_MASK |
-	       shadow_user_mask | shadow_x_mask | shadow_me_mask;
+	spte |= __pa(child_pt) | shadow_present_mask | PT_WRITABLE_MASK |
+		shadow_user_mask | shadow_x_mask | shadow_me_mask;
 
 	if (ad_disabled)
 		spte |= SPTE_TDP_AD_DISABLED_MASK;
diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
index 8996baa8da15..645e9bc2d4a2 100644
--- a/arch/x86/kvm/mmu/spte.h
+++ b/arch/x86/kvm/mmu/spte.h
@@ -5,6 +5,15 @@
 
 #include "mmu_internal.h"
 
+/*
+ * A MMU present SPTE is backed by actual memory and may or may not be present
+ * in hardware.  E.g. MMIO SPTEs are not considered present.  Use bit 11, as it
+ * is ignored by all flavors of SPTEs and checking a low bit often generates
+ * better code than for a high bit, e.g. 56+.  MMU present checks are pervasive
+ * enough that the improved code generation is noticeable in KVM's footprint.
+ */
+#define SPTE_MMU_PRESENT_MASK		BIT_ULL(11)
+
 /*
  * TDP SPTES (more specifically, EPT SPTEs) may not have A/D bits, and may also
  * be restricted to using write-protection (for L2 when CPU dirty logging, i.e.
@@ -241,7 +250,7 @@ static inline bool is_access_track_spte(u64 spte)
 
 static inline bool is_shadow_present_pte(u64 pte)
 {
-	return (pte != 0) && !is_mmio_spte(pte) && !is_removed_spte(pte);
+	return !!(pte & SPTE_MMU_PRESENT_MASK);
 }
 
 static inline bool is_large_pte(u64 pte)
-- 
2.30.1.766.gb4fecdf3b7-goog


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 21/24] KVM: x86/mmu: Tweak auditing WARN for A/D bits to !PRESENT (was MMIO)
  2021-02-25 20:47 [PATCH 00/24] KVM: x86/mmu: Introduce MMU_PRESENT and fix bugs Sean Christopherson
                   ` (19 preceding siblings ...)
  2021-02-25 20:47 ` [PATCH 20/24] KVM: x86/mmu: Use a dedicated bit to track shadow/MMU-present SPTEs Sean Christopherson
@ 2021-02-25 20:47 ` Sean Christopherson
  2021-05-06 23:05   ` Matteo Croce
  2021-02-25 20:47 ` [PATCH 22/24] KVM: x86/mmu: Use is_removed_spte() instead of open coded equivalents Sean Christopherson
                   ` (3 subsequent siblings)
  24 siblings, 1 reply; 32+ messages in thread
From: Sean Christopherson @ 2021-02-25 20:47 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon

Tweak the MMU_WARN that guards against weirdness when querying A/D status
to fire on a !MMU_PRESENT SPTE, as opposed to a MMIO SPTE.  Attempting to
query A/D status on any kind of !MMU_PRESENT SPTE, MMIO or otherwise,
indicates a KVM bug.  Case in point, several now-fixed bugs were
identified by enabling this new WARN.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/spte.h | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
index 645e9bc2d4a2..2fad4ccd3679 100644
--- a/arch/x86/kvm/mmu/spte.h
+++ b/arch/x86/kvm/mmu/spte.h
@@ -209,6 +209,11 @@ static inline bool is_mmio_spte(u64 spte)
 	       likely(shadow_mmio_value);
 }
 
+static inline bool is_shadow_present_pte(u64 pte)
+{
+	return !!(pte & SPTE_MMU_PRESENT_MASK);
+}
+
 static inline bool sp_ad_disabled(struct kvm_mmu_page *sp)
 {
 	return sp->role.ad_disabled;
@@ -216,13 +221,13 @@ static inline bool sp_ad_disabled(struct kvm_mmu_page *sp)
 
 static inline bool spte_ad_enabled(u64 spte)
 {
-	MMU_WARN_ON(is_mmio_spte(spte));
+	MMU_WARN_ON(!is_shadow_present_pte(spte));
 	return (spte & SPTE_TDP_AD_MASK) != SPTE_TDP_AD_DISABLED_MASK;
 }
 
 static inline bool spte_ad_need_write_protect(u64 spte)
 {
-	MMU_WARN_ON(is_mmio_spte(spte));
+	MMU_WARN_ON(!is_shadow_present_pte(spte));
 	/*
 	 * This is benign for non-TDP SPTEs as SPTE_TDP_AD_ENABLED_MASK is '0',
 	 * and non-TDP SPTEs will never set these bits.  Optimize for 64-bit
@@ -233,13 +238,13 @@ static inline bool spte_ad_need_write_protect(u64 spte)
 
 static inline u64 spte_shadow_accessed_mask(u64 spte)
 {
-	MMU_WARN_ON(is_mmio_spte(spte));
+	MMU_WARN_ON(!is_shadow_present_pte(spte));
 	return spte_ad_enabled(spte) ? shadow_accessed_mask : 0;
 }
 
 static inline u64 spte_shadow_dirty_mask(u64 spte)
 {
-	MMU_WARN_ON(is_mmio_spte(spte));
+	MMU_WARN_ON(!is_shadow_present_pte(spte));
 	return spte_ad_enabled(spte) ? shadow_dirty_mask : 0;
 }
 
@@ -248,11 +253,6 @@ static inline bool is_access_track_spte(u64 spte)
 	return !spte_ad_enabled(spte) && (spte & shadow_acc_track_mask) == 0;
 }
 
-static inline bool is_shadow_present_pte(u64 pte)
-{
-	return !!(pte & SPTE_MMU_PRESENT_MASK);
-}
-
 static inline bool is_large_pte(u64 pte)
 {
 	return pte & PT_PAGE_SIZE_MASK;
-- 
2.30.1.766.gb4fecdf3b7-goog


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 22/24] KVM: x86/mmu: Use is_removed_spte() instead of open coded equivalents
  2021-02-25 20:47 [PATCH 00/24] KVM: x86/mmu: Introduce MMU_PRESENT and fix bugs Sean Christopherson
                   ` (20 preceding siblings ...)
  2021-02-25 20:47 ` [PATCH 21/24] KVM: x86/mmu: Tweak auditing WARN for A/D bits to !PRESENT (was MMIO) Sean Christopherson
@ 2021-02-25 20:47 ` Sean Christopherson
  2021-02-25 20:47 ` [PATCH 23/24] KVM: x86/mmu: Use low available bits for removed SPTEs Sean Christopherson
                   ` (2 subsequent siblings)
  24 siblings, 0 replies; 32+ messages in thread
From: Sean Christopherson @ 2021-02-25 20:47 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon

Use the is_removed_spte() helper instead of open coding the check.

No functional change intended.

Cc: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/tdp_mmu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index bef0e1908e82..7f2c4760b84d 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -490,7 +490,7 @@ static inline bool tdp_mmu_set_spte_atomic(struct kvm *kvm,
 	 * Do not change removed SPTEs. Only the thread that froze the SPTE
 	 * may modify it.
 	 */
-	if (iter->old_spte == REMOVED_SPTE)
+	if (is_removed_spte(iter->old_spte))
 		return false;
 
 	if (cmpxchg64(rcu_dereference(iter->sptep), iter->old_spte,
@@ -565,7 +565,7 @@ static inline void __tdp_mmu_set_spte(struct kvm *kvm, struct tdp_iter *iter,
 	 * should be used. If operating under the MMU lock in write mode, the
 	 * use of the removed SPTE should not be necessary.
 	 */
-	WARN_ON(iter->old_spte == REMOVED_SPTE);
+	WARN_ON(is_removed_spte(iter->old_spte));
 
 	WRITE_ONCE(*rcu_dereference(iter->sptep), new_spte);
 
-- 
2.30.1.766.gb4fecdf3b7-goog


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 23/24] KVM: x86/mmu: Use low available bits for removed SPTEs
  2021-02-25 20:47 [PATCH 00/24] KVM: x86/mmu: Introduce MMU_PRESENT and fix bugs Sean Christopherson
                   ` (21 preceding siblings ...)
  2021-02-25 20:47 ` [PATCH 22/24] KVM: x86/mmu: Use is_removed_spte() instead of open coded equivalents Sean Christopherson
@ 2021-02-25 20:47 ` Sean Christopherson
  2021-02-25 20:47 ` [PATCH 24/24] KVM: x86/mmu: Dump reserved bits if they're detected on non-MMIO SPTE Sean Christopherson
  2021-02-26  9:38 ` [PATCH 00/24] KVM: x86/mmu: Introduce MMU_PRESENT and fix bugs Paolo Bonzini
  24 siblings, 0 replies; 32+ messages in thread
From: Sean Christopherson @ 2021-02-25 20:47 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon

Use low "available" bits to tag REMOVED SPTEs.  Using a high bit is
moderately costly as it often causes the compiler to generate a 64-bit
immediate.  More importantly, this makes it very clear REMOVED_SPTE is
a value, not a flag.

Cc: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/spte.c | 11 ++++++++++-
 arch/x86/kvm/mmu/spte.h | 11 +++++++----
 2 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index e07aabb23b8a..66d43cec0c31 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -277,7 +277,16 @@ void kvm_mmu_set_mmio_spte_mask(u64 mmio_value, u64 mmio_mask, u64 access_mask)
 				  SHADOW_NONPRESENT_OR_RSVD_MASK_LEN)))
 		mmio_value = 0;
 
-	WARN_ON((mmio_value & mmio_mask) != mmio_value);
+	/*
+	 * The masked MMIO value must obviously match itself and a removed SPTE
+	 * must not get a false positive.  Removed SPTEs and MMIO SPTEs should
+	 * never collide as MMIO must set some RWX bits, and removed SPTEs must
+	 * not set any RWX bits.
+	 */
+	if (WARN_ON((mmio_value & mmio_mask) != mmio_value) ||
+	    WARN_ON(mmio_value && (REMOVED_SPTE & mmio_mask) == mmio_value))
+		mmio_value = 0;
+
 	shadow_mmio_value = mmio_value;
 	shadow_mmio_mask  = mmio_mask;
 	shadow_mmio_access_mask = access_mask;
diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
index 2fad4ccd3679..b53036d9ddf3 100644
--- a/arch/x86/kvm/mmu/spte.h
+++ b/arch/x86/kvm/mmu/spte.h
@@ -174,13 +174,16 @@ extern u64 __read_mostly shadow_nonpresent_or_rsvd_mask;
  * non-present intermediate value. Other threads which encounter this value
  * should not modify the SPTE.
  *
- * This constant works because it is considered non-present on both AMD and
- * Intel CPUs and does not create a L1TF vulnerability because the pfn section
- * is zeroed out.
+ * Use a semi-arbitrary value that doesn't set RWX bits, i.e. is not-present on
+ * bot AMD and Intel CPUs, and doesn't set PFN bits, i.e. doesn't create a L1TF
+ * vulnerability.  Use only low bits to avoid 64-bit immediates.
  *
  * Only used by the TDP MMU.
  */
-#define REMOVED_SPTE (1ull << 59)
+#define REMOVED_SPTE	0x5a0ULL
+
+/* Removed SPTEs must not be misconstrued as shadow present PTEs. */
+static_assert(!(REMOVED_SPTE & SPTE_MMU_PRESENT_MASK));
 
 static inline bool is_removed_spte(u64 spte)
 {
-- 
2.30.1.766.gb4fecdf3b7-goog


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 24/24] KVM: x86/mmu: Dump reserved bits if they're detected on non-MMIO SPTE
  2021-02-25 20:47 [PATCH 00/24] KVM: x86/mmu: Introduce MMU_PRESENT and fix bugs Sean Christopherson
                   ` (22 preceding siblings ...)
  2021-02-25 20:47 ` [PATCH 23/24] KVM: x86/mmu: Use low available bits for removed SPTEs Sean Christopherson
@ 2021-02-25 20:47 ` Sean Christopherson
  2021-02-26  9:38 ` [PATCH 00/24] KVM: x86/mmu: Introduce MMU_PRESENT and fix bugs Paolo Bonzini
  24 siblings, 0 replies; 32+ messages in thread
From: Sean Christopherson @ 2021-02-25 20:47 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon

Debugging unexpected reserved bit page faults sucks.  Dump the reserved
bits that (likely) caused the page fault to make debugging suck a little
less.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index e636fcd529d2..dab0e950a54e 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3555,11 +3555,12 @@ static bool get_mmio_spte(struct kvm_vcpu *vcpu, u64 addr, u64 *sptep)
 			    __is_rsvd_bits_set(rsvd_check, sptes[level], level);
 
 	if (reserved) {
-		pr_err("%s: detect reserved bits on spte, addr 0x%llx, dump hierarchy:\n",
+		pr_err("%s: reserved bits set on MMU-present spte, addr 0x%llx, hierarchy:\n",
 		       __func__, addr);
 		for (level = root; level >= leaf; level--)
-			pr_err("------ spte 0x%llx level %d.\n",
-			       sptes[level], level);
+			pr_err("------ spte = 0x%llx level = %d, rsvd bits = 0x%llx",
+			       sptes[level], level,
+			       rsvd_check->rsvd_bits_mask[(sptes[level] >> 7) & 1][level-1]);
 	}
 
 	return reserved;
-- 
2.30.1.766.gb4fecdf3b7-goog


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 00/24] KVM: x86/mmu: Introduce MMU_PRESENT and fix bugs
  2021-02-25 20:47 [PATCH 00/24] KVM: x86/mmu: Introduce MMU_PRESENT and fix bugs Sean Christopherson
                   ` (23 preceding siblings ...)
  2021-02-25 20:47 ` [PATCH 24/24] KVM: x86/mmu: Dump reserved bits if they're detected on non-MMIO SPTE Sean Christopherson
@ 2021-02-26  9:38 ` Paolo Bonzini
  24 siblings, 0 replies; 32+ messages in thread
From: Paolo Bonzini @ 2021-02-26  9:38 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Ben Gardon

On 25/02/21 21:47, Sean Christopherson wrote:
> This series adds the simple idea of tagging shadow-present SPTEs with
> a single bit, instead of looking for non-zero SPTEs that aren't MMIO and
> aren't REMOVED.  Doing so reduces KVM's code footprint by 2k bytes on
> x86-64, and presumably adds a tiny performance boost in related paths.
> 
> But, actually adding MMU_PRESENT without breaking one flow or another is
> a bit of a debacle.  The main issue is that EPT doesn't have many low
> available bits, and PAE doesn't have any high available bits.  And, the
> existing MMU_WRITABLE and HOST_WRITABLE flags aren't optional, i.e. are
> needed for all flavors of paging.  The solution I settled on is to let
> make the *_WRITABLE bit configurable so that EPT can use high available
> bits.
> 
> Of course, I forgot the above PAE restriction multiple times, and
> journeyed down several dead ends.  The most notable failed idea was
> using the AD_* masks in bits 52 and 53 to denote shadow-present SPTEs.
> That would have been quite clever as it would provide the same benefits
> without burning another available bit.
> 
> Along the many failed attempts, I collected a variety of bug fixes and
> cleanups, mostly things found by inspection after doing a deep dive to
> figure out what I broke.
> 
> Sean Christopherson (24):
>    KVM: x86/mmu: Set SPTE_AD_WRPROT_ONLY_MASK if and only if PML is
>      enabled
>    KVM: x86/mmu: Check for shadow-present SPTE before querying A/D status
>    KVM: x86/mmu: Bail from fast_page_fault() if SPTE is not
>      shadow-present
>    KVM: x86/mmu: Disable MMIO caching if MMIO value collides with L1TF
>    KVM: x86/mmu: Retry page faults that hit an invalid memslot
>    KVM: x86/mmu: Don't install bogus MMIO SPTEs if MMIO caching is
>      disabled
>    KVM: x86/mmu: Handle MMIO SPTEs directly in mmu_set_spte()
>    KVM: x86/mmu: Drop redundant trace_kvm_mmu_set_spte() in the TDP MMU
>    KVM: x86/mmu: Rename 'mask' to 'spte' in MMIO SPTE helpers
>    KVM: x86/mmu: Stop using software available bits to denote MMIO SPTEs
>    KVM: x86/mmu: Add module param to disable MMIO caching (for testing)
>    KVM: x86/mmu: Rename and document A/D scheme for TDP SPTEs
>    KVM: x86/mmu: Use MMIO SPTE bits 53 and 52 for the MMIO generation
>    KVM: x86/mmu: Document dependency bewteen TDP A/D type and saved bits
>    KVM: x86/mmu: Move initial kvm_mmu_set_mask_ptes() call into MMU
>      proper
>    KVM: x86/mmu: Co-locate code for setting various SPTE masks
>    KVM: x86/mmu: Move logic for setting SPTE masks for EPT into the MMU
>      proper
>    KVM: x86/mmu: Make Host-writable and MMU-writable bit locations
>      dynamic
>    KVM: x86/mmu: Use high bits for host/mmu writable masks for EPT SPTEs
>    KVM: x86/mmu: Use a dedicated bit to track shadow/MMU-present SPTEs
>    KVM: x86/mmu: Tweak auditing WARN for A/D bits to !PRESENT (was MMIO)
>    KVM: x86/mmu: Use is_removed_spte() instead of open coded equivalents
>    KVM: x86/mmu: Use low available bits for removed SPTEs
>    KVM: x86/mmu: Dump reserved bits if they're detected on non-MMIO SPTE
> 
>   Documentation/virt/kvm/locking.rst |  49 +++++----
>   arch/x86/include/asm/kvm_host.h    |   3 -
>   arch/x86/kvm/mmu.h                 |  15 +--
>   arch/x86/kvm/mmu/mmu.c             |  87 +++++++---------
>   arch/x86/kvm/mmu/mmu_internal.h    |  16 +--
>   arch/x86/kvm/mmu/paging_tmpl.h     |   2 +-
>   arch/x86/kvm/mmu/spte.c            | 157 ++++++++++++++++++++---------
>   arch/x86/kvm/mmu/spte.h            | 135 +++++++++++++++++--------
>   arch/x86/kvm/mmu/tdp_mmu.c         |  22 ++--
>   arch/x86/kvm/svm/svm.c             |   2 +-
>   arch/x86/kvm/vmx/vmx.c             |  24 +----
>   arch/x86/kvm/x86.c                 |   3 -
>   12 files changed, 290 insertions(+), 225 deletions(-)
> 

Queued (patch 1 for 5.12, the rest for 5.13).

Paolo


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 20/24] KVM: x86/mmu: Use a dedicated bit to track shadow/MMU-present SPTEs
  2021-02-25 20:47 ` [PATCH 20/24] KVM: x86/mmu: Use a dedicated bit to track shadow/MMU-present SPTEs Sean Christopherson
@ 2021-03-08 18:52   ` Tom Lendacky
  2021-03-08 19:48     ` Paolo Bonzini
  2021-03-08 20:11     ` Sean Christopherson
  0 siblings, 2 replies; 32+ messages in thread
From: Tom Lendacky @ 2021-03-08 18:52 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Ben Gardon

On 2/25/21 2:47 PM, Sean Christopherson wrote:
> Introduce MMU_PRESENT to explicitly track which SPTEs are "present" from
> the MMU's perspective.  Checking for shadow-present SPTEs is a very
> common operation for the MMU, particularly in hot paths such as page
> faults.  With the addition of "removed" SPTEs for the TDP MMU,
> identifying shadow-present SPTEs is quite costly especially since it
> requires checking multiple 64-bit values.
> 
> On 64-bit KVM, this reduces the footprint of kvm.ko's .text by ~2k bytes.
> On 32-bit KVM, this increases the footprint by ~200 bytes, but only
> because gcc now inlines several more MMU helpers, e.g. drop_parent_pte().
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>   arch/x86/kvm/mmu/spte.c |  8 ++++----
>   arch/x86/kvm/mmu/spte.h | 11 ++++++++++-
>   2 files changed, 14 insertions(+), 5 deletions(-)

I'm trying to run a guest on my Rome system using the queue branch, but
I'm encountering an error that I bisected to this commit. In the guest
(during OVMF boot) I see:

error: kvm run failed Invalid argument
RAX=0000000000000000 RBX=00000000ffc12792 RCX=000000007f58401a RDX=000000007faaf808
RSI=0000000000000010 RDI=00000000ffc12792 RBP=00000000ffc12792 RSP=000000007faaf740
R8 =0000000000000792 R9 =000000007faaf808 R10=00000000ffc12793 R11=00000000000003f8
R12=0000000000000010 R13=0000000000000000 R14=000000007faaf808 R15=0000000000000012
RIP=000000007f6e9a90 RFL=00000246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
FS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
GS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
GDT=     000000007f5ee698 00000047
IDT=     000000007f186018 00000fff
CR0=80010033 CR2=0000000000000000 CR3=000000007f801000 CR4=00000668
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000d00
Code=22 00 00 e8 c0 e6 ff ff 48 83 c4 20 45 84 ed 74 07 fb eb 04 <44> 88 65 00 58 5b 5d 41 5c 41 5d c3 55 48 0f af 3d 1b 37 00 00 be 20 00 00 00 48 03 3d 17

On the hypervisor, I see the following:

[   55.886136] get_mmio_spte: detect reserved bits on spte, addr 0xffc12792, dump hierarchy:
[   55.895284] ------ spte 0x1344a0827 level 4.
[   55.900059] ------ spte 0x134499827 level 3.
[   55.904877] ------ spte 0x165bf0827 level 2.
[   55.909651] ------ spte 0xff800ffc12817 level 1.

When I kill the guest, I get a kernel panic:

[   95.539683] __pte_list_remove: 0000000040567a6a 0->BUG
[   95.545481] kernel BUG at arch/x86/kvm/mmu/mmu.c:896!
[   95.551133] invalid opcode: 0000 [#1] SMP NOPTI
[   95.556192] CPU: 142 PID: 5054 Comm: qemu-system-x86 Tainted: G        W         5.11.0-rc4-sos-sev-es #1
[   95.566872] Hardware name: AMD Corporation ETHANOL_X/ETHANOL_X, BIOS REX1006G 01/25/2020
[   95.575900] RIP: 0010:__pte_list_remove.cold+0x2e/0x48 [kvm]
[   95.582312] Code: c7 c6 40 6f f3 c0 48 c7 c7 aa da f3 c0 e8 79 3d a7 cd 0f 0b 48 89 fa 48 c7 c6 40 6f f3 c0 48 c7 c7 87 da f3 c0 e8 61 3d a7 cd <0f> 0b 48 89 fa 48 c7 c6 40 6f f3 c0 48 c7 c7 98 da f3 c0 e8 49 3d
[   95.603271] RSP: 0018:ffffc900143e7c78 EFLAGS: 00010246
[   95.609093] RAX: 000000000000002a RBX: 0000000000000000 RCX: 0000000000000000
[   95.617058] RDX: 0000000000000000 RSI: ffff88900e598950 RDI: ffff88900e598950
[   95.625019] RBP: ffff888165bf0090 R08: ffff88900e598950 R09: ffffc900143e7a98
[   95.632980] R10: 0000000000000001 R11: 0000000000000001 R12: ffffc9000ff29000
[   95.640944] R13: ffffc900143e7d18 R14: 0000000000000098 R15: 0000000000000000
[   95.648912] FS:  0000000000000000(0000) GS:ffff88900e580000(0000) knlGS:0000000000000000
[   95.657951] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   95.664361] CR2: 00007fb328d20c80 CR3: 00000001476d2000 CR4: 0000000000350ee0
[   95.672326] Call Trace:
[   95.675065]  mmu_page_zap_pte+0xf9/0x130 [kvm]
[   95.680103]  __kvm_mmu_prepare_zap_page+0x6d/0x380 [kvm]
[   95.686088]  kvm_mmu_zap_all+0x5e/0xe0 [kvm]
[   95.690911]  kvm_mmu_notifier_release+0x2b/0x60 [kvm]
[   95.696614]  __mmu_notifier_release+0x71/0x1e0
[   95.701585]  ? asm_sysvec_apic_timer_interrupt+0x12/0x20
[   95.707512]  ? __khugepaged_exit+0x111/0x160
[   95.712289]  exit_mmap+0x15b/0x1f0
[   95.716092]  ? __khugepaged_exit+0x111/0x160
[   95.720857]  ? kmem_cache_free+0x210/0x3f0
[   95.725428]  ? kmem_cache_free+0x387/0x3f0
[   95.729998]  mmput+0x56/0x130
[   95.733312]  do_exit+0x341/0xb50
[   95.736923]  do_group_exit+0x3a/0xa0
[   95.740925]  __x64_sys_exit_group+0x14/0x20
[   95.745600]  do_syscall_64+0x33/0x40
[   95.749601]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   95.755241] RIP: 0033:0x7fb333a442c6
[   95.759231] Code: Unable to access opcode bytes at RIP 0x7fb333a4429c.
[   95.766514] RSP: 002b:00007ffdf675cad8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
[   95.774962] RAX: ffffffffffffffda RBX: 00007fb333b4b610 RCX: 00007fb333a442c6
[   95.782925] RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000
[   95.790892] RBP: 0000000000000000 R08: 00000000000000e7 R09: ffffffffffffdc38
[   95.798856] R10: 00007fb32945cf80 R11: 0000000000000246 R12: 00007fb333b4b610
[   95.806825] R13: 000000000000034c R14: 00007fb333b4efc8 R15: 0000000000000000
[   95.814803] Modules linked in: tun ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter bridge stp llc intel_rapl_msr wmi_bmof intel_rapl_common amd64_edac_mod edac_mce_amd fuse kvm_amd kvm irqby
pass ipmi_ssif sg ccp k10temp acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler acpi_cpufreq sch_fq_codel parport_pc ppdev lp parport sunrpc ip_tables raid10 raid456 async_raid6_recov async_memcpy async_pq async_xo
r async_tx xor raid6_pq raid1 raid0 linear sd_mod t10_pi crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper ast drm_vram_helper i2c_algo_bit drm_ttm_helper 
ttm drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops libahci libata drm i2c_designware_platform e1000e i2c_piix4 wmi i2c_designware_core pinctrl_amd i2c_core
[   95.893646] ---[ end trace f40aac7ee7919c14 ]---
[   95.898848] RIP: 0010:__pte_list_remove.cold+0x2e/0x48 [kvm]
[   95.905258] Code: c7 c6 40 6f f3 c0 48 c7 c7 aa da f3 c0 e8 79 3d a7 cd 0f 0b 48 89 fa 48 c7 c6 40 6f f3 c0 48 c7 c7 87 da f3 c0 e8 61 3d a7 cd <0f> 0b 48 89 fa 48 c7 c6 40 6f f3 c0 48 c7 c7 98 da f3 c0 e8 49 3d
[   95.926234] RSP: 0018:ffffc900143e7c78 EFLAGS: 00010246
[   95.932109] RAX: 000000000000002a RBX: 0000000000000000 RCX: 0000000000000000
[   95.940087] RDX: 0000000000000000 RSI: ffff88900e598950 RDI: ffff88900e598950
[   95.948086] RBP: ffff888165bf0090 R08: ffff88900e598950 R09: ffffc900143e7a98
[   95.956068] R10: 0000000000000001 R11: 0000000000000001 R12: ffffc9000ff29000
[   95.964051] R13: ffffc900143e7d18 R14: 0000000000000098 R15: 0000000000000000
[   95.972031] FS:  0000000000000000(0000) GS:ffff88900e580000(0000) knlGS:0000000000000000
[   95.981081] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   95.987510] CR2: 00007fb328d20c80 CR3: 00000001476d2000 CR4: 0000000000350ee0
[   95.995492] Kernel panic - not syncing: Fatal exception
[   96.008273] Kernel Offset: disabled
[   96.012249] ---[ end Kernel panic - not syncing: Fatal exception ]---

Let me know if there's anything you want me to try.

Thanks,
Tom

> 
> diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
> index d12acf5eb871..e07aabb23b8a 100644
> --- a/arch/x86/kvm/mmu/spte.c
> +++ b/arch/x86/kvm/mmu/spte.c
> @@ -94,7 +94,7 @@ int make_spte(struct kvm_vcpu *vcpu, unsigned int pte_access, int level,
>   		     bool can_unsync, bool host_writable, bool ad_disabled,
>   		     u64 *new_spte)
>   {
> -	u64 spte = 0;
> +	u64 spte = SPTE_MMU_PRESENT_MASK;
>   	int ret = 0;
>   
>   	if (ad_disabled)
> @@ -183,10 +183,10 @@ int make_spte(struct kvm_vcpu *vcpu, unsigned int pte_access, int level,
>   
>   u64 make_nonleaf_spte(u64 *child_pt, bool ad_disabled)
>   {
> -	u64 spte;
> +	u64 spte = SPTE_MMU_PRESENT_MASK;
>   
> -	spte = __pa(child_pt) | shadow_present_mask | PT_WRITABLE_MASK |
> -	       shadow_user_mask | shadow_x_mask | shadow_me_mask;
> +	spte |= __pa(child_pt) | shadow_present_mask | PT_WRITABLE_MASK |
> +		shadow_user_mask | shadow_x_mask | shadow_me_mask;
>   
>   	if (ad_disabled)
>   		spte |= SPTE_TDP_AD_DISABLED_MASK;
> diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
> index 8996baa8da15..645e9bc2d4a2 100644
> --- a/arch/x86/kvm/mmu/spte.h
> +++ b/arch/x86/kvm/mmu/spte.h
> @@ -5,6 +5,15 @@
>   
>   #include "mmu_internal.h"
>   
> +/*
> + * A MMU present SPTE is backed by actual memory and may or may not be present
> + * in hardware.  E.g. MMIO SPTEs are not considered present.  Use bit 11, as it
> + * is ignored by all flavors of SPTEs and checking a low bit often generates
> + * better code than for a high bit, e.g. 56+.  MMU present checks are pervasive
> + * enough that the improved code generation is noticeable in KVM's footprint.
> + */
> +#define SPTE_MMU_PRESENT_MASK		BIT_ULL(11)
> +
>   /*
>    * TDP SPTES (more specifically, EPT SPTEs) may not have A/D bits, and may also
>    * be restricted to using write-protection (for L2 when CPU dirty logging, i.e.
> @@ -241,7 +250,7 @@ static inline bool is_access_track_spte(u64 spte)
>   
>   static inline bool is_shadow_present_pte(u64 pte)
>   {
> -	return (pte != 0) && !is_mmio_spte(pte) && !is_removed_spte(pte);
> +	return !!(pte & SPTE_MMU_PRESENT_MASK);
>   }
>   
>   static inline bool is_large_pte(u64 pte)
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 20/24] KVM: x86/mmu: Use a dedicated bit to track shadow/MMU-present SPTEs
  2021-03-08 18:52   ` Tom Lendacky
@ 2021-03-08 19:48     ` Paolo Bonzini
  2021-03-08 20:11     ` Sean Christopherson
  1 sibling, 0 replies; 32+ messages in thread
From: Paolo Bonzini @ 2021-03-08 19:48 UTC (permalink / raw)
  To: Tom Lendacky, Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Ben Gardon

On 08/03/21 19:52, Tom Lendacky wrote:
> On 2/25/21 2:47 PM, Sean Christopherson wrote:
>> Introduce MMU_PRESENT to explicitly track which SPTEs are "present" from
>> the MMU's perspective.  Checking for shadow-present SPTEs is a very
>> common operation for the MMU, particularly in hot paths such as page
>> faults.  With the addition of "removed" SPTEs for the TDP MMU,
>> identifying shadow-present SPTEs is quite costly especially since it
>> requires checking multiple 64-bit values.
>>
>> On 64-bit KVM, this reduces the footprint of kvm.ko's .text by ~2k bytes.
>> On 32-bit KVM, this increases the footprint by ~200 bytes, but only
>> because gcc now inlines several more MMU helpers, e.g. drop_parent_pte().
>>
>> Signed-off-by: Sean Christopherson <seanjc@google.com>
>> ---
>>    arch/x86/kvm/mmu/spte.c |  8 ++++----
>>    arch/x86/kvm/mmu/spte.h | 11 ++++++++++-
>>    2 files changed, 14 insertions(+), 5 deletions(-)
> 
> I'm trying to run a guest on my Rome system using the queue branch, but
> I'm encountering an error that I bisected to this commit. In the guest
> (during OVMF boot) I see:
> 
> error: kvm run failed Invalid argument
> RAX=0000000000000000 RBX=00000000ffc12792 RCX=000000007f58401a RDX=000000007faaf808
> RSI=0000000000000010 RDI=00000000ffc12792 RBP=00000000ffc12792 RSP=000000007faaf740
> R8 =0000000000000792 R9 =000000007faaf808 R10=00000000ffc12793 R11=00000000000003f8
> R12=0000000000000010 R13=0000000000000000 R14=000000007faaf808 R15=0000000000000012
> RIP=000000007f6e9a90 RFL=00000246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
> SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> FS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> GS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
> TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
> GDT=     000000007f5ee698 00000047
> IDT=     000000007f186018 00000fff
> CR0=80010033 CR2=0000000000000000 CR3=000000007f801000 CR4=00000668
> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
> DR6=00000000ffff0ff0 DR7=0000000000000400
> EFER=0000000000000d00
> Code=22 00 00 e8 c0 e6 ff ff 48 83 c4 20 45 84 ed 74 07 fb eb 04 <44> 88 65 00 58 5b 5d 41 5c 41 5d c3 55 48 0f af 3d 1b 37 00 00 be 20 00 00 00 48 03 3d 17
> 
> On the hypervisor, I see the following:
> 
> [   55.886136] get_mmio_spte: detect reserved bits on spte, addr 0xffc12792, dump hierarchy:
> [   55.895284] ------ spte 0x1344a0827 level 4.
> [   55.900059] ------ spte 0x134499827 level 3.
> [   55.904877] ------ spte 0x165bf0827 level 2.
> [   55.909651] ------ spte 0xff800ffc12817 level 1.
> 
> When I kill the guest, I get a kernel panic:
> 
> [   95.539683] __pte_list_remove: 0000000040567a6a 0->BUG
> [   95.545481] kernel BUG at arch/x86/kvm/mmu/mmu.c:896!
> [   95.551133] invalid opcode: 0000 [#1] SMP NOPTI
> [   95.556192] CPU: 142 PID: 5054 Comm: qemu-system-x86 Tainted: G        W         5.11.0-rc4-sos-sev-es #1
> [   95.566872] Hardware name: AMD Corporation ETHANOL_X/ETHANOL_X, BIOS REX1006G 01/25/2020
> [   95.575900] RIP: 0010:__pte_list_remove.cold+0x2e/0x48 [kvm]
> [   95.582312] Code: c7 c6 40 6f f3 c0 48 c7 c7 aa da f3 c0 e8 79 3d a7 cd 0f 0b 48 89 fa 48 c7 c6 40 6f f3 c0 48 c7 c7 87 da f3 c0 e8 61 3d a7 cd <0f> 0b 48 89 fa 48 c7 c6 40 6f f3 c0 48 c7 c7 98 da f3 c0 e8 49 3d
> [   95.603271] RSP: 0018:ffffc900143e7c78 EFLAGS: 00010246
> [   95.609093] RAX: 000000000000002a RBX: 0000000000000000 RCX: 0000000000000000
> [   95.617058] RDX: 0000000000000000 RSI: ffff88900e598950 RDI: ffff88900e598950
> [   95.625019] RBP: ffff888165bf0090 R08: ffff88900e598950 R09: ffffc900143e7a98
> [   95.632980] R10: 0000000000000001 R11: 0000000000000001 R12: ffffc9000ff29000
> [   95.640944] R13: ffffc900143e7d18 R14: 0000000000000098 R15: 0000000000000000
> [   95.648912] FS:  0000000000000000(0000) GS:ffff88900e580000(0000) knlGS:0000000000000000
> [   95.657951] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   95.664361] CR2: 00007fb328d20c80 CR3: 00000001476d2000 CR4: 0000000000350ee0
> [   95.672326] Call Trace:
> [   95.675065]  mmu_page_zap_pte+0xf9/0x130 [kvm]
> [   95.680103]  __kvm_mmu_prepare_zap_page+0x6d/0x380 [kvm]
> [   95.686088]  kvm_mmu_zap_all+0x5e/0xe0 [kvm]
> [   95.690911]  kvm_mmu_notifier_release+0x2b/0x60 [kvm]
> [   95.696614]  __mmu_notifier_release+0x71/0x1e0
> [   95.701585]  ? asm_sysvec_apic_timer_interrupt+0x12/0x20
> [   95.707512]  ? __khugepaged_exit+0x111/0x160
> [   95.712289]  exit_mmap+0x15b/0x1f0
> [   95.716092]  ? __khugepaged_exit+0x111/0x160
> [   95.720857]  ? kmem_cache_free+0x210/0x3f0
> [   95.725428]  ? kmem_cache_free+0x387/0x3f0
> [   95.729998]  mmput+0x56/0x130
> [   95.733312]  do_exit+0x341/0xb50
> [   95.736923]  do_group_exit+0x3a/0xa0
> [   95.740925]  __x64_sys_exit_group+0x14/0x20
> [   95.745600]  do_syscall_64+0x33/0x40
> [   95.749601]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [   95.755241] RIP: 0033:0x7fb333a442c6
> [   95.759231] Code: Unable to access opcode bytes at RIP 0x7fb333a4429c.
> [   95.766514] RSP: 002b:00007ffdf675cad8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
> [   95.774962] RAX: ffffffffffffffda RBX: 00007fb333b4b610 RCX: 00007fb333a442c6
> [   95.782925] RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000
> [   95.790892] RBP: 0000000000000000 R08: 00000000000000e7 R09: ffffffffffffdc38
> [   95.798856] R10: 00007fb32945cf80 R11: 0000000000000246 R12: 00007fb333b4b610
> [   95.806825] R13: 000000000000034c R14: 00007fb333b4efc8 R15: 0000000000000000
> [   95.814803] Modules linked in: tun ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter bridge stp llc intel_rapl_msr wmi_bmof intel_rapl_common amd64_edac_mod edac_mce_amd fuse kvm_amd kvm irqby
> pass ipmi_ssif sg ccp k10temp acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler acpi_cpufreq sch_fq_codel parport_pc ppdev lp parport sunrpc ip_tables raid10 raid456 async_raid6_recov async_memcpy async_pq async_xo
> r async_tx xor raid6_pq raid1 raid0 linear sd_mod t10_pi crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper ast drm_vram_helper i2c_algo_bit drm_ttm_helper
> ttm drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops libahci libata drm i2c_designware_platform e1000e i2c_piix4 wmi i2c_designware_core pinctrl_amd i2c_core
> [   95.893646] ---[ end trace f40aac7ee7919c14 ]---
> [   95.898848] RIP: 0010:__pte_list_remove.cold+0x2e/0x48 [kvm]
> [   95.905258] Code: c7 c6 40 6f f3 c0 48 c7 c7 aa da f3 c0 e8 79 3d a7 cd 0f 0b 48 89 fa 48 c7 c6 40 6f f3 c0 48 c7 c7 87 da f3 c0 e8 61 3d a7 cd <0f> 0b 48 89 fa 48 c7 c6 40 6f f3 c0 48 c7 c7 98 da f3 c0 e8 49 3d
> [   95.926234] RSP: 0018:ffffc900143e7c78 EFLAGS: 00010246
> [   95.932109] RAX: 000000000000002a RBX: 0000000000000000 RCX: 0000000000000000
> [   95.940087] RDX: 0000000000000000 RSI: ffff88900e598950 RDI: ffff88900e598950
> [   95.948086] RBP: ffff888165bf0090 R08: ffff88900e598950 R09: ffffc900143e7a98
> [   95.956068] R10: 0000000000000001 R11: 0000000000000001 R12: ffffc9000ff29000
> [   95.964051] R13: ffffc900143e7d18 R14: 0000000000000098 R15: 0000000000000000
> [   95.972031] FS:  0000000000000000(0000) GS:ffff88900e580000(0000) knlGS:0000000000000000
> [   95.981081] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   95.987510] CR2: 00007fb328d20c80 CR3: 00000001476d2000 CR4: 0000000000350ee0
> [   95.995492] Kernel panic - not syncing: Fatal exception
> [   96.008273] Kernel Offset: disabled
> [   96.012249] ---[ end Kernel panic - not syncing: Fatal exception ]---
> 
> Let me know if there's anything you want me to try.

I can confirm that we're seeing the same failure, though we hadn't 
bisected it yet.

Paolo


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 20/24] KVM: x86/mmu: Use a dedicated bit to track shadow/MMU-present SPTEs
  2021-03-08 18:52   ` Tom Lendacky
  2021-03-08 19:48     ` Paolo Bonzini
@ 2021-03-08 20:11     ` Sean Christopherson
  2021-03-08 21:49       ` Sean Christopherson
  1 sibling, 1 reply; 32+ messages in thread
From: Sean Christopherson @ 2021-03-08 20:11 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon

On Mon, Mar 08, 2021, Tom Lendacky wrote:
> On 2/25/21 2:47 PM, Sean Christopherson wrote:
> > Introduce MMU_PRESENT to explicitly track which SPTEs are "present" from
> > the MMU's perspective.  Checking for shadow-present SPTEs is a very
> > common operation for the MMU, particularly in hot paths such as page
> > faults.  With the addition of "removed" SPTEs for the TDP MMU,
> > identifying shadow-present SPTEs is quite costly especially since it
> > requires checking multiple 64-bit values.
> > 
> > On 64-bit KVM, this reduces the footprint of kvm.ko's .text by ~2k bytes.
> > On 32-bit KVM, this increases the footprint by ~200 bytes, but only
> > because gcc now inlines several more MMU helpers, e.g. drop_parent_pte().
> > 
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> >   arch/x86/kvm/mmu/spte.c |  8 ++++----
> >   arch/x86/kvm/mmu/spte.h | 11 ++++++++++-
> >   2 files changed, 14 insertions(+), 5 deletions(-)
> 
> I'm trying to run a guest on my Rome system using the queue branch, but
> I'm encountering an error that I bisected to this commit. In the guest
> (during OVMF boot) I see:
> 
> error: kvm run failed Invalid argument
> RAX=0000000000000000 RBX=00000000ffc12792 RCX=000000007f58401a RDX=000000007faaf808
> RSI=0000000000000010 RDI=00000000ffc12792 RBP=00000000ffc12792 RSP=000000007faaf740
> R8 =0000000000000792 R9 =000000007faaf808 R10=00000000ffc12793 R11=00000000000003f8
> R12=0000000000000010 R13=0000000000000000 R14=000000007faaf808 R15=0000000000000012
> RIP=000000007f6e9a90 RFL=00000246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ES =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
> SS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> DS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> FS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> GS =0030 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
> TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
> GDT=     000000007f5ee698 00000047
> IDT=     000000007f186018 00000fff
> CR0=80010033 CR2=0000000000000000 CR3=000000007f801000 CR4=00000668
> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 
> DR6=00000000ffff0ff0 DR7=0000000000000400
> EFER=0000000000000d00
> Code=22 00 00 e8 c0 e6 ff ff 48 83 c4 20 45 84 ed 74 07 fb eb 04 <44> 88 65 00 58 5b 5d 41 5c 41 5d c3 55 48 0f af 3d 1b 37 00 00 be 20 00 00 00 48 03 3d 17
> 
> On the hypervisor, I see the following:
> 
> [   55.886136] get_mmio_spte: detect reserved bits on spte, addr 0xffc12792, dump hierarchy:
> [   55.895284] ------ spte 0x1344a0827 level 4.
> [   55.900059] ------ spte 0x134499827 level 3.
> [   55.904877] ------ spte 0x165bf0827 level 2.
> [   55.909651] ------ spte 0xff800ffc12817 level 1.

Ah fudge.  I know what's wrong.  The MMIO generation uses bit 11, which means
is_shadow_present_pte() can get a false positive on high MMIO generations.  This
particular error can be squashed by explicitly checking for MMIO sptes in
get_mmio_spte(), but I'm guessing there are other flows where a false positive
is fatal (probably the __pte_list_remove bug below...).  The safe thing to do is
to steal bit 11 from MMIO SPTEs so that shadow present PTEs are the only thing
that sets that bit.

I'll reproduce by stuffing the MMIO generation and get a patch posted.  Sorry :-/

> When I kill the guest, I get a kernel panic:
> 
> [   95.539683] __pte_list_remove: 0000000040567a6a 0->BUG
> [   95.545481] kernel BUG at arch/x86/kvm/mmu/mmu.c:896!

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 20/24] KVM: x86/mmu: Use a dedicated bit to track shadow/MMU-present SPTEs
  2021-03-08 20:11     ` Sean Christopherson
@ 2021-03-08 21:49       ` Sean Christopherson
  0 siblings, 0 replies; 32+ messages in thread
From: Sean Christopherson @ 2021-03-08 21:49 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon

On Mon, Mar 08, 2021, Sean Christopherson wrote:
> On Mon, Mar 08, 2021, Tom Lendacky wrote:
> > On the hypervisor, I see the following:
> > 
> > [   55.886136] get_mmio_spte: detect reserved bits on spte, addr 0xffc12792, dump hierarchy:
> > [   55.895284] ------ spte 0x1344a0827 level 4.
> > [   55.900059] ------ spte 0x134499827 level 3.
> > [   55.904877] ------ spte 0x165bf0827 level 2.
> > [   55.909651] ------ spte 0xff800ffc12817 level 1.
> 
> Ah fudge.  I know what's wrong.  The MMIO generation uses bit 11, which means
> is_shadow_present_pte() can get a false positive on high MMIO generations.  This
> particular error can be squashed by explicitly checking for MMIO sptes in
> get_mmio_spte(), but I'm guessing there are other flows where a false positive
> is fatal (probably the __pte_list_remove bug below...).  The safe thing to do is
> to steal bit 11 from MMIO SPTEs so that shadow present PTEs are the only thing
> that sets that bit.
> 
> I'll reproduce by stuffing the MMIO generation and get a patch posted.  Sorry :-/
> 
> > When I kill the guest, I get a kernel panic:
> > 
> > [   95.539683] __pte_list_remove: 0000000040567a6a 0->BUG
> > [   95.545481] kernel BUG at arch/x86/kvm/mmu/mmu.c:896!

Fudging get_mmio_spte() did allow the guest to boot, but as expected KVM panicked
during guest shutdown.  Initial testing on the below changes look good, I'll
test more thoroughly and (hopefully) post later today.

diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
index b53036d9ddf3..e6e683e0fdcd 100644
--- a/arch/x86/kvm/mmu/spte.h
+++ b/arch/x86/kvm/mmu/spte.h
@@ -101,11 +101,11 @@ static_assert(!(EPT_SPTE_MMU_WRITABLE & SHADOW_ACC_TRACK_SAVED_MASK));
 #undef SHADOW_ACC_TRACK_SAVED_MASK

 /*
- * Due to limited space in PTEs, the MMIO generation is a 20 bit subset of
+ * Due to limited space in PTEs, the MMIO generation is a 19 bit subset of
  * the memslots generation and is derived as follows:
  *
- * Bits 0-8 of the MMIO generation are propagated to spte bits 3-11
- * Bits 9-19 of the MMIO generation are propagated to spte bits 52-62
+ * Bits 0-7 of the MMIO generation are propagated to spte bits 3-10
+ * Bits 8-18 of the MMIO generation are propagated to spte bits 52-62
  *
  * The KVM_MEMSLOT_GEN_UPDATE_IN_PROGRESS flag is intentionally not included in
  * the MMIO generation number, as doing so would require stealing a bit from
@@ -116,7 +116,7 @@ static_assert(!(EPT_SPTE_MMU_WRITABLE & SHADOW_ACC_TRACK_SAVED_MASK));
  */

 #define MMIO_SPTE_GEN_LOW_START                3
-#define MMIO_SPTE_GEN_LOW_END          11
+#define MMIO_SPTE_GEN_LOW_END          10

 #define MMIO_SPTE_GEN_HIGH_START       52
 #define MMIO_SPTE_GEN_HIGH_END         62
@@ -125,12 +125,14 @@ static_assert(!(EPT_SPTE_MMU_WRITABLE & SHADOW_ACC_TRACK_SAVED_MASK));
                                                    MMIO_SPTE_GEN_LOW_START)
 #define MMIO_SPTE_GEN_HIGH_MASK                GENMASK_ULL(MMIO_SPTE_GEN_HIGH_END, \
                                                    MMIO_SPTE_GEN_HIGH_START)
+static_assert(!(SPTE_MMU_PRESENT_MASK &
+               (MMIO_SPTE_GEN_LOW_MASK | MMIO_SPTE_GEN_HIGH_MASK)));

 #define MMIO_SPTE_GEN_LOW_BITS         (MMIO_SPTE_GEN_LOW_END - MMIO_SPTE_GEN_LOW_START + 1)
 #define MMIO_SPTE_GEN_HIGH_BITS                (MMIO_SPTE_GEN_HIGH_END - MMIO_SPTE_GEN_HIGH_START + 1)

 /* remember to adjust the comment above as well if you change these */
-static_assert(MMIO_SPTE_GEN_LOW_BITS == 9 && MMIO_SPTE_GEN_HIGH_BITS == 11);
+static_assert(MMIO_SPTE_GEN_LOW_BITS == 8 && MMIO_SPTE_GEN_HIGH_BITS == 11);

 #define MMIO_SPTE_GEN_LOW_SHIFT                (MMIO_SPTE_GEN_LOW_START - 0)
 #define MMIO_SPTE_GEN_HIGH_SHIFT       (MMIO_SPTE_GEN_HIGH_START - MMIO_SPTE_GEN_LOW_BITS)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 21/24] KVM: x86/mmu: Tweak auditing WARN for A/D bits to !PRESENT (was MMIO)
  2021-02-25 20:47 ` [PATCH 21/24] KVM: x86/mmu: Tweak auditing WARN for A/D bits to !PRESENT (was MMIO) Sean Christopherson
@ 2021-05-06 23:05   ` Matteo Croce
  2021-05-07  7:38     ` Paolo Bonzini
  0 siblings, 1 reply; 32+ messages in thread
From: Matteo Croce @ 2021-05-06 23:05 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon

On Thu, 25 Feb 2021 12:47:46 -0800
Sean Christopherson <seanjc@google.com> wrote:

> Tweak the MMU_WARN that guards against weirdness when querying A/D
> status to fire on a !MMU_PRESENT SPTE, as opposed to a MMIO SPTE.
> Attempting to query A/D status on any kind of !MMU_PRESENT SPTE, MMIO
> or otherwise, indicates a KVM bug.  Case in point, several now-fixed
> bugs were identified by enabling this new WARN.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>

You made the 1.000.000th commit, congrats!

$ git log --oneline --reverse |sed '1000000!d'
8f366ae6d8c5 KVM: x86/mmu: Tweak auditing WARN for A/D bits to !PRESENT (was MMIO)

Cheers,
-- 
per aspera ad upstream

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 21/24] KVM: x86/mmu: Tweak auditing WARN for A/D bits to !PRESENT (was MMIO)
  2021-05-06 23:05   ` Matteo Croce
@ 2021-05-07  7:38     ` Paolo Bonzini
  0 siblings, 0 replies; 32+ messages in thread
From: Paolo Bonzini @ 2021-05-07  7:38 UTC (permalink / raw)
  To: Matteo Croce, Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Ben Gardon

On 07/05/21 01:05, Matteo Croce wrote:
> On Thu, 25 Feb 2021 12:47:46 -0800
> Sean Christopherson <seanjc@google.com> wrote:
> 
>> Tweak the MMU_WARN that guards against weirdness when querying A/D
>> status to fire on a !MMU_PRESENT SPTE, as opposed to a MMIO SPTE.
>> Attempting to query A/D status on any kind of !MMU_PRESENT SPTE, MMIO
>> or otherwise, indicates a KVM bug.  Case in point, several now-fixed
>> bugs were identified by enabling this new WARN.
>>
>> Signed-off-by: Sean Christopherson <seanjc@google.com>
> 
> You made the 1.000.000th commit, congrats!
> 
> $ git log --oneline --reverse |sed '1000000!d'
> 8f366ae6d8c5 KVM: x86/mmu: Tweak auditing WARN for A/D bits to !PRESENT (was MMIO)

🦀🦀🦀

Paolo


^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2021-05-07  7:39 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-25 20:47 [PATCH 00/24] KVM: x86/mmu: Introduce MMU_PRESENT and fix bugs Sean Christopherson
2021-02-25 20:47 ` [PATCH 01/24] KVM: x86/mmu: Set SPTE_AD_WRPROT_ONLY_MASK if and only if PML is enabled Sean Christopherson
2021-02-25 20:47 ` [PATCH 02/24] KVM: x86/mmu: Check for shadow-present SPTE before querying A/D status Sean Christopherson
2021-02-25 20:47 ` [PATCH 03/24] KVM: x86/mmu: Bail from fast_page_fault() if SPTE is not shadow-present Sean Christopherson
2021-02-25 20:47 ` [PATCH 04/24] KVM: x86/mmu: Disable MMIO caching if MMIO value collides with L1TF Sean Christopherson
2021-02-25 20:47 ` [PATCH 05/24] KVM: x86/mmu: Retry page faults that hit an invalid memslot Sean Christopherson
2021-02-25 20:47 ` [PATCH 06/24] KVM: x86/mmu: Don't install bogus MMIO SPTEs if MMIO caching is disabled Sean Christopherson
2021-02-25 20:47 ` [PATCH 07/24] KVM: x86/mmu: Handle MMIO SPTEs directly in mmu_set_spte() Sean Christopherson
2021-02-25 20:47 ` [PATCH 08/24] KVM: x86/mmu: Drop redundant trace_kvm_mmu_set_spte() in the TDP MMU Sean Christopherson
2021-02-25 20:47 ` [PATCH 09/24] KVM: x86/mmu: Rename 'mask' to 'spte' in MMIO SPTE helpers Sean Christopherson
2021-02-25 20:47 ` [PATCH 10/24] KVM: x86/mmu: Stop using software available bits to denote MMIO SPTEs Sean Christopherson
2021-02-25 20:47 ` [PATCH 11/24] KVM: x86/mmu: Add module param to disable MMIO caching (for testing) Sean Christopherson
2021-02-25 20:47 ` [PATCH 12/24] KVM: x86/mmu: Rename and document A/D scheme for TDP SPTEs Sean Christopherson
2021-02-25 20:47 ` [PATCH 13/24] KVM: x86/mmu: Use MMIO SPTE bits 53 and 52 for the MMIO generation Sean Christopherson
2021-02-25 20:47 ` [PATCH 14/24] KVM: x86/mmu: Document dependency bewteen TDP A/D type and saved bits Sean Christopherson
2021-02-25 20:47 ` [PATCH 15/24] KVM: x86/mmu: Move initial kvm_mmu_set_mask_ptes() call into MMU proper Sean Christopherson
2021-02-25 20:47 ` [PATCH 16/24] KVM: x86/mmu: Co-locate code for setting various SPTE masks Sean Christopherson
2021-02-25 20:47 ` [PATCH 17/24] KVM: x86/mmu: Move logic for setting SPTE masks for EPT into the MMU proper Sean Christopherson
2021-02-25 20:47 ` [PATCH 18/24] KVM: x86/mmu: Make Host-writable and MMU-writable bit locations dynamic Sean Christopherson
2021-02-25 20:47 ` [PATCH 19/24] KVM: x86/mmu: Use high bits for host/mmu writable masks for EPT SPTEs Sean Christopherson
2021-02-25 20:47 ` [PATCH 20/24] KVM: x86/mmu: Use a dedicated bit to track shadow/MMU-present SPTEs Sean Christopherson
2021-03-08 18:52   ` Tom Lendacky
2021-03-08 19:48     ` Paolo Bonzini
2021-03-08 20:11     ` Sean Christopherson
2021-03-08 21:49       ` Sean Christopherson
2021-02-25 20:47 ` [PATCH 21/24] KVM: x86/mmu: Tweak auditing WARN for A/D bits to !PRESENT (was MMIO) Sean Christopherson
2021-05-06 23:05   ` Matteo Croce
2021-05-07  7:38     ` Paolo Bonzini
2021-02-25 20:47 ` [PATCH 22/24] KVM: x86/mmu: Use is_removed_spte() instead of open coded equivalents Sean Christopherson
2021-02-25 20:47 ` [PATCH 23/24] KVM: x86/mmu: Use low available bits for removed SPTEs Sean Christopherson
2021-02-25 20:47 ` [PATCH 24/24] KVM: x86/mmu: Dump reserved bits if they're detected on non-MMIO SPTE Sean Christopherson
2021-02-26  9:38 ` [PATCH 00/24] KVM: x86/mmu: Introduce MMU_PRESENT and fix bugs Paolo Bonzini

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).