kvm.vger.kernel.org archive mirror
* [PATCH v2 0/4] KVM: x86/mmu: Fix write-protection bug in the TDP MMU
@ 2022-01-13 23:30 David Matlack
  2022-01-13 23:30 ` [PATCH v2 1/4] KVM: x86/mmu: Fix write-protection of PTs mapped by " David Matlack
                   ` (4 more replies)
  0 siblings, 5 replies; 13+ messages in thread
From: David Matlack @ 2022-01-13 23:30 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Ben Gardon, kvm, David Matlack

While attempting to understand the big comment in
kvm_mmu_slot_remove_write_access() about TLB flushing, I discovered a
bug in the way the TDP MMU write-protects GFNs. I have not managed to
reproduce the bug as it requires a rather complex set up of live
migrating a VM that is using nested virtualization while the TDP MMU is
enabled.

Patch 1 fixes the bug and is CC'd to stable.
Patches 2-3 fix, document, and enforce invariants around MMU-writable
and Host-writable bits.
Patch 4 fixes up the aforementioned comment to be more readable.

Tested using the kvm-unit-tests and KVM selftests.

v2:
 - Skip setting the SPTE when MMU-writable is already clear [Sean]
 - Add patches for {MMU,Host}-writable invariants [Sean]
 - Fix inaccuracies in kvm_mmu_slot_remove_write_access() comment [Sean]

v1: https://lore.kernel.org/kvm/20220112215801.3502286-1-dmatlack@google.com/

David Matlack (4):
  KVM: x86/mmu: Fix write-protection of PTs mapped by the TDP MMU
  KVM: x86/mmu: Clear MMU-writable during changed_pte notifier
  KVM: x86/mmu: Document and enforce MMU-writable and Host-writable
    invariants
  KVM: x86/mmu: Improve TLB flush comment in
    kvm_mmu_slot_remove_write_access()

 arch/x86/kvm/mmu/mmu.c     | 31 ++++++++++++++++++++--------
 arch/x86/kvm/mmu/spte.c    |  1 +
 arch/x86/kvm/mmu/spte.h    | 42 ++++++++++++++++++++++++++++++++------
 arch/x86/kvm/mmu/tdp_mmu.c |  6 +++---
 4 files changed, 62 insertions(+), 18 deletions(-)


base-commit: fea31d1690945e6dd6c3e89ec5591490857bc3d4
-- 
2.34.1.703.g22d0c6ccf7-goog



* [PATCH v2 1/4] KVM: x86/mmu: Fix write-protection of PTs mapped by the TDP MMU
  2022-01-13 23:30 [PATCH v2 0/4] KVM: x86/mmu: Fix write-protection bug in the TDP MMU David Matlack
@ 2022-01-13 23:30 ` David Matlack
  2022-01-14 23:38   ` Sean Christopherson
  2022-01-13 23:30 ` [PATCH v2 2/4] KVM: x86/mmu: Clear MMU-writable during changed_pte notifier David Matlack
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 13+ messages in thread
From: David Matlack @ 2022-01-13 23:30 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Ben Gardon, kvm, David Matlack, stable

When the TDP MMU is write-protection GFNs for page table protection (as
opposed to for dirty logging, or due to the HVA not being writable), it
checks if the SPTE is already write-protected and if so skips modifying
the SPTE and the TLB flush.

This behavior is incorrect because the SPTE may be write-protected for
dirty logging. This implies that the SPTE could be locklessly be made
writable on the next write access, and that vCPUs could still be running
with writable SPTEs cached in their TLB.

Fix this by only skipping setting the SPTE if the SPTE is already
write-protected *and* MMU-writable is already clear.

Fixes: 46044f72c382 ("kvm: x86/mmu: Support write protection for nesting in tdp MMU")
Cc: stable@vger.kernel.org
Signed-off-by: David Matlack <dmatlack@google.com>
---
 arch/x86/kvm/mmu/tdp_mmu.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 7b1bc816b7c3..bc9e3553fba2 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1442,12 +1442,12 @@ static bool write_protect_gfn(struct kvm *kvm, struct kvm_mmu_page *root,
 		    !is_last_spte(iter.old_spte, iter.level))
 			continue;
 
-		if (!is_writable_pte(iter.old_spte))
-			break;
-
 		new_spte = iter.old_spte &
 			~(PT_WRITABLE_MASK | shadow_mmu_writable_mask);
 
+		if (new_spte == iter.old_spte)
+			break;
+
 		tdp_mmu_set_spte(kvm, &iter, new_spte);
 		spte_set = true;
 	}

base-commit: fea31d1690945e6dd6c3e89ec5591490857bc3d4
-- 
2.34.1.703.g22d0c6ccf7-goog
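
As a minimal standalone sketch (not KVM code; the bit positions below are
stand-ins for the real masks in arch/x86/kvm/mmu/spte.h), the difference
between the old and new checks can be seen as follows:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in bit positions; the real masks live in arch/x86/kvm/mmu/spte.h. */
#define PT_WRITABLE_MASK	(1ULL << 1)
#define SHADOW_HOST_WRITABLE	(1ULL << 9)
#define SHADOW_MMU_WRITABLE	(1ULL << 10)

int main(void)
{
	/* An SPTE write-protected for dirty logging: W clear, MMU/Host-writable set. */
	uint64_t spte = SHADOW_HOST_WRITABLE | SHADOW_MMU_WRITABLE;

	/* Old check: the SPTE already looks write-protected, so the update is skipped... */
	bool old_skips = !(spte & PT_WRITABLE_MASK);

	/* ...yet the fast page fault path could still locklessly re-set the W bit. */
	bool lockless_writable = (spte & SHADOW_MMU_WRITABLE) &&
				 (spte & SHADOW_HOST_WRITABLE);

	/* New check: only skip if clearing W and MMU-writable would be a no-op. */
	uint64_t new_spte = spte & ~(PT_WRITABLE_MASK | SHADOW_MMU_WRITABLE);
	bool new_skips = (new_spte == spte);

	printf("old skips: %d, lockless-writable: %d, new skips: %d\n",
	       old_skips, lockless_writable, new_skips);
	return 0;
}

This prints "old skips: 1, lockless-writable: 1, new skips: 0": the old check
would have skipped an SPTE that can still be locklessly made writable, while
the new check does not.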



* [PATCH v2 2/4] KVM: x86/mmu: Clear MMU-writable during changed_pte notifier
  2022-01-13 23:30 [PATCH v2 0/4] KVM: x86/mmu: Fix write-protection bug in the TDP MMU David Matlack
  2022-01-13 23:30 ` [PATCH v2 1/4] KVM: x86/mmu: Fix write-protection of PTs mapped by " David Matlack
@ 2022-01-13 23:30 ` David Matlack
  2022-01-14 23:41   ` Sean Christopherson
  2022-01-13 23:30 ` [PATCH v2 3/4] KVM: x86/mmu: Document and enforce MMU-writable and Host-writable invariants David Matlack
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 13+ messages in thread
From: David Matlack @ 2022-01-13 23:30 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Ben Gardon, kvm, David Matlack

When handling the changed_pte notifier and the new PTE is read-only,
clear both the Host-writable and MMU-writable bits in the SPTE. This
preserves the invariant that MMU-writable is set if-and-only-if
Host-writable is set.

No functional change intended. Nothing currently relies on the
aforementioned invariant and technically the changed_pte notifier is
dead code.

Signed-off-by: David Matlack <dmatlack@google.com>
---
 arch/x86/kvm/mmu/spte.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index 8a7b03207762..f8677404c93c 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -215,6 +215,7 @@ u64 kvm_mmu_changed_pte_notifier_make_spte(u64 old_spte, kvm_pfn_t new_pfn)
 
 	new_spte &= ~PT_WRITABLE_MASK;
 	new_spte &= ~shadow_host_writable_mask;
+	new_spte &= ~shadow_mmu_writable_mask;
 
 	new_spte = mark_spte_for_access_track(new_spte);
 
-- 
2.34.1.703.g22d0c6ccf7-goog



* [PATCH v2 3/4] KVM: x86/mmu: Document and enforce MMU-writable and Host-writable invariants
  2022-01-13 23:30 [PATCH v2 0/4] KVM: x86/mmu: Fix write-protection bug in the TDP MMU David Matlack
  2022-01-13 23:30 ` [PATCH v2 1/4] KVM: x86/mmu: Fix write-protection of PTs mapped by " David Matlack
  2022-01-13 23:30 ` [PATCH v2 2/4] KVM: x86/mmu: Clear MMU-writable during changed_pte notifier David Matlack
@ 2022-01-13 23:30 ` David Matlack
  2022-01-14 22:29   ` Sean Christopherson
  2022-01-13 23:30 ` [PATCH v2 4/4] KVM: x86/mmu: Improve TLB flush comment in kvm_mmu_slot_remove_write_access() David Matlack
  2022-01-17 17:59 ` [PATCH v2 0/4] KVM: x86/mmu: Fix write-protection bug in the TDP MMU Paolo Bonzini
  4 siblings, 1 reply; 13+ messages in thread
From: David Matlack @ 2022-01-13 23:30 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Ben Gardon, kvm, David Matlack

SPTEs are tagged with software-only bits to indicate whether they are
"MMU-writable" and "Host-writable". These bits are used to determine why
KVM has marked an SPTE as read-only.

Document these bits and their invariants, and enforce the invariants
with new WARNs in spte_can_locklessly_be_made_writable() to ensure they
are not accidentally violated in the future.

Opportunistically move DEFAULT_SPTE_{MMU,HOST}_WRITABLE next to
EPT_SPTE_{MMU,HOST}_WRITABLE since the new documentation applies to
both.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
---
 arch/x86/kvm/mmu/spte.h | 42 +++++++++++++++++++++++++++++++++++------
 1 file changed, 36 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
index a4af2a42695c..be6a007a4af3 100644
--- a/arch/x86/kvm/mmu/spte.h
+++ b/arch/x86/kvm/mmu/spte.h
@@ -60,10 +60,6 @@ static_assert(SPTE_TDP_AD_ENABLED_MASK == 0);
 	(((address) >> PT64_LEVEL_SHIFT(level)) & ((1 << PT64_LEVEL_BITS) - 1))
 #define SHADOW_PT_INDEX(addr, level) PT64_INDEX(addr, level)
 
-/* Bits 9 and 10 are ignored by all non-EPT PTEs. */
-#define DEFAULT_SPTE_HOST_WRITEABLE	BIT_ULL(9)
-#define DEFAULT_SPTE_MMU_WRITEABLE	BIT_ULL(10)
-
 /*
  * The mask/shift to use for saving the original R/X bits when marking the PTE
  * as not-present for access tracking purposes. We do not save the W bit as the
@@ -78,6 +74,35 @@ static_assert(SPTE_TDP_AD_ENABLED_MASK == 0);
 					 SHADOW_ACC_TRACK_SAVED_BITS_SHIFT)
 static_assert(!(SPTE_TDP_AD_MASK & SHADOW_ACC_TRACK_SAVED_MASK));
 
+/*
+ * *_SPTE_HOST_WRITEABLE (aka Host-writable) indicates whether the host permits
+ * writes to the guest page mapped by the SPTE. This bit is cleared on SPTEs
+ * that map guest pages in read-only memslots and read-only VMAs.
+ *
+ * Invariants:
+ *  - If Host-writable is clear, PT_WRITABLE_MASK must be clear.
+ *
+ *
+ * *_SPTE_MMU_WRITEABLE (aka MMU-writable) indicates whether the shadow MMU
+ * allows writes to the guest page mapped by the SPTE. This bit is cleared when
+ * the guest page mapped by the SPTE contains a page table that is being
+ * monitored for shadow paging. In this case the SPTE can only be made writable
+ * by unsyncing the shadow page under the mmu_lock.
+ *
+ * Invariants:
+ *  - If MMU-writable is clear, PT_WRITABLE_MASK must be clear.
+ *  - If MMU-writable is set, Host-writable must be set.
+ *
+ * If MMU-writable is set, PT_WRITABLE_MASK is normally set but can be cleared
+ * to track writes for dirty logging. For such SPTEs, KVM will locklessly set
+ * PT_WRITABLE_MASK upon the next write from the guest and record the write in
+ * the dirty log (see fast_page_fault()).
+ */
+
+/* Bits 9 and 10 are ignored by all non-EPT PTEs. */
+#define DEFAULT_SPTE_HOST_WRITEABLE	BIT_ULL(9)
+#define DEFAULT_SPTE_MMU_WRITEABLE	BIT_ULL(10)
+
 /*
  * Low ignored bits are at a premium for EPT, use high ignored bits, taking care
  * to not overlap the A/D type mask or the saved access bits of access-tracked
@@ -316,8 +341,13 @@ static __always_inline bool is_rsvd_spte(struct rsvd_bits_validate *rsvd_check,
 
 static inline bool spte_can_locklessly_be_made_writable(u64 spte)
 {
-	return (spte & shadow_host_writable_mask) &&
-	       (spte & shadow_mmu_writable_mask);
+	if (spte & shadow_mmu_writable_mask) {
+		WARN_ON_ONCE(!(spte & shadow_host_writable_mask));
+		return true;
+	}
+
+	WARN_ON_ONCE(spte & PT_WRITABLE_MASK);
+	return false;
 }
 
 static inline u64 get_mmio_spte_generation(u64 spte)
-- 
2.34.1.703.g22d0c6ccf7-goog



* [PATCH v2 4/4] KVM: x86/mmu: Improve TLB flush comment in kvm_mmu_slot_remove_write_access()
  2022-01-13 23:30 [PATCH v2 0/4] KVM: x86/mmu: Fix write-protection bug in the TDP MMU David Matlack
                   ` (2 preceding siblings ...)
  2022-01-13 23:30 ` [PATCH v2 3/4] KVM: x86/mmu: Document and enforce MMU-writable and Host-writable invariants David Matlack
@ 2022-01-13 23:30 ` David Matlack
  2022-01-14 23:58   ` Sean Christopherson
  2022-01-17 17:59 ` [PATCH v2 0/4] KVM: x86/mmu: Fix write-protection bug in the TDP MMU Paolo Bonzini
  4 siblings, 1 reply; 13+ messages in thread
From: David Matlack @ 2022-01-13 23:30 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Ben Gardon, kvm, David Matlack

Rewrite the comment in kvm_mmu_slot_remove_write_access() that explains
why it is safe to flush TLBs outside of the MMU lock after
write-protecting SPTEs for dirty logging. The current comment is a long
run-on sentence that was difficult to understand. In addition it was
specific to the shadow MMU (mentioning mmu_spte_update()) when the TDP
MMU has to handle this as well.

The new comment explains:
 - Why the TLB flush is necessary at all.
 - Why it is desirable to do the TLB flush outside of the MMU lock.
 - Why it is safe to do the TLB flush outside of the MMU lock.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 31 ++++++++++++++++++++++---------
 1 file changed, 22 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 1d275e9d76b5..8ed2b42a7aa3 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5756,6 +5756,7 @@ static bool __kvm_zap_rmaps(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
 				continue;
 
 			flush = slot_handle_level_range(kvm, memslot, kvm_zap_rmapp,
+
 							PG_LEVEL_4K, KVM_MAX_HUGEPAGE_LEVEL,
 							start, end - 1, true, flush);
 		}
@@ -5825,15 +5826,27 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
 	}
 
 	/*
-	 * We can flush all the TLBs out of the mmu lock without TLB
-	 * corruption since we just change the spte from writable to
-	 * readonly so that we only need to care the case of changing
-	 * spte from present to present (changing the spte from present
-	 * to nonpresent will flush all the TLBs immediately), in other
-	 * words, the only case we care is mmu_spte_update() where we
-	 * have checked Host-writable | MMU-writable instead of
-	 * PT_WRITABLE_MASK, that means it does not depend on PT_WRITABLE_MASK
-	 * anymore.
+	 * Flush TLBs if any SPTEs had to be write-protected to ensure that
+	 * guest writes are reflected in the dirty bitmap before the memslot
+	 * update completes, i.e. before enabling dirty logging is visible to
+	 * userspace.
+	 *
+	 * Perform the TLB flush outside the mmu_lock to reduce the amount of
+	 * time the lock is held. However, this does mean that another CPU can
+	 * now grab the mmu_lock and encounter an SPTE that is write-protected
+	 * while CPUs still have writable versions of that SPTE in their TLB.
+	 *
+	 * This is safe but requires KVM to be careful when making decisions
+	 * based on the write-protection status of an SPTE. Specifically, KVM
+	 * also write-protects SPTEs to monitor changes to guest page tables
+	 * during shadow paging, and must guarantee no CPUs can write to those
+	 * pages before the lock is dropped. As mentioned in the previous
+	 * paragraph, a write-protected SPTE is no guarantee that a CPU cannot
+	 * perform writes. So to determine if a TLB flush is truly required, KVM
+	 * will clear a separate software-only bit (MMU-writable) and skip the
+	 * flush if-and-only-if this bit was already clear.
+	 *
+	 * See DEFAULT_SPTE_MMU_WRITEABLE for more details.
 	 */
 	if (flush)
 		kvm_arch_flush_remote_tlbs_memslot(kvm, memslot);
-- 
2.34.1.703.g22d0c6ccf7-goog



* Re: [PATCH v2 3/4] KVM: x86/mmu: Document and enforce MMU-writable and Host-writable invariants
  2022-01-13 23:30 ` [PATCH v2 3/4] KVM: x86/mmu: Document and enforce MMU-writable and Host-writable invariants David Matlack
@ 2022-01-14 22:29   ` Sean Christopherson
  2022-01-18 17:45     ` David Matlack
  0 siblings, 1 reply; 13+ messages in thread
From: Sean Christopherson @ 2022-01-14 22:29 UTC (permalink / raw)
  To: David Matlack
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Ben Gardon, kvm

On Thu, Jan 13, 2022, David Matlack wrote:
> +/*
> + * *_SPTE_HOST_WRITEABLE (aka Host-writable) indicates whether the host permits
> + * writes to the guest page mapped by the SPTE. This bit is cleared on SPTEs
> + * that map guest pages in read-only memslots and read-only VMAs.
> + *
> + * Invariants:
> + *  - If Host-writable is clear, PT_WRITABLE_MASK must be clear.
> + *
> + *
> + * *_SPTE_MMU_WRITEABLE (aka MMU-writable) indicates whether the shadow MMU
> + * allows writes to the guest page mapped by the SPTE. This bit is cleared when
> + * the guest page mapped by the SPTE contains a page table that is being
> + * monitored for shadow paging. In this case the SPTE can only be made writable
> + * by unsyncing the shadow page under the mmu_lock.
> + *
> + * Invariants:
> + *  - If MMU-writable is clear, PT_WRITABLE_MASK must be clear.
> + *  - If MMU-writable is set, Host-writable must be set.
> + *
> + * If MMU-writable is set, PT_WRITABLE_MASK is normally set but can be cleared
> + * to track writes for dirty logging. For such SPTEs, KVM will locklessly set
> + * PT_WRITABLE_MASK upon the next write from the guest and record the write in
> + * the dirty log (see fast_page_fault()).
> + */
> +
> +/* Bits 9 and 10 are ignored by all non-EPT PTEs. */
> +#define DEFAULT_SPTE_HOST_WRITEABLE	BIT_ULL(9)
> +#define DEFAULT_SPTE_MMU_WRITEABLE	BIT_ULL(10)

Ha, so there's a massive comment above is_writable_pte() that covers a lot of
the same material.  More below.

> +
>  /*
>   * Low ignored bits are at a premium for EPT, use high ignored bits, taking care
>   * to not overlap the A/D type mask or the saved access bits of access-tracked
> @@ -316,8 +341,13 @@ static __always_inline bool is_rsvd_spte(struct rsvd_bits_validate *rsvd_check,
>  
>  static inline bool spte_can_locklessly_be_made_writable(u64 spte)
>  {
> -	return (spte & shadow_host_writable_mask) &&
> -	       (spte & shadow_mmu_writable_mask);
> +	if (spte & shadow_mmu_writable_mask) {
> +		WARN_ON_ONCE(!(spte & shadow_host_writable_mask));
> +		return true;
> +	}
> +
> +	WARN_ON_ONCE(spte & PT_WRITABLE_MASK);

I don't like having the WARNs here.  This is a moderately hot path, there are a
decent number of call sites, and the WARNs won't actually help detect the offender,
i.e. whoever wrote the bad SPTE long since got away.

And for whatever reason, I had a hell of a time (correctly) reading the second WARN :-)

Lastly, there's also an "overlapping" WARN in mark_spte_for_access_track().

> +	return false;

To kill a few birds with fewer stones, what if we:

  a. Move is_writable_pte() into spte.h, somewhat close to the HOST/MMU_WRITABLE
     definitions.

  b. Add a new helper, spte_check_writable_invariants(), to enforce that a SPTE
     is WRITABLE iff it's MMU-Writable, and that a SPTE is MMU-Writable iff it's
     HOST-Writable.

  c. Drop the WARN in mark_spte_for_access_track().

  d. Call spte_check_writable_invariants() when setting SPTEs.

  e. Document everything in a comment above spte_check_writable_invariants().
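
A rough sketch of what (b) and (d) could look like, assuming the existing
shadow_host_writable_mask, shadow_mmu_writable_mask and PT_WRITABLE_MASK
definitions in spte.h; this is illustrative only, not the eventual
follow-up code:

/* Illustrative only: enforce WRITABLE => MMU-writable => Host-writable. */
static inline void spte_check_writable_invariants(u64 spte)
{
	if (spte & PT_WRITABLE_MASK)
		WARN_ONCE(!(spte & shadow_mmu_writable_mask),
			  "kvm: writable SPTE is not MMU-writable: %llx", spte);

	if (spte & shadow_mmu_writable_mask)
		WARN_ONCE(!(spte & shadow_host_writable_mask),
			  "kvm: MMU-writable SPTE is not Host-writable: %llx",
			  spte);
}

which would then be called from the SPTE-setting paths per (d).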


* Re: [PATCH v2 1/4] KVM: x86/mmu: Fix write-protection of PTs mapped by the TDP MMU
  2022-01-13 23:30 ` [PATCH v2 1/4] KVM: x86/mmu: Fix write-protection of PTs mapped by " David Matlack
@ 2022-01-14 23:38   ` Sean Christopherson
  0 siblings, 0 replies; 13+ messages in thread
From: Sean Christopherson @ 2022-01-14 23:38 UTC (permalink / raw)
  To: David Matlack
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Ben Gardon, kvm, stable

On Thu, Jan 13, 2022, David Matlack wrote:
> When the TDP MMU is write-protection GFNs for page table protection (as
                      ^^^^^^^^^^^^^^^^
                      write-protecting

> opposed to for dirty logging, or due to the HVA not being writable), it
> checks if the SPTE is already write-protected and if so skips modifying
> the SPTE and the TLB flush.
> 
> This behavior is incorrect because the SPTE may be write-protected for
> dirty logging. This implies that the SPTE could be locklessly be made

Spurious "be" between could and locklessly.

Hmm, it doesn't imply anything, the behavior of MMU-writable is quite explicit.
If the bug occurs, then _that_ implies the SPTE was write-protected for dirty
logging, otherwise MMU-Writable would have been '0' due to HOST-Writable also
being '0'.

I think what you're trying to say is:

  This behavior is incorrect because it fails to check if the SPTE is
  write-protected for page table protection, i.e. fails to check that
  MMU-writable is '0'.  If the SPTE was write-protected for dirty logging
  but not page table protection, the SPTE could locklessly be made
  writable, and vCPUs could still be running with writable mappings
  cached in their TLB.

> writable on the next write access, and that vCPUs could still be running
> with writable SPTEs cached in their TLB.

Nit, it's technically the mapping, not the SPTE itself, that's cached in the TLB.

> Fix this by only skipping setting the SPTE if the SPTE is already
> write-protected *and* MMU-writable is already clear.

Might also be worth adding:

  Note, technically checking only MMU-writable would suffice, as a SPTE
  cannot be writable without MMU-writable being set, but check both to
  be paranoid and because it arguably yields more readable code.

Pedantry aside,

Reviewed-by: Sean Christopherson <seanjc@google.com>

> Fixes: 46044f72c382 ("kvm: x86/mmu: Support write protection for nesting in tdp MMU")
> Cc: stable@vger.kernel.org
> Signed-off-by: David Matlack <dmatlack@google.com>
> ---
>  arch/x86/kvm/mmu/tdp_mmu.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 7b1bc816b7c3..bc9e3553fba2 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -1442,12 +1442,12 @@ static bool write_protect_gfn(struct kvm *kvm, struct kvm_mmu_page *root,
>  		    !is_last_spte(iter.old_spte, iter.level))
>  			continue;
>  
> -		if (!is_writable_pte(iter.old_spte))
> -			break;
> -
>  		new_spte = iter.old_spte &
>  			~(PT_WRITABLE_MASK | shadow_mmu_writable_mask);
>  
> +		if (new_spte == iter.old_spte)
> +			break;
> +
>  		tdp_mmu_set_spte(kvm, &iter, new_spte);
>  		spte_set = true;
>  	}
> 
> base-commit: fea31d1690945e6dd6c3e89ec5591490857bc3d4
> -- 
> 2.34.1.703.g22d0c6ccf7-goog
> 


* Re: [PATCH v2 2/4] KVM: x86/mmu: Clear MMU-writable during changed_pte notifier
  2022-01-13 23:30 ` [PATCH v2 2/4] KVM: x86/mmu: Clear MMU-writable during changed_pte notifier David Matlack
@ 2022-01-14 23:41   ` Sean Christopherson
  0 siblings, 0 replies; 13+ messages in thread
From: Sean Christopherson @ 2022-01-14 23:41 UTC (permalink / raw)
  To: David Matlack
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Ben Gardon, kvm

On Thu, Jan 13, 2022, David Matlack wrote:
> When handling the changed_pte notifier and the new PTE is read-only,
> clear both the Host-writable and MMU-writable bits in the SPTE. This
> preserves the invariant that MMU-writable is set if-and-only-if
> Host-writable is set.
> 
> No functional change intended. Nothing currently relies on the
> aforementioned invariant and technically the changed_pte notifier is
> dead code.
> 
> Signed-off-by: David Matlack <dmatlack@google.com>
> ---

Reviewed-by: Sean Christopherson <seanjc@google.com>


* Re: [PATCH v2 4/4] KVM: x86/mmu: Improve TLB flush comment in kvm_mmu_slot_remove_write_access()
  2022-01-13 23:30 ` [PATCH v2 4/4] KVM: x86/mmu: Improve TLB flush comment in kvm_mmu_slot_remove_write_access() David Matlack
@ 2022-01-14 23:58   ` Sean Christopherson
  0 siblings, 0 replies; 13+ messages in thread
From: Sean Christopherson @ 2022-01-14 23:58 UTC (permalink / raw)
  To: David Matlack
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Ben Gardon, kvm

On Thu, Jan 13, 2022, David Matlack wrote:
> Rewrite the comment in kvm_mmu_slot_remove_write_access() that explains
> why it is safe to flush TLBs outside of the MMU lock after
> write-protecting SPTEs for dirty logging. The current comment is a long
> run-on sentence that was difficult to understand. In addition it was
> specific to the shadow MMU (mentioning mmu_spte_update()) when the TDP
> MMU has to handle this as well.
> 
> The new comment explains:
>  - Why the TLB flush is necessary at all.
>  - Why it is desirable to do the TLB flush outside of the MMU lock.
>  - Why it is safe to do the TLB flush outside of the MMU lock.
> 
> No functional change intended.
> 
> Signed-off-by: David Matlack <dmatlack@google.com>

One nit below,

Reviewed-by: Sean Christopherson <seanjc@google.com>

> ---
>  arch/x86/kvm/mmu/mmu.c | 31 ++++++++++++++++++++++---------
>  1 file changed, 22 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 1d275e9d76b5..8ed2b42a7aa3 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -5756,6 +5756,7 @@ static bool __kvm_zap_rmaps(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
>  				continue;
>  
>  			flush = slot_handle_level_range(kvm, memslot, kvm_zap_rmapp,
> +
>  							PG_LEVEL_4K, KVM_MAX_HUGEPAGE_LEVEL,
>  							start, end - 1, true, flush);
>  		}
> @@ -5825,15 +5826,27 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
>  	}
>  
>  	/*
> -	 * We can flush all the TLBs out of the mmu lock without TLB
> -	 * corruption since we just change the spte from writable to
> -	 * readonly so that we only need to care the case of changing
> -	 * spte from present to present (changing the spte from present
> -	 * to nonpresent will flush all the TLBs immediately), in other
> -	 * words, the only case we care is mmu_spte_update() where we
> -	 * have checked Host-writable | MMU-writable instead of
> -	 * PT_WRITABLE_MASK, that means it does not depend on PT_WRITABLE_MASK
> -	 * anymore.
> +	 * Flush TLBs if any SPTEs had to be write-protected to ensure that
> +	 * guest writes are reflected in the dirty bitmap before the memslot
> +	 * update completes, i.e. before enabling dirty logging is visible to
> +	 * userspace.
> +	 *
> +	 * Perform the TLB flush outside the mmu_lock to reduce the amount of
> +	 * time the lock is held. However, this does mean that another CPU can
> +	 * now grab the mmu_lock and encounter an SPTE that is write-protected
> +	 * while CPUs still have writable versions of that SPTE in their TLB.

Uber nit on "SPTE in their TLB".  Maybe this?

	 * now grab mmu_lock and encounter a write-protected SPTE while CPUs
	 * still have a writable mapping for the associated GFN in their TLB.

> +	 *
> +	 * This is safe but requires KVM to be careful when making decisions
> +	 * based on the write-protection status of an SPTE. Specifically, KVM
> +	 * also write-protects SPTEs to monitor changes to guest page tables
> +	 * during shadow paging, and must guarantee no CPUs can write to those
> +	 * page before the lock is dropped. As mentioned in the previous
> +	 * pages before the lock is dropped. As mentioned in the previous
> +	 * paragraph, a write-protected SPTE is no guarantee that a CPU cannot
> +	 * will clear a separate software-only bit (MMU-writable) and skip the
> +	 * flush if-and-only-if this bit was already clear.
> +	 *
> +	 * See DEFAULT_SPTE_MMU_WRITEABLE for more details.
>  	 */
>  	if (flush)
>  		kvm_arch_flush_remote_tlbs_memslot(kvm, memslot);
> -- 
> 2.34.1.703.g22d0c6ccf7-goog
> 


* Re: [PATCH v2 0/4] KVM: x86/mmu: Fix write-protection bug in the TDP MMU
  2022-01-13 23:30 [PATCH v2 0/4] KVM: x86/mmu: Fix write-protection bug in the TDP MMU David Matlack
                   ` (3 preceding siblings ...)
  2022-01-13 23:30 ` [PATCH v2 4/4] KVM: x86/mmu: Improve TLB flush comment in kvm_mmu_slot_remove_write_access() David Matlack
@ 2022-01-17 17:59 ` Paolo Bonzini
  2022-01-18 17:38   ` David Matlack
  4 siblings, 1 reply; 13+ messages in thread
From: Paolo Bonzini @ 2022-01-17 17:59 UTC (permalink / raw)
  To: David Matlack
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Ben Gardon, kvm

On 1/14/22 00:30, David Matlack wrote:
> While attempting to understand the big comment in
> kvm_mmu_slot_remove_write_access() about TLB flushing, I discovered a
> bug in the way the TDP MMU write-protects GFNs. I have not managed to
> reproduce the bug as it requires a rather complex set up of live
> migrating a VM that is using nested virtualization while the TDP MMU is
> enabled.
> 
> Patch 1 fixes the bug and is CC'd to stable.
> Patches 2-3 fix, document, and enforce invariants around MMU-writable
> and Host-writable bits.
> Patch 4 fixes up the aforementioned comment to be more readable.
> 
> Tested using the kvm-unit-tests and KVM selftests.
> 
> v2:
>   - Skip setting the SPTE when MMU-writable is already clear [Sean]
>   - Add patches for {MMU,Host}-writable invariants [Sean]
>   - Fix inaccuracies in kvm_mmu_slot_remove_write_access() comment [Sean]
> 
> v1: https://lore.kernel.org/kvm/20220112215801.3502286-1-dmatlack@google.com/
> 
> David Matlack (4):
>    KVM: x86/mmu: Fix write-protection of PTs mapped by the TDP MMU
>    KVM: x86/mmu: Clear MMU-writable during changed_pte notifier
>    KVM: x86/mmu: Document and enforce MMU-writable and Host-writable
>      invariants
>    KVM: x86/mmu: Improve TLB flush comment in
>      kvm_mmu_slot_remove_write_access()
> 
>   arch/x86/kvm/mmu/mmu.c     | 31 ++++++++++++++++++++--------
>   arch/x86/kvm/mmu/spte.c    |  1 +
>   arch/x86/kvm/mmu/spte.h    | 42 ++++++++++++++++++++++++++++++++------
>   arch/x86/kvm/mmu/tdp_mmu.c |  6 +++---
>   4 files changed, 62 insertions(+), 18 deletions(-)
> 
> 
> base-commit: fea31d1690945e6dd6c3e89ec5591490857bc3d4

Queued, thanks.

Paolo



* Re: [PATCH v2 0/4] KVM: x86/mmu: Fix write-protection bug in the TDP MMU
  2022-01-17 17:59 ` [PATCH v2 0/4] KVM: x86/mmu: Fix write-protection bug in the TDP MMU Paolo Bonzini
@ 2022-01-18 17:38   ` David Matlack
  2022-01-18 17:41     ` Paolo Bonzini
  0 siblings, 1 reply; 13+ messages in thread
From: David Matlack @ 2022-01-18 17:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Ben Gardon, kvm list

On Mon, Jan 17, 2022 at 9:59 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 1/14/22 00:30, David Matlack wrote:
> > While attempting to understand the big comment in
> > kvm_mmu_slot_remove_write_access() about TLB flushing, I discovered a
> > bug in the way the TDP MMU write-protects GFNs. I have not managed to
> > reproduce the bug as it requires a rather complex set up of live
> > migrating a VM that is using nested virtualization while the TDP MMU is
> > enabled.
> >
> > Patch 1 fixes the bug and is CC'd to stable.
> > Patches 2-3 fix, document, and enforce invariants around MMU-writable
> > and Host-writable bits.
> > Patch 4 fixes up the aforementioned comment to be more readable.
> >
> > Tested using the kvm-unit-tests and KVM selftests.
> >
> > v2:
> >   - Skip setting the SPTE when MMU-writable is already clear [Sean]
> >   - Add patches for {MMU,Host}-writable invariants [Sean]
> >   - Fix inaccuracies in kvm_mmu_slot_remove_write_access() comment [Sean]
> >
> > v1: https://lore.kernel.org/kvm/20220112215801.3502286-1-dmatlack@google.com/
> >
> > David Matlack (4):
> >    KVM: x86/mmu: Fix write-protection of PTs mapped by the TDP MMU
> >    KVM: x86/mmu: Clear MMU-writable during changed_pte notifier
> >    KVM: x86/mmu: Document and enforce MMU-writable and Host-writable
> >      invariants
> >    KVM: x86/mmu: Improve TLB flush comment in
> >      kvm_mmu_slot_remove_write_access()
> >
> >   arch/x86/kvm/mmu/mmu.c     | 31 ++++++++++++++++++++--------
> >   arch/x86/kvm/mmu/spte.c    |  1 +
> >   arch/x86/kvm/mmu/spte.h    | 42 ++++++++++++++++++++++++++++++++------
> >   arch/x86/kvm/mmu/tdp_mmu.c |  6 +++---
> >   4 files changed, 62 insertions(+), 18 deletions(-)
> >
> >
> > base-commit: fea31d1690945e6dd6c3e89ec5591490857bc3d4
>
> Queued, thanks.

Thanks Paolo.

Patches 1 and 4 had some wordsmithing suggestions from Sean that I
think would be worth taking. I'm fine if you want to fold his
suggestions directly into the queued patches or I can resend.

The feedback on Patch 3 would require a follow-up series to address,
which I can handle separately.

>
> Paolo
>


* Re: [PATCH v2 0/4] KVM: x86/mmu: Fix write-protection bug in the TDP MMU
  2022-01-18 17:38   ` David Matlack
@ 2022-01-18 17:41     ` Paolo Bonzini
  0 siblings, 0 replies; 13+ messages in thread
From: Paolo Bonzini @ 2022-01-18 17:41 UTC (permalink / raw)
  To: David Matlack
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Ben Gardon, kvm list

On 1/18/22 18:38, David Matlack wrote:
> Thanks Paolo.
> 
> Patches 1 and 4 had some wordsmithing suggestions from Sean that I
> think would be worth taking. I'm fine if you want to fold his
> suggestions directly into the queued patches or I can resend.
> 
> The feedback on Patch 3 would require a follow-up series to address,
> which I can handle separately.
> 

Ok, I will fold it myself.

Paolo



* Re: [PATCH v2 3/4] KVM: x86/mmu: Document and enforce MMU-writable and Host-writable invariants
  2022-01-14 22:29   ` Sean Christopherson
@ 2022-01-18 17:45     ` David Matlack
  0 siblings, 0 replies; 13+ messages in thread
From: David Matlack @ 2022-01-18 17:45 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Ben Gardon, kvm list

On Fri, Jan 14, 2022 at 2:29 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Thu, Jan 13, 2022, David Matlack wrote:
> > +/*
> > + * *_SPTE_HOST_WRITEABLE (aka Host-writable) indicates whether the host permits
> > + * writes to the guest page mapped by the SPTE. This bit is cleared on SPTEs
> > + * that map guest pages in read-only memslots and read-only VMAs.
> > + *
> > + * Invariants:
> > + *  - If Host-writable is clear, PT_WRITABLE_MASK must be clear.
> > + *
> > + *
> > + * *_SPTE_MMU_WRITEABLE (aka MMU-writable) indicates whether the shadow MMU
> > + * allows writes to the guest page mapped by the SPTE. This bit is cleared when
> > + * the guest page mapped by the SPTE contains a page table that is being
> > + * monitored for shadow paging. In this case the SPTE can only be made writable
> > + * by unsyncing the shadow page under the mmu_lock.
> > + *
> > + * Invariants:
> > + *  - If MMU-writable is clear, PT_WRITABLE_MASK must be clear.
> > + *  - If MMU-writable is set, Host-writable must be set.
> > + *
> > + * If MMU-writable is set, PT_WRITABLE_MASK is normally set but can be cleared
> > + * to track writes for dirty logging. For such SPTEs, KVM will locklessly set
> > + * PT_WRITABLE_MASK upon the next write from the guest and record the write in
> > + * the dirty log (see fast_page_fault()).
> > + */
> > +
> > +/* Bits 9 and 10 are ignored by all non-EPT PTEs. */
> > +#define DEFAULT_SPTE_HOST_WRITEABLE  BIT_ULL(9)
> > +#define DEFAULT_SPTE_MMU_WRITEABLE   BIT_ULL(10)
>
> Ha, so there's a massive comment above is_writable_pte() that covers a lot of
> the same material.  More below.
>
> > +
> >  /*
> >   * Low ignored bits are at a premium for EPT, use high ignored bits, taking care
> >   * to not overlap the A/D type mask or the saved access bits of access-tracked
> > @@ -316,8 +341,13 @@ static __always_inline bool is_rsvd_spte(struct rsvd_bits_validate *rsvd_check,
> >
> >  static inline bool spte_can_locklessly_be_made_writable(u64 spte)
> >  {
> > -     return (spte & shadow_host_writable_mask) &&
> > -            (spte & shadow_mmu_writable_mask);
> > +     if (spte & shadow_mmu_writable_mask) {
> > +             WARN_ON_ONCE(!(spte & shadow_host_writable_mask));
> > +             return true;
> > +     }
> > +
> > +     WARN_ON_ONCE(spte & PT_WRITABLE_MASK);
>
> I don't like having the WARNs here.  This is a moderately hot path, there are a
> decent number of call sites, and the WARNs won't actually help detect the offender,
> i.e. whoever wrote the bad SPTE long since got away.

Re: hot path. The "return true" case (for fast_page_fault()) already
had to do 2 bitwise-ANDs and compares, so this patch shouldn't make
that any worse.

But that's a good point that it doesn't help with detecting the
offender. I agree these WARNs should move to where SPTEs are set.

>
> And for whatever reason, I had a hell of a time (correctly) reading the second WARN :-)
>
> Lastly, there's also an "overlapping" WARN in mark_spte_for_access_track().
>
> > +     return false;
>
> To kill a few birds with fewer stones, what if we:
>
>   a. Move is_writable_pte() into spte.h, somewhat close to the HOST/MMU_WRITABLE
>      definitions.
>
>   b. Add a new helper, spte_check_writable_invariants(), to enforce that a SPTE
>      is WRITABLE iff it's MMU-Writable, and that a SPTE is MMU-Writable iff it's
>      HOST-Writable.
>
>   c. Drop the WARN in mark_spte_for_access_track().
>
>   d. Call spte_check_writable_invariants() when setting SPTEs.
>
>   e. Document everything in a comment above spte_check_writable_invariants().

Sounds good. I'll send a follow-up series.


