kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] KVM: x86/mmu: Fix write-protection bug in the TDP MMU
@ 2022-01-12 21:57 David Matlack
  2022-01-12 21:58 ` [PATCH 1/2] KVM: x86/mmu: Fix write-protection of PTs mapped by " David Matlack
  2022-01-12 21:58 ` [PATCH 2/2] KVM: x86/mmu: Improve comment about TLB flush semantics for write-protection David Matlack
  0 siblings, 2 replies; 12+ messages in thread
From: David Matlack @ 2022-01-12 21:57 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Ben Gardon, kvm, David Matlack

While attempting to understand the big comment in
kvm_mmu_slot_remove_write_access() about TLB flushing, I discovered a
bug in the way the TDP MMU write-protects GFNs. I have not managed to
reproduce the bug as it requires a rather complex set up of live
migrating a VM that is using nested virtualization while the TDP MMU is
enabled.

Patch 1 fixes the bug and patch 2 fixes up the afformentioned comment to
be more readable.

Tested using the kvm-unit-tests and KVM selftests.

David Matlack (2):
  KVM: x86/mmu: Fix write-protection of PTs mapped by the TDP MMU
  KVM: x86/mmu: Improve comment about TLB flush semantics for
    write-protection

 arch/x86/kvm/mmu/mmu.c     | 29 ++++++++++++++++++++---------
 arch/x86/kvm/mmu/tdp_mmu.c | 27 ++++++++++++++++++++-------
 2 files changed, 40 insertions(+), 16 deletions(-)


base-commit: fea31d1690945e6dd6c3e89ec5591490857bc3d4
-- 
2.34.1.703.g22d0c6ccf7-goog


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/2] KVM: x86/mmu: Fix write-protection of PTs mapped by the TDP MMU
  2022-01-12 21:57 [PATCH 0/2] KVM: x86/mmu: Fix write-protection bug in the TDP MMU David Matlack
@ 2022-01-12 21:58 ` David Matlack
  2022-01-12 23:14   ` Sean Christopherson
  2022-01-12 21:58 ` [PATCH 2/2] KVM: x86/mmu: Improve comment about TLB flush semantics for write-protection David Matlack
  1 sibling, 1 reply; 12+ messages in thread
From: David Matlack @ 2022-01-12 21:58 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Ben Gardon, kvm, David Matlack, stable

When the TDP MMU is write-protection GFNs for page table protection (as
opposed to for dirty logging, or due to the HVA not being writable), it
checks if the SPTE is already write-protected and if so skips modifying
the SPTE and the TLB flush.

This behavior is incorrect because the SPTE may be write-protected for
dirty logging. This implies that the SPTE could be locklessly be made
writable on the next write access, and that vCPUs could still be running
with writable SPTEs cached in their TLB.

Fix this by unconditionally setting the SPTE and only skipping the TLB
flush if the SPTE was already marked !MMU-writable or !Host-writable,
which guarantees the SPTE cannot be locklessly be made writable and no
vCPUs are running the writable SPTEs cached in their TLBs.

Technically it would be safe to skip setting the SPTE as well since:

  (a) If MMU-writable is set then Host-writable must be cleared
      and the only way to set Host-writable is to fault the SPTE
      back in entirely (at which point any unsynced shadow pages
      reachable by the new SPTE will be synced and MMU-writable can
      be safetly be set again).

  and

  (b) MMU-writable is never consulted on its own.

And in fact this is what the shadow MMU does when write-protecting guest
page tables. However setting the SPTE unconditionally is much easier to
reason about and does not require a huge comment explaining why it is safe.

Fixes: 46044f72c382 ("kvm: x86/mmu: Support write protection for nesting in tdp MMU")
Cc: stable@vger.kernel.org
Signed-off-by: David Matlack <dmatlack@google.com>
---
 arch/x86/kvm/mmu/tdp_mmu.c | 27 ++++++++++++++++++++-------
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 7b1bc816b7c3..462c6de9f944 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1423,14 +1423,16 @@ void kvm_tdp_mmu_zap_collapsible_sptes(struct kvm *kvm,
 /*
  * Removes write access on the last level SPTE mapping this GFN and unsets the
  * MMU-writable bit to ensure future writes continue to be intercepted.
- * Returns true if an SPTE was set and a TLB flush is needed.
+ *
+ * Returns true if a TLB flush is needed to ensure no CPU has a writable
+ * version of the SPTE in its TLB.
  */
 static bool write_protect_gfn(struct kvm *kvm, struct kvm_mmu_page *root,
 			      gfn_t gfn, int min_level)
 {
 	struct tdp_iter iter;
 	u64 new_spte;
-	bool spte_set = false;
+	bool flush = false;
 
 	BUG_ON(min_level > KVM_MAX_HUGEPAGE_LEVEL);
 
@@ -1442,19 +1444,30 @@ static bool write_protect_gfn(struct kvm *kvm, struct kvm_mmu_page *root,
 		    !is_last_spte(iter.old_spte, iter.level))
 			continue;
 
-		if (!is_writable_pte(iter.old_spte))
-			break;
-
 		new_spte = iter.old_spte &
 			~(PT_WRITABLE_MASK | shadow_mmu_writable_mask);
 
 		tdp_mmu_set_spte(kvm, &iter, new_spte);
-		spte_set = true;
+
+		/*
+		 * The TLB flush can be skipped if the old SPTE cannot be
+		 * locklessly be made writable, which implies it is already
+		 * write-protected due to being !MMU-writable or !Host-writable.
+		 * This guarantees no CPU currently has a writable version of
+		 * this SPTE in its TLB.
+		 *
+		 * Otherwise the old SPTE was either not write-protected or was
+		 * write-protected but for dirty logging (which does not flush
+		 * TLBs before dropping the MMU lock), so a TLB flush is
+		 * required.
+		 */
+		if (spte_can_locklessly_be_made_writable(iter.old_spte))
+			flush = true;
 	}
 
 	rcu_read_unlock();
 
-	return spte_set;
+	return flush;
 }
 
 /*

base-commit: fea31d1690945e6dd6c3e89ec5591490857bc3d4
-- 
2.34.1.703.g22d0c6ccf7-goog


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/2] KVM: x86/mmu: Improve comment about TLB flush semantics for write-protection
  2022-01-12 21:57 [PATCH 0/2] KVM: x86/mmu: Fix write-protection bug in the TDP MMU David Matlack
  2022-01-12 21:58 ` [PATCH 1/2] KVM: x86/mmu: Fix write-protection of PTs mapped by " David Matlack
@ 2022-01-12 21:58 ` David Matlack
  2022-01-13  0:46   ` Sean Christopherson
  1 sibling, 1 reply; 12+ messages in thread
From: David Matlack @ 2022-01-12 21:58 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Ben Gardon, kvm, David Matlack

Rewrite the comment in kvm_mmu_slot_remove_write_access() that explains
why it is safe to flush TLBs outside of the MMU lock after
write-protecting SPTEs for dirty logging. The current comment is a long
run-on sentance that was difficult to undertsand. In addition it was
specific to the shadow MMU (mentioning mmu_spte_update()) when the TDP
MMU has to handle this as well.

Signed-off-by: David Matlack <dmatlack@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 29 ++++++++++++++++++++---------
 1 file changed, 20 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 1d275e9d76b5..33f550b3be8f 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5825,15 +5825,26 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
 	}
 
 	/*
-	 * We can flush all the TLBs out of the mmu lock without TLB
-	 * corruption since we just change the spte from writable to
-	 * readonly so that we only need to care the case of changing
-	 * spte from present to present (changing the spte from present
-	 * to nonpresent will flush all the TLBs immediately), in other
-	 * words, the only case we care is mmu_spte_update() where we
-	 * have checked Host-writable | MMU-writable instead of
-	 * PT_WRITABLE_MASK, that means it does not depend on PT_WRITABLE_MASK
-	 * anymore.
+	 * It is safe to flush TLBs outside of the MMU lock since SPTEs are only
+	 * being changed from writable to read-only (i.e. the mapping to host
+	 * PFNs is not changing). All we care about is that CPUs start using the
+	 * read-only mappings from this point forward to ensure the dirty bitmap
+	 * gets updated, but that does not need to run under the MMU lock.
+	 *
+	 * Note that there are other reasons why SPTEs can be write-protected
+	 * besides dirty logging: (1) to intercept guest page table
+	 * modifications when doing shadow paging and (2) to protecting guest
+	 * memory that is not host-writable. Both of these usecases require
+	 * flushing the TLB under the MMU lock to ensure CPUs are not running
+	 * with writable SPTEs in their TLB. The tricky part is knowing when it
+	 * is safe to skip a TLB flush if an SPTE is already write-protected,
+	 * since it could have been write-protected for dirty-logging which does
+	 * not flush under the lock.
+	 *
+	 * To handle this each SPTE has an MMU-writable bit and a Host-writable
+	 * bit (KVM-specific bits that are not used by hardware). These bits
+	 * allow KVM to deduce *why* a given SPTE is currently write-protected,
+	 * so that it knows when it needs to flush TLBs under the MMU lock.
 	 */
 	if (flush)
 		kvm_arch_flush_remote_tlbs_memslot(kvm, memslot);
-- 
2.34.1.703.g22d0c6ccf7-goog


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] KVM: x86/mmu: Fix write-protection of PTs mapped by the TDP MMU
  2022-01-12 21:58 ` [PATCH 1/2] KVM: x86/mmu: Fix write-protection of PTs mapped by " David Matlack
@ 2022-01-12 23:14   ` Sean Christopherson
  2022-01-12 23:57     ` David Matlack
  0 siblings, 1 reply; 12+ messages in thread
From: Sean Christopherson @ 2022-01-12 23:14 UTC (permalink / raw)
  To: David Matlack
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Ben Gardon, kvm, stable

On Wed, Jan 12, 2022, David Matlack wrote:
> When the TDP MMU is write-protection GFNs for page table protection (as
> opposed to for dirty logging, or due to the HVA not being writable), it
> checks if the SPTE is already write-protected and if so skips modifying
> the SPTE and the TLB flush.
> 
> This behavior is incorrect because the SPTE may be write-protected for
> dirty logging. This implies that the SPTE could be locklessly be made
> writable on the next write access, and that vCPUs could still be running
> with writable SPTEs cached in their TLB.
> 
> Fix this by unconditionally setting the SPTE and only skipping the TLB
> flush if the SPTE was already marked !MMU-writable or !Host-writable,
> which guarantees the SPTE cannot be locklessly be made writable and no
> vCPUs are running the writable SPTEs cached in their TLBs.
> 
> Technically it would be safe to skip setting the SPTE as well since:
> 
>   (a) If MMU-writable is set then Host-writable must be cleared
>       and the only way to set Host-writable is to fault the SPTE
>       back in entirely (at which point any unsynced shadow pages
>       reachable by the new SPTE will be synced and MMU-writable can
>       be safetly be set again).
> 
>   and
> 
>   (b) MMU-writable is never consulted on its own.
> 
> And in fact this is what the shadow MMU does when write-protecting guest
> page tables. However setting the SPTE unconditionally is much easier to
> reason about and does not require a huge comment explaining why it is safe.

I disagree.  I looked at the code+comment before reading the full changelog and
typed up a response saying the code should be:

		if (!is_writable_pte(iter.old_spte) &&
		    !spte_can_locklessly_be_made_writable(spte))
			break;

Then I went read the changelog and here we are :-)

I find that much more easier to grok, e.g. in plain English: "if the SPTE isn't
writable and can't be made writable, there's nothing to do".

Versus "unconditionally clear the writable bits because ???, but only flush if
the write was actually necessary", with a slightly opinionated translation :-)

And with that, you don't need to do s/spte_set/flush.  Though I would be in favor
of a separate patch to do s/spte_set/write_protected here and in the caller, to
match kvm_mmu_slot_gfn_write_protect().

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] KVM: x86/mmu: Fix write-protection of PTs mapped by the TDP MMU
  2022-01-12 23:14   ` Sean Christopherson
@ 2022-01-12 23:57     ` David Matlack
  2022-01-13  0:28       ` Sean Christopherson
  0 siblings, 1 reply; 12+ messages in thread
From: David Matlack @ 2022-01-12 23:57 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Ben Gardon, kvm list, stable

On Wed, Jan 12, 2022 at 3:14 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Wed, Jan 12, 2022, David Matlack wrote:
> > When the TDP MMU is write-protection GFNs for page table protection (as
> > opposed to for dirty logging, or due to the HVA not being writable), it
> > checks if the SPTE is already write-protected and if so skips modifying
> > the SPTE and the TLB flush.
> >
> > This behavior is incorrect because the SPTE may be write-protected for
> > dirty logging. This implies that the SPTE could be locklessly be made
> > writable on the next write access, and that vCPUs could still be running
> > with writable SPTEs cached in their TLB.
> >
> > Fix this by unconditionally setting the SPTE and only skipping the TLB
> > flush if the SPTE was already marked !MMU-writable or !Host-writable,
> > which guarantees the SPTE cannot be locklessly be made writable and no
> > vCPUs are running the writable SPTEs cached in their TLBs.
> >
> > Technically it would be safe to skip setting the SPTE as well since:
> >
> >   (a) If MMU-writable is set then Host-writable must be cleared
> >       and the only way to set Host-writable is to fault the SPTE
> >       back in entirely (at which point any unsynced shadow pages
> >       reachable by the new SPTE will be synced and MMU-writable can
> >       be safetly be set again).
> >
> >   and
> >
> >   (b) MMU-writable is never consulted on its own.
> >
> > And in fact this is what the shadow MMU does when write-protecting guest
> > page tables. However setting the SPTE unconditionally is much easier to
> > reason about and does not require a huge comment explaining why it is safe.
>
> I disagree.  I looked at the code+comment before reading the full changelog and
> typed up a response saying the code should be:
>
>                 if (!is_writable_pte(iter.old_spte) &&
>                     !spte_can_locklessly_be_made_writable(spte))
>                         break;
>
> Then I went read the changelog and here we are :-)
>
> I find that much more easier to grok, e.g. in plain English: "if the SPTE isn't
> writable and can't be made writable, there's nothing to do".

Oh interesting. I actually find that confusing because it can easily
lead to the MMU-writable bit staying set. Here we are protecting GFNs
and we're opting to leave the MMU-writable bit set. It takes a lot of
digging to figure out that this is safe because if MMU-writable is set
and the SPTE cannot be locklessly be made writable then it implies
Host-writable is clear, and Host-writable can't be reset without
syncing the all shadow pages reachable by the MMU. Oh and the
MMU-writable bit is never consulted on its own (e.g. We never iterate
through all SPTEs to find the ones that are !MMU-writable).

Maybe my understanding is horribly off since this all seems
unnecessarily convoluted, and the cost of always clearing MMU-writable
is just an extra bitwise-OR.

The TLB flush is certainly unnecessary if the SPTE is already
!Host-writable, which is what this commit does.

>
> Versus "unconditionally clear the writable bits because ???, but only flush if
> the write was actually necessary", with a slightly opinionated translation :-)

If MMU-writable is already clear we can definitely break. I had that
in a previous version of the patch by checking if iter.old_spte ==
new_spte but it seemed unnecessary since the guts of
tdp_mmu_spte_set() already optimizes for this.

>
> And with that, you don't need to do s/spte_set/flush.  Though I would be in favor
> of a separate patch to do s/spte_set/write_protected here and in the caller, to
> match kvm_mmu_slot_gfn_write_protect().

I'm not sure write_protected would not be a good variable name because
even if we did not write-protect the SPTE (i.e. PT_WRITABLE_MASK was
already clear) we may still need a TLB flush to ensure no CPUs have a
writable SPTE in their TLB. Perhaps we have different definitions for
"write-protecting"?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] KVM: x86/mmu: Fix write-protection of PTs mapped by the TDP MMU
  2022-01-12 23:57     ` David Matlack
@ 2022-01-13  0:28       ` Sean Christopherson
  2022-01-13 17:04         ` David Matlack
  0 siblings, 1 reply; 12+ messages in thread
From: Sean Christopherson @ 2022-01-13  0:28 UTC (permalink / raw)
  To: David Matlack
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Ben Gardon, kvm list, stable

On Wed, Jan 12, 2022, David Matlack wrote:
> On Wed, Jan 12, 2022 at 3:14 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Wed, Jan 12, 2022, David Matlack wrote:
> > > When the TDP MMU is write-protection GFNs for page table protection (as
> > > opposed to for dirty logging, or due to the HVA not being writable), it
> > > checks if the SPTE is already write-protected and if so skips modifying
> > > the SPTE and the TLB flush.
> > >
> > > This behavior is incorrect because the SPTE may be write-protected for
> > > dirty logging. This implies that the SPTE could be locklessly be made
> > > writable on the next write access, and that vCPUs could still be running
> > > with writable SPTEs cached in their TLB.
> > >
> > > Fix this by unconditionally setting the SPTE and only skipping the TLB
> > > flush if the SPTE was already marked !MMU-writable or !Host-writable,
> > > which guarantees the SPTE cannot be locklessly be made writable and no
> > > vCPUs are running the writable SPTEs cached in their TLBs.
> > >
> > > Technically it would be safe to skip setting the SPTE as well since:
> > >
> > >   (a) If MMU-writable is set then Host-writable must be cleared
> > >       and the only way to set Host-writable is to fault the SPTE
> > >       back in entirely (at which point any unsynced shadow pages
> > >       reachable by the new SPTE will be synced and MMU-writable can
> > >       be safetly be set again).
> > >
> > >   and
> > >
> > >   (b) MMU-writable is never consulted on its own.
> > >
> > > And in fact this is what the shadow MMU does when write-protecting guest
> > > page tables. However setting the SPTE unconditionally is much easier to
> > > reason about and does not require a huge comment explaining why it is safe.
> >
> > I disagree.  I looked at the code+comment before reading the full changelog and
> > typed up a response saying the code should be:
> >
> >                 if (!is_writable_pte(iter.old_spte) &&
> >                     !spte_can_locklessly_be_made_writable(spte))
> >                         break;
> >
> > Then I went read the changelog and here we are :-)
> >
> > I find that much more easier to grok, e.g. in plain English: "if the SPTE isn't
> > writable and can't be made writable, there's nothing to do".
> 
> Oh interesting. I actually find that confusing because it can easily
> lead to the MMU-writable bit staying set. Here we are protecting GFNs
> and we're opting to leave the MMU-writable bit set. It takes a lot of
> digging to figure out that this is safe because if MMU-writable is set
> and the SPTE cannot be locklessly be made writable then it implies
> Host-writable is clear, and Host-writable can't be reset without
> syncing the all shadow pages reachable by the MMU. Oh and the
> MMU-writable bit is never consulted on its own (e.g. We never iterate
> through all SPTEs to find the ones that are !MMU-writable).

Ah, you've missed the other wrinkle: MMU-writable can bet set iff Host-writable
is set.  In other words, the MMU-writable bit is never left set because it can't
be set if spte_can_locklessly_be_made_writable() returns false.

To reduce confusion, we can and probably should do:

diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
index a4af2a42695c..bc691ff72cab 100644
--- a/arch/x86/kvm/mmu/spte.h
+++ b/arch/x86/kvm/mmu/spte.h
@@ -316,8 +316,7 @@ static __always_inline bool is_rsvd_spte(struct rsvd_bits_validate *rsvd_check,

 static inline bool spte_can_locklessly_be_made_writable(u64 spte)
 {
-       return (spte & shadow_host_writable_mask) &&
-              (spte & shadow_mmu_writable_mask);
+       return (spte & shadow_mmu_writable_mask);
 }

 static inline u64 get_mmio_spte_generation(u64 spte)

Though it'd be nice to have a WARN somewhere to enforce that MMU-Writable isn't
set without Host-writable.

We could also rename the helper to is_mmu_writable_spte(), though I'm not sure
that's actually better.

Yet another option would be to invert the flag and make it shadow_mmu_pt_protected_mask
or something, i.e. make it more explicitly a flag that says "this thing is write-protected
for shadowing a page table".

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] KVM: x86/mmu: Improve comment about TLB flush semantics for write-protection
  2022-01-12 21:58 ` [PATCH 2/2] KVM: x86/mmu: Improve comment about TLB flush semantics for write-protection David Matlack
@ 2022-01-13  0:46   ` Sean Christopherson
  2022-01-13 17:10     ` David Matlack
  0 siblings, 1 reply; 12+ messages in thread
From: Sean Christopherson @ 2022-01-13  0:46 UTC (permalink / raw)
  To: David Matlack
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Ben Gardon, kvm

On Wed, Jan 12, 2022, David Matlack wrote:
> Rewrite the comment in kvm_mmu_slot_remove_write_access() that explains
> why it is safe to flush TLBs outside of the MMU lock after
> write-protecting SPTEs for dirty logging. The current comment is a long
> run-on sentance that was difficult to undertsand. In addition it was
> specific to the shadow MMU (mentioning mmu_spte_update()) when the TDP
> MMU has to handle this as well.
> 
> Signed-off-by: David Matlack <dmatlack@google.com>
> ---
>  arch/x86/kvm/mmu/mmu.c | 29 ++++++++++++++++++++---------
>  1 file changed, 20 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 1d275e9d76b5..33f550b3be8f 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -5825,15 +5825,26 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
>  	}
>  
>  	/*
> -	 * We can flush all the TLBs out of the mmu lock without TLB
> -	 * corruption since we just change the spte from writable to
> -	 * readonly so that we only need to care the case of changing
> -	 * spte from present to present (changing the spte from present
> -	 * to nonpresent will flush all the TLBs immediately), in other
> -	 * words, the only case we care is mmu_spte_update() where we
> -	 * have checked Host-writable | MMU-writable instead of
> -	 * PT_WRITABLE_MASK, that means it does not depend on PT_WRITABLE_MASK
> -	 * anymore.
> +	 * It is safe to flush TLBs outside of the MMU lock since SPTEs are only
> +	 * being changed from writable to read-only (i.e. the mapping to host
> +	 * PFNs is not changing). 

Hmm, you mostly address things in the next sentence/paragraph, but it's more than
the SPTE being downgraded from writable => read-only, e.g. if the SPTE were being
made read-only due to userspace removing permissions, then KVM would need to flush
before dropping mmu_lock.  The qualifier about the PFN not changing actually does
more harm than good because it further suggests that writable => read-only is
somehow inherently safe.  

> +      * All we care about is that CPUs start using the
> +	 * read-only mappings from this point forward to ensure the dirty bitmap
> +	 * gets updated, but that does not need to run under the MMU lock.

"this point forward" isn't technically true, the requirement is that the flush
occurs before the memslot update completes.  Definitely splitting hairs, I mean,
this basically is the end of the memslot update, but it's an opportunity to
clarify _why_ the flush needs to happen at this point.

> +	 *
> +	 * Note that there are other reasons why SPTEs can be write-protected
> +	 * besides dirty logging: (1) to intercept guest page table
> +	 * modifications when doing shadow paging and (2) to protecting guest
> +	 * memory that is not host-writable.

So, technically, #2 is not possible.  KVM doesn't allow a memslot to be converted
from writable => read-only, userspace must first delete the entire memslot.  That
means the relevant SPTEs never transition directly from writable to !writable,
they are instead zapped entirely and "new" SPTEs are created that are read-only
from their genesis.

Making a VMA read-only also results in SPTEs being zapped and recreated, though
this is an area for improvement.  We could cover future changes in this area by
being a bit fuzzy in the wording, but I think it would be better to talk only
about the shadow paging case and thus only about MMU-writable, because Host-writable
is (currently) immutable and making it mutable (in the mmu_notifier path) will
have additional "rule" changes.

> + 	 * Both of these usecases require
> +	 * flushing the TLB under the MMU lock to ensure CPUs are not running
> +	 * with writable SPTEs in their TLB. The tricky part is knowing when it
> +	 * is safe to skip a TLB flush if an SPTE is already write-protected,
> +	 * since it could have been write-protected for dirty-logging which does
> +	 * not flush under the lock.

It's a bit unclear that the last "skip a TLB flush" snippet is referring to a
future TLB flush, not this TLB flush.

> +	 *
> +	 * To handle this each SPTE has an MMU-writable bit and a Host-writable
> +	 * bit (KVM-specific bits that are not used by hardware). These bits
> +	 * allow KVM to deduce *why* a given SPTE is currently write-protected,
> +	 * so that it knows when it needs to flush TLBs under the MMU lock.

I much rather we add this type of comment over the definitions for
DEFAULT_SPTE_{HOST,MMU}_WRITEABLE and then provide a reference to those definitions
after a very brief "KVM uses the MMU-writable bit".

So something like this?  Plus more commentry in spte.h.

	/*
	 * It's safe to flush TLBs after dropping mmu_lock as making a writable
	 * SPTE read-only for dirty logging only needs to ensure KVM starts
	 * logging writes to the memslot before the memslot update completes,
	 * i.e. before the enabling of dirty logging is visible to userspace.
	 *
	 * Note, KVM also write-protects SPTEs when shadowing guest page tables,
	 * in which case a TLB flush is needed before dropping mmu_lock().  To
	 * ensure a future TLB flush isn't missed, KVM uses a software-available
	 * bit to track if a SPTE is MMU-Writable, i.e. is considered writable
	 * for shadow paging purposes.  When write-protecting for shadow paging,
	 * KVM clears both WRITABLE and MMU-Writable, and performs a TLB flush
	 * while holding mmu_lock if either bit is cleared.
	 *
	 * See DEFAULT_SPTE_{HOST,MMU}_WRITEABLE for more details.
	 */

>  	 */
>  	if (flush)
>  		kvm_arch_flush_remote_tlbs_memslot(kvm, memslot);
> -- 
> 2.34.1.703.g22d0c6ccf7-goog
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] KVM: x86/mmu: Fix write-protection of PTs mapped by the TDP MMU
  2022-01-13  0:28       ` Sean Christopherson
@ 2022-01-13 17:04         ` David Matlack
  2022-01-13 18:28           ` David Matlack
  0 siblings, 1 reply; 12+ messages in thread
From: David Matlack @ 2022-01-13 17:04 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Ben Gardon, kvm list, stable

On Wed, Jan 12, 2022 at 4:29 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Wed, Jan 12, 2022, David Matlack wrote:
> > On Wed, Jan 12, 2022 at 3:14 PM Sean Christopherson <seanjc@google.com> wrote:
> > >
> > > On Wed, Jan 12, 2022, David Matlack wrote:
> > > > When the TDP MMU is write-protection GFNs for page table protection (as
> > > > opposed to for dirty logging, or due to the HVA not being writable), it
> > > > checks if the SPTE is already write-protected and if so skips modifying
> > > > the SPTE and the TLB flush.
> > > >
> > > > This behavior is incorrect because the SPTE may be write-protected for
> > > > dirty logging. This implies that the SPTE could be locklessly be made
> > > > writable on the next write access, and that vCPUs could still be running
> > > > with writable SPTEs cached in their TLB.
> > > >
> > > > Fix this by unconditionally setting the SPTE and only skipping the TLB
> > > > flush if the SPTE was already marked !MMU-writable or !Host-writable,
> > > > which guarantees the SPTE cannot be locklessly be made writable and no
> > > > vCPUs are running the writable SPTEs cached in their TLBs.
> > > >
> > > > Technically it would be safe to skip setting the SPTE as well since:
> > > >
> > > >   (a) If MMU-writable is set then Host-writable must be cleared
> > > >       and the only way to set Host-writable is to fault the SPTE
> > > >       back in entirely (at which point any unsynced shadow pages
> > > >       reachable by the new SPTE will be synced and MMU-writable can
> > > >       be safetly be set again).
> > > >
> > > >   and
> > > >
> > > >   (b) MMU-writable is never consulted on its own.
> > > >
> > > > And in fact this is what the shadow MMU does when write-protecting guest
> > > > page tables. However setting the SPTE unconditionally is much easier to
> > > > reason about and does not require a huge comment explaining why it is safe.
> > >
> > > I disagree.  I looked at the code+comment before reading the full changelog and
> > > typed up a response saying the code should be:
> > >
> > >                 if (!is_writable_pte(iter.old_spte) &&
> > >                     !spte_can_locklessly_be_made_writable(spte))
> > >                         break;
> > >
> > > Then I went read the changelog and here we are :-)
> > >
> > > I find that much more easier to grok, e.g. in plain English: "if the SPTE isn't
> > > writable and can't be made writable, there's nothing to do".
> >
> > Oh interesting. I actually find that confusing because it can easily
> > lead to the MMU-writable bit staying set. Here we are protecting GFNs
> > and we're opting to leave the MMU-writable bit set. It takes a lot of
> > digging to figure out that this is safe because if MMU-writable is set
> > and the SPTE cannot be locklessly be made writable then it implies
> > Host-writable is clear, and Host-writable can't be reset without
> > syncing the all shadow pages reachable by the MMU. Oh and the
> > MMU-writable bit is never consulted on its own (e.g. We never iterate
> > through all SPTEs to find the ones that are !MMU-writable).
>
> Ah, you've missed the other wrinkle: MMU-writable can bet set iff Host-writable
> is set.  In other words, the MMU-writable bit is never left set because it can't
> be set if spte_can_locklessly_be_made_writable() returns false.

Ohhh I did miss that and yes that explains it. I'll send another
version of this patch that skips setting the SPTE unnecessarily.

>
> To reduce confusion, we can and probably should do:
>
> diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
> index a4af2a42695c..bc691ff72cab 100644
> --- a/arch/x86/kvm/mmu/spte.h
> +++ b/arch/x86/kvm/mmu/spte.h
> @@ -316,8 +316,7 @@ static __always_inline bool is_rsvd_spte(struct rsvd_bits_validate *rsvd_check,
>
>  static inline bool spte_can_locklessly_be_made_writable(u64 spte)
>  {
> -       return (spte & shadow_host_writable_mask) &&
> -              (spte & shadow_mmu_writable_mask);
> +       return (spte & shadow_mmu_writable_mask);
>  }
>
>  static inline u64 get_mmio_spte_generation(u64 spte)
>
> Though it'd be nice to have a WARN somewhere to enforce that MMU-Writable isn't
> set without Host-writable.
>
> We could also rename the helper to is_mmu_writable_spte(), though I'm not sure
> that's actually better.
>
> Yet another option would be to invert the flag and make it shadow_mmu_pt_protected_mask
> or something, i.e. make it more explicitly a flag that says "this thing is write-protected
> for shadowing a page table".

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] KVM: x86/mmu: Improve comment about TLB flush semantics for write-protection
  2022-01-13  0:46   ` Sean Christopherson
@ 2022-01-13 17:10     ` David Matlack
  2022-01-13 22:40       ` Sean Christopherson
  0 siblings, 1 reply; 12+ messages in thread
From: David Matlack @ 2022-01-13 17:10 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Ben Gardon, kvm list

On Wed, Jan 12, 2022 at 4:46 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Wed, Jan 12, 2022, David Matlack wrote:
> > Rewrite the comment in kvm_mmu_slot_remove_write_access() that explains
> > why it is safe to flush TLBs outside of the MMU lock after
> > write-protecting SPTEs for dirty logging. The current comment is a long
> > run-on sentance that was difficult to undertsand. In addition it was
> > specific to the shadow MMU (mentioning mmu_spte_update()) when the TDP
> > MMU has to handle this as well.
> >
> > Signed-off-by: David Matlack <dmatlack@google.com>
> > ---
> >  arch/x86/kvm/mmu/mmu.c | 29 ++++++++++++++++++++---------
> >  1 file changed, 20 insertions(+), 9 deletions(-)
> >
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index 1d275e9d76b5..33f550b3be8f 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -5825,15 +5825,26 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
> >       }
> >
> >       /*
> > -      * We can flush all the TLBs out of the mmu lock without TLB
> > -      * corruption since we just change the spte from writable to
> > -      * readonly so that we only need to care the case of changing
> > -      * spte from present to present (changing the spte from present
> > -      * to nonpresent will flush all the TLBs immediately), in other
> > -      * words, the only case we care is mmu_spte_update() where we
> > -      * have checked Host-writable | MMU-writable instead of
> > -      * PT_WRITABLE_MASK, that means it does not depend on PT_WRITABLE_MASK
> > -      * anymore.
> > +      * It is safe to flush TLBs outside of the MMU lock since SPTEs are only
> > +      * being changed from writable to read-only (i.e. the mapping to host
> > +      * PFNs is not changing).
>
> Hmm, you mostly address things in the next sentence/paragraph, but it's more than
> the SPTE being downgraded from writable => read-only, e.g. if the SPTE were being
> made read-only due to userspace removing permissions, then KVM would need to flush
> before dropping mmu_lock.  The qualifier about the PFN not changing actually does
> more harm than good because it further suggests that writable => read-only is
> somehow inherently safe.
>
> > +      * All we care about is that CPUs start using the
> > +      * read-only mappings from this point forward to ensure the dirty bitmap
> > +      * gets updated, but that does not need to run under the MMU lock.
>
> "this point forward" isn't technically true, the requirement is that the flush
> occurs before the memslot update completes.  Definitely splitting hairs, I mean,
> this basically is the end of the memslot update, but it's an opportunity to
> clarify _why_ the flush needs to happen at this point.
>
> > +      *
> > +      * Note that there are other reasons why SPTEs can be write-protected
> > +      * besides dirty logging: (1) to intercept guest page table
> > +      * modifications when doing shadow paging and (2) to protecting guest
> > +      * memory that is not host-writable.
>
> So, technically, #2 is not possible.  KVM doesn't allow a memslot to be converted
> from writable => read-only, userspace must first delete the entire memslot.  That
> means the relevant SPTEs never transition directly from writable to !writable,
> they are instead zapped entirely and "new" SPTEs are created that are read-only
> from their genesis.
>
> Making a VMA read-only also results in SPTEs being zapped and recreated, though
> this is an area for improvement.  We could cover future changes in this area by
> being a bit fuzzy in the wording, but I think it would be better to talk only
> about the shadow paging case and thus only about MMU-writable, because Host-writable
> is (currently) immutable and making it mutable (in the mmu_notifier path) will
> have additional "rule" changes.
>
> > +      * Both of these usecases require
> > +      * flushing the TLB under the MMU lock to ensure CPUs are not running
> > +      * with writable SPTEs in their TLB. The tricky part is knowing when it
> > +      * is safe to skip a TLB flush if an SPTE is already write-protected,
> > +      * since it could have been write-protected for dirty-logging which does
> > +      * not flush under the lock.
>
> It's a bit unclear that the last "skip a TLB flush" snippet is referring to a
> future TLB flush, not this TLB flush.
>
> > +      *
> > +      * To handle this each SPTE has an MMU-writable bit and a Host-writable
> > +      * bit (KVM-specific bits that are not used by hardware). These bits
> > +      * allow KVM to deduce *why* a given SPTE is currently write-protected,
> > +      * so that it knows when it needs to flush TLBs under the MMU lock.
>
> I much rather we add this type of comment over the definitions for
> DEFAULT_SPTE_{HOST,MMU}_WRITEABLE and then provide a reference to those definitions
> after a very brief "KVM uses the MMU-writable bit".
>
> So something like this?  Plus more commentry in spte.h.
>
>         /*
>          * It's safe to flush TLBs after dropping mmu_lock as making a writable
>          * SPTE read-only for dirty logging only needs to ensure KVM starts
>          * logging writes to the memslot before the memslot update completes,
>          * i.e. before the enabling of dirty logging is visible to userspace.
>          *
>          * Note, KVM also write-protects SPTEs when shadowing guest page tables,
>          * in which case a TLB flush is needed before dropping mmu_lock().  To
>          * ensure a future TLB flush isn't missed, KVM uses a software-available
>          * bit to track if a SPTE is MMU-Writable, i.e. is considered writable
>          * for shadow paging purposes.  When write-protecting for shadow paging,
>          * KVM clears both WRITABLE and MMU-Writable, and performs a TLB flush
>          * while holding mmu_lock if either bit is cleared.
>          *
>          * See DEFAULT_SPTE_{HOST,MMU}_WRITEABLE for more details.
>          */

Makes sense. I'll rework the comment per your feedback and also
document the {host,mmu}-writable bits. Although I think it'd make more
sense to put those comments on shadow_{host,mmu}_writable_mask as
those are the symbols used throughout the code and EPT uses different
bits than DEFAULT_..._WRITABLE.

>
> >        */
> >       if (flush)
> >               kvm_arch_flush_remote_tlbs_memslot(kvm, memslot);
> > --
> > 2.34.1.703.g22d0c6ccf7-goog
> >

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] KVM: x86/mmu: Fix write-protection of PTs mapped by the TDP MMU
  2022-01-13 17:04         ` David Matlack
@ 2022-01-13 18:28           ` David Matlack
  2022-01-13 19:29             ` Sean Christopherson
  0 siblings, 1 reply; 12+ messages in thread
From: David Matlack @ 2022-01-13 18:28 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Ben Gardon, kvm list, stable

On Thu, Jan 13, 2022 at 9:04 AM David Matlack <dmatlack@google.com> wrote:
>
> On Wed, Jan 12, 2022 at 4:29 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Wed, Jan 12, 2022, David Matlack wrote:
> > > On Wed, Jan 12, 2022 at 3:14 PM Sean Christopherson <seanjc@google.com> wrote:
> > > >
> > > > On Wed, Jan 12, 2022, David Matlack wrote:
> > > > > When the TDP MMU is write-protection GFNs for page table protection (as
> > > > > opposed to for dirty logging, or due to the HVA not being writable), it
> > > > > checks if the SPTE is already write-protected and if so skips modifying
> > > > > the SPTE and the TLB flush.
> > > > >
> > > > > This behavior is incorrect because the SPTE may be write-protected for
> > > > > dirty logging. This implies that the SPTE could be locklessly be made
> > > > > writable on the next write access, and that vCPUs could still be running
> > > > > with writable SPTEs cached in their TLB.
> > > > >
> > > > > Fix this by unconditionally setting the SPTE and only skipping the TLB
> > > > > flush if the SPTE was already marked !MMU-writable or !Host-writable,
> > > > > which guarantees the SPTE cannot be locklessly be made writable and no
> > > > > vCPUs are running the writable SPTEs cached in their TLBs.
> > > > >
> > > > > Technically it would be safe to skip setting the SPTE as well since:
> > > > >
> > > > >   (a) If MMU-writable is set then Host-writable must be cleared
> > > > >       and the only way to set Host-writable is to fault the SPTE
> > > > >       back in entirely (at which point any unsynced shadow pages
> > > > >       reachable by the new SPTE will be synced and MMU-writable can
> > > > >       be safetly be set again).
> > > > >
> > > > >   and
> > > > >
> > > > >   (b) MMU-writable is never consulted on its own.
> > > > >
> > > > > And in fact this is what the shadow MMU does when write-protecting guest
> > > > > page tables. However setting the SPTE unconditionally is much easier to
> > > > > reason about and does not require a huge comment explaining why it is safe.
> > > >
> > > > I disagree.  I looked at the code+comment before reading the full changelog and
> > > > typed up a response saying the code should be:
> > > >
> > > >                 if (!is_writable_pte(iter.old_spte) &&
> > > >                     !spte_can_locklessly_be_made_writable(spte))
> > > >                         break;
> > > >
> > > > Then I went read the changelog and here we are :-)
> > > >
> > > > I find that much more easier to grok, e.g. in plain English: "if the SPTE isn't
> > > > writable and can't be made writable, there's nothing to do".
> > >
> > > Oh interesting. I actually find that confusing because it can easily
> > > lead to the MMU-writable bit staying set. Here we are protecting GFNs
> > > and we're opting to leave the MMU-writable bit set. It takes a lot of
> > > digging to figure out that this is safe because if MMU-writable is set
> > > and the SPTE cannot be locklessly be made writable then it implies
> > > Host-writable is clear, and Host-writable can't be reset without
> > > syncing the all shadow pages reachable by the MMU. Oh and the
> > > MMU-writable bit is never consulted on its own (e.g. We never iterate
> > > through all SPTEs to find the ones that are !MMU-writable).
> >
> > Ah, you've missed the other wrinkle: MMU-writable can bet set iff Host-writable
> > is set.  In other words, the MMU-writable bit is never left set because it can't
> > be set if spte_can_locklessly_be_made_writable() returns false.

The changed_pte notifier looks like it clears Host-writable without
clearing MMU-writable. Specifically the call chain:

kvm_mmu_notifier_change_pte()
  kvm_set_spte_gfn()
    kvm_tdp_mmu_set_spte_gfn()
      set_spte_gfn()
        kvm_mmu_changed_pte_notifier_make_spte()

Is there some guarantee that old_spte is !MMU-writable at this point?
If not I could easily change kvm_mmu_changed_pte_notifier_make_spte()
to also clear MMU-writable and preserve the invariant.


>
> Ohhh I did miss that and yes that explains it. I'll send another
> version of this patch that skips setting the SPTE unnecessarily.
>
> >
> > To reduce confusion, we can and probably should do:
> >
> > diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
> > index a4af2a42695c..bc691ff72cab 100644
> > --- a/arch/x86/kvm/mmu/spte.h
> > +++ b/arch/x86/kvm/mmu/spte.h
> > @@ -316,8 +316,7 @@ static __always_inline bool is_rsvd_spte(struct rsvd_bits_validate *rsvd_check,
> >
> >  static inline bool spte_can_locklessly_be_made_writable(u64 spte)
> >  {
> > -       return (spte & shadow_host_writable_mask) &&
> > -              (spte & shadow_mmu_writable_mask);
> > +       return (spte & shadow_mmu_writable_mask);
> >  }
> >
> >  static inline u64 get_mmio_spte_generation(u64 spte)
> >
> > Though it'd be nice to have a WARN somewhere to enforce that MMU-Writable isn't
> > set without Host-writable.
> >
> > We could also rename the helper to is_mmu_writable_spte(), though I'm not sure
> > that's actually better.
> >
> > Yet another option would be to invert the flag and make it shadow_mmu_pt_protected_mask
> > or something, i.e. make it more explicitly a flag that says "this thing is write-protected
> > for shadowing a page table".

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] KVM: x86/mmu: Fix write-protection of PTs mapped by the TDP MMU
  2022-01-13 18:28           ` David Matlack
@ 2022-01-13 19:29             ` Sean Christopherson
  0 siblings, 0 replies; 12+ messages in thread
From: Sean Christopherson @ 2022-01-13 19:29 UTC (permalink / raw)
  To: David Matlack
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Ben Gardon, kvm list, stable

On Thu, Jan 13, 2022, David Matlack wrote:
> On Thu, Jan 13, 2022 at 9:04 AM David Matlack <dmatlack@google.com> wrote:
> >
> > On Wed, Jan 12, 2022 at 4:29 PM Sean Christopherson <seanjc@google.com> wrote:
> > > > Oh interesting. I actually find that confusing because it can easily
> > > > lead to the MMU-writable bit staying set. Here we are protecting GFNs
> > > > and we're opting to leave the MMU-writable bit set. It takes a lot of
> > > > digging to figure out that this is safe because if MMU-writable is set
> > > > and the SPTE cannot be locklessly be made writable then it implies
> > > > Host-writable is clear, and Host-writable can't be reset without
> > > > syncing the all shadow pages reachable by the MMU. Oh and the
> > > > MMU-writable bit is never consulted on its own (e.g. We never iterate
> > > > through all SPTEs to find the ones that are !MMU-writable).
> > >
> > > Ah, you've missed the other wrinkle: MMU-writable can bet set iff Host-writable
> > > is set.  In other words, the MMU-writable bit is never left set because it can't
> > > be set if spte_can_locklessly_be_made_writable() returns false.
> 
> The changed_pte notifier looks like it clears Host-writable without
> clearing MMU-writable. Specifically the call chain:
> 
> kvm_mmu_notifier_change_pte()
>   kvm_set_spte_gfn()
>     kvm_tdp_mmu_set_spte_gfn()
>       set_spte_gfn()
>         kvm_mmu_changed_pte_notifier_make_spte()
> 
> Is there some guarantee that old_spte is !MMU-writable at this point?

Ugh, I misread that code, multiple times.  There's no guarantee, it was likely
just missed when MMU-writable was introduced.

Note, you literally cannot hit that code path in current kernels.  See commit
c13fda237f08 ("KVM: Assert that notifier count is elevated in .change_pte()").
So whatever you do is effectively untestable.

I really want to rip out .change_pte(), but I also don't want to do any performance
testing to justify removing the code instead of fixing it proper, so it's hung
around as a zombie...

> If not I could easily change kvm_mmu_changed_pte_notifier_make_spte()
> to also clear MMU-writable and preserve the invariant.

Yes, please.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] KVM: x86/mmu: Improve comment about TLB flush semantics for write-protection
  2022-01-13 17:10     ` David Matlack
@ 2022-01-13 22:40       ` Sean Christopherson
  0 siblings, 0 replies; 12+ messages in thread
From: Sean Christopherson @ 2022-01-13 22:40 UTC (permalink / raw)
  To: David Matlack
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Ben Gardon, kvm list

On Thu, Jan 13, 2022, David Matlack wrote:
> On Wed, Jan 12, 2022 at 4:46 PM Sean Christopherson <seanjc@google.com> wrote:
> > So something like this?  Plus more commentry in spte.h.
> >
> >         /*
> >          * It's safe to flush TLBs after dropping mmu_lock as making a writable
> >          * SPTE read-only for dirty logging only needs to ensure KVM starts
> >          * logging writes to the memslot before the memslot update completes,
> >          * i.e. before the enabling of dirty logging is visible to userspace.
> >          *
> >          * Note, KVM also write-protects SPTEs when shadowing guest page tables,
> >          * in which case a TLB flush is needed before dropping mmu_lock().  To
> >          * ensure a future TLB flush isn't missed, KVM uses a software-available
> >          * bit to track if a SPTE is MMU-Writable, i.e. is considered writable
> >          * for shadow paging purposes.  When write-protecting for shadow paging,
> >          * KVM clears both WRITABLE and MMU-Writable, and performs a TLB flush
> >          * while holding mmu_lock if either bit is cleared.
> >          *
> >          * See DEFAULT_SPTE_{HOST,MMU}_WRITEABLE for more details.
> >          */
> 
> Makes sense. I'll rework the comment per your feedback and also
> document the {host,mmu}-writable bits. Although I think it'd make more
> sense to put those comments on shadow_{host,mmu}_writable_mask as
> those are the symbols used throughout the code and EPT uses different
> bits than DEFAULT_..._WRITABLE.

I don't necessarily disagree, but all of the existing comments for SPTE bits are
in spte.h, even though the dynamic masks that are actually used in code are defined
elsewhere.  I'd prefer to keep all the "documentation" somewhat centralized, and it
shouldn't be too onerous to get from shadow_*_mask to DEFAULT_*_WRITABLE.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2022-01-13 22:40 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-12 21:57 [PATCH 0/2] KVM: x86/mmu: Fix write-protection bug in the TDP MMU David Matlack
2022-01-12 21:58 ` [PATCH 1/2] KVM: x86/mmu: Fix write-protection of PTs mapped by " David Matlack
2022-01-12 23:14   ` Sean Christopherson
2022-01-12 23:57     ` David Matlack
2022-01-13  0:28       ` Sean Christopherson
2022-01-13 17:04         ` David Matlack
2022-01-13 18:28           ` David Matlack
2022-01-13 19:29             ` Sean Christopherson
2022-01-12 21:58 ` [PATCH 2/2] KVM: x86/mmu: Improve comment about TLB flush semantics for write-protection David Matlack
2022-01-13  0:46   ` Sean Christopherson
2022-01-13 17:10     ` David Matlack
2022-01-13 22:40       ` Sean Christopherson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).