From: Sean Christopherson <seanjc@google.com>
To: Ben Gardon <bgardon@google.com>
Cc: David Matlack <dmatlack@google.com>,
	Vipin Sharma <vipinsh@google.com>,
	pbonzini@redhat.com, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [Patch] KVM: x86/mmu: Make optimized __handle_changed_spte() for clear dirty log
Date: Thu, 26 Jan 2023 01:49:17 +0000
Message-ID: <Y9HcHRBShQgjxsQb@google.com>
In-Reply-To: <CANgfPd9T7jdh1Cjjo4y6DcxC2poTaGhQ7wNLf6OgGtStg-yKJg@mail.gmail.com>

On Wed, Jan 25, 2023, Ben Gardon wrote:
> On Wed, Jan 25, 2023 at 2:00 PM David Matlack <dmatlack@google.com> wrote:
> >
> > On Wed, Jan 25, 2023 at 01:38:57PM -0800, Vipin Sharma wrote:
> > > Use a toned-down version of __handle_changed_spte() when clearing the
> > > dirty log. Remove checks which are not needed when dirty logs are cleared.
> > >
> > > This change shows a ~13% improvement in clear dirty log calls in
> > > dirty_log_perf_test.
> > >
> > > Before the toned-down version:
> > > Clear dirty log over 3 iterations took 10.006764203s. (Avg 3.335588067s/iteration)
> > >
> > > After the toned-down version:
> > > Clear dirty log over 3 iterations took 8.686433554s. (Avg 2.895477851s/iteration)
> > >
> > > Signed-off-by: Vipin Sharma <vipinsh@google.com>
> > > ---
> > >  arch/x86/kvm/mmu/tdp_mmu.c | 20 +++++++++++++++++++-
> > >  1 file changed, 19 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> > > index bba33aea0fb0..ca21b33c4386 100644
> > > --- a/arch/x86/kvm/mmu/tdp_mmu.c
> > > +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> > > @@ -504,6 +504,19 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared)
> > >       call_rcu(&sp->rcu_head, tdp_mmu_free_sp_rcu_callback);
> > >  }
> > >
> > > +static void handle_changed_spte_clear_dirty_log(int as_id, gfn_t gfn,
> > > +                                             u64 old_spte, u64 new_spte,
> > > +                                             int level)
> > > +{
> > > +     if (old_spte == new_spte)
> > > +             return;
> > > +
> > > +     trace_kvm_tdp_mmu_spte_changed(as_id, gfn, level, old_spte, new_spte);
> > > +
> > > +     if (is_dirty_spte(old_spte) && !is_dirty_spte(new_spte))
> > > +             kvm_set_pfn_dirty(spte_to_pfn(old_spte));
> > > +}
> > > +
> > >  /**
> > >   * __handle_changed_spte - handle bookkeeping associated with an SPTE change
> > >   * @kvm: kvm instance
> > > @@ -736,7 +749,12 @@ static u64 __tdp_mmu_set_spte(struct kvm *kvm, int as_id, tdp_ptep_t sptep,
> > >
> > >       old_spte = kvm_tdp_mmu_write_spte(sptep, old_spte, new_spte, level);
> > >
> > > -     __handle_changed_spte(kvm, as_id, gfn, old_spte, new_spte, level, false);
> > > +     if (record_dirty_log)
> > > +             __handle_changed_spte(kvm, as_id, gfn, old_spte, new_spte,
> > > +                                   level, false);
> > > +     else
> > > +             handle_changed_spte_clear_dirty_log(as_id, gfn, old_spte,
> > > +                                                 new_spte, level);
> >
> > I find it very non-intuitive to tie this behavior to !record_dirty_log,
> > even though it happens to work. It's also risky if any future calls are
> > added that pass record_dirty_log=false but change other bits in the
> > SPTE.
> >
> > I wonder if we could get the same effect by optimizing
> > __handle_changed_spte() to check for a cleared D-bit *first* and if
> > that's the only diff between the old and new SPTE, bail immediately
> > rather than doing all the other checks.
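
A hypothetical, untested sketch of that early-out, reusing the locals that
already exist in __handle_changed_spte():

	/*
	 * If the only delta is a cleared D-bit, the only bookkeeping
	 * needed is to notify the primary MMU, so bail early.  Assumes
	 * hardware A/D bits, i.e. that shadow_dirty_mask holds the
	 * dirty bit.
	 */
	if (was_leaf && is_leaf &&
	    old_spte == (new_spte | shadow_dirty_mask) &&
	    !is_dirty_spte(new_spte)) {
		kvm_set_pfn_dirty(spte_to_pfn(old_spte));
		return;
	}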
> 
> I would also prefer that course. One big lesson I took when building
> the TDP MMU was the value of having all the changed PTE handling go
> through one function. That's not always the right choice but the
> Shadow MMU has a crazy spaghetti code of different functions which
> handle different parts of responding to a PTE change and it makes the
> code very hard to read. I agree this path is worth optimizing, but the
> more we can keep the code together, the better.

Hrm, not sure I agree on that last point.  I find record_dirty_log and record_acc_track
to be eyesores and terribly confusing (I always forget when they're true/false).
I think we would end up with cleaner code overall if we special-case clearing A/D
bits (or their write-protected/access-protected counterparts).

handle_changed_spte_dirty_log() takes effect if and only if a SPTE becomes newly
writable, and that should never happen when aging gfns (record_acc_track=%false),
i.e. neither the "age gfns" nor the "clear dirty" paths need to call
handle_changed_spte_dirty_log(), which means @record_dirty_log is superfluous.
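
For reference, the only side effect of that helper is, approximately (paraphrased
from memory, the exact code may differ):

	/* handle_changed_spte_dirty_log(), roughly: */
	if (level > PG_LEVEL_4K)
		return;

	pfn_changed = spte_to_pfn(old_spte) != spte_to_pfn(new_spte);

	if ((!is_writable_pte(old_spte) || pfn_changed) &&
	    is_writable_pte(new_spte)) {
		slot = __gfn_to_memslot(__kvm_memslots(kvm, as_id), gfn);
		mark_page_dirty_in_slot(kvm, slot, gfn);
	}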

Similarly, clearing dirty bits should never clear the accessed bit, nor should it
toggle PRESENT or change the PFN, i.e. neither path needs to call
handle_changed_spte_acc_track(), which means @record_acc_track is superfluous too.
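
And handle_changed_spte_acc_track() reduces to, again approximately:

	/* handle_changed_spte_acc_track(), roughly: */
	if (!is_shadow_present_pte(old_spte) || !is_last_spte(old_spte, level))
		return;

	if (is_accessed_spte(old_spte) &&
	    (!is_accessed_spte(new_spte) ||
	     spte_to_pfn(old_spte) != spte_to_pfn(new_spte)))
		kvm_set_pfn_accessed(spte_to_pfn(old_spte));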

If we jettison those, then AFAICT the only remaining heuristic is that
tdp_mmu_set_spte_atomic() doesn't update the dirty bitmaps (the comment about
that behavior is unhelpful and doesn't explain _why_).  That heuristic is easy
to handle by looking at @shared.

Looking through all the other stuff in __handle_changed_spte()...

I'm 100% comfortable skipping these sanity checks:

	if (was_leaf && is_leaf && pfn_changed)
		BUG();

	if (is_leaf)
		check_spte_writable_invariants(new_spte);

	if (!was_present && !is_present)
		WARN_ON(!MMIO && !REMOVED);

And none of these are relevant (again assuming we don't have an egregious bug)
except for the kvm_set_pfn_dirty() case, which is trivial to handle.

	if (is_leaf != was_leaf)
		kvm_update_page_stats(kvm, level, is_leaf ? 1 : -1);

	if (was_leaf && is_dirty_spte(old_spte) &&
	    (!is_present || !is_dirty_spte(new_spte) || pfn_changed))
		kvm_set_pfn_dirty(spte_to_pfn(old_spte));

	/*
	 * Recursively handle child PTs if the change removed a subtree from
	 * the paging structure.  Note the WARN on the PFN changing without the
	 * SPTE being converted to a hugepage (leaf) or being zapped.  Shadow
	 * pages are kernel allocations and should never be migrated.
	 */
	if (was_present && !was_leaf &&
	    (is_leaf || !is_present || WARN_ON_ONCE(pfn_changed)))
		handle_removed_pt(kvm, spte_to_child_pt(old_spte, level), shared);


So unless I'm missing something, I think we can end up with the below.  Compile-tested
only...

---
 arch/x86/kvm/mmu/tdp_mmu.c | 92 ++++++++++----------------------------
 1 file changed, 24 insertions(+), 68 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index bba33aea0fb0..2f78ca43a276 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -505,7 +505,7 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared)
 }
 
 /**
- * __handle_changed_spte - handle bookkeeping associated with an SPTE change
+ * handle_changed_spte - handle bookkeeping associated with an SPTE change
  * @kvm: kvm instance
  * @as_id: the address space of the paging structure the SPTE was a part of
  * @gfn: the base GFN that was mapped by the SPTE
@@ -519,9 +519,9 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared)
  * Handle bookkeeping that might result from the modification of a SPTE.
  * This function must be called for all TDP SPTE modifications.
  */
-static void __handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
-				  u64 old_spte, u64 new_spte, int level,
-				  bool shared)
+static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
+				u64 old_spte, u64 new_spte, int level,
+				bool shared)
 {
 	bool was_present = is_shadow_present_pte(old_spte);
 	bool is_present = is_shadow_present_pte(new_spte);
@@ -605,23 +605,18 @@ static void __handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
 	if (was_present && !was_leaf &&
 	    (is_leaf || !is_present || WARN_ON_ONCE(pfn_changed)))
 		handle_removed_pt(kvm, spte_to_child_pt(old_spte, level), shared);
-}
 
-static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
-				u64 old_spte, u64 new_spte, int level,
-				bool shared)
-{
-	__handle_changed_spte(kvm, as_id, gfn, old_spte, new_spte, level,
-			      shared);
 	handle_changed_spte_acc_track(old_spte, new_spte, level);
-	handle_changed_spte_dirty_log(kvm, as_id, gfn, old_spte,
-				      new_spte, level);
+
+	/* COMMENT GOES HERE. */
+	if (!shared)
+		handle_changed_spte_dirty_log(kvm, as_id, gfn, old_spte,
+					      new_spte, level);
 }
 
 /*
  * tdp_mmu_set_spte_atomic - Set a TDP MMU SPTE atomically
- * and handle the associated bookkeeping.  Do not mark the page dirty
- * in KVM's dirty bitmaps.
+ * and handle the associated bookkeeping.
  *
  * If setting the SPTE fails because it has changed, iter->old_spte will be
  * refreshed to the current value of the spte.
@@ -658,9 +653,8 @@ static inline int tdp_mmu_set_spte_atomic(struct kvm *kvm,
 	if (!try_cmpxchg64(sptep, &iter->old_spte, new_spte))
 		return -EBUSY;
 
-	__handle_changed_spte(kvm, iter->as_id, iter->gfn, iter->old_spte,
-			      new_spte, iter->level, true);
-	handle_changed_spte_acc_track(iter->old_spte, new_spte, iter->level);
+	handle_changed_spte(kvm, iter->as_id, iter->gfn, iter->old_spte,
+			    new_spte, iter->level, true);
 
 	return 0;
 }
@@ -705,23 +699,12 @@ static inline int tdp_mmu_zap_spte_atomic(struct kvm *kvm,
  * @new_spte:	      The new value that will be set for the SPTE
  * @gfn:	      The base GFN that was (or will be) mapped by the SPTE
  * @level:	      The level _containing_ the SPTE (its parent PT's level)
- * @record_acc_track: Notify the MM subsystem of changes to the accessed state
- *		      of the page. Should be set unless handling an MMU
- *		      notifier for access tracking. Leaving record_acc_track
- *		      unset in that case prevents page accesses from being
- *		      double counted.
- * @record_dirty_log: Record the page as dirty in the dirty bitmap if
- *		      appropriate for the change being made. Should be set
- *		      unless performing certain dirty logging operations.
- *		      Leaving record_dirty_log unset in that case prevents page
- *		      writes from being double counted.
  *
  * Returns the old SPTE value, which _may_ be different than @old_spte if the
  * SPTE had voldatile bits.
  */
 static u64 __tdp_mmu_set_spte(struct kvm *kvm, int as_id, tdp_ptep_t sptep,
-			      u64 old_spte, u64 new_spte, gfn_t gfn, int level,
-			      bool record_acc_track, bool record_dirty_log)
+			      u64 old_spte, u64 new_spte, gfn_t gfn, int level)
 {
 	lockdep_assert_held_write(&kvm->mmu_lock);
 
@@ -736,46 +719,18 @@ static u64 __tdp_mmu_set_spte(struct kvm *kvm, int as_id, tdp_ptep_t sptep,
 
 	old_spte = kvm_tdp_mmu_write_spte(sptep, old_spte, new_spte, level);
 
-	__handle_changed_spte(kvm, as_id, gfn, old_spte, new_spte, level, false);
-
-	if (record_acc_track)
-		handle_changed_spte_acc_track(old_spte, new_spte, level);
-	if (record_dirty_log)
-		handle_changed_spte_dirty_log(kvm, as_id, gfn, old_spte,
-					      new_spte, level);
+	handle_changed_spte(kvm, as_id, gfn, old_spte, new_spte, level, false);
 	return old_spte;
 }
 
-static inline void _tdp_mmu_set_spte(struct kvm *kvm, struct tdp_iter *iter,
-				     u64 new_spte, bool record_acc_track,
-				     bool record_dirty_log)
+static inline void tdp_mmu_set_spte(struct kvm *kvm, struct tdp_iter *iter,
+				     u64 new_spte)
 {
 	WARN_ON_ONCE(iter->yielded);
 
 	iter->old_spte = __tdp_mmu_set_spte(kvm, iter->as_id, iter->sptep,
 					    iter->old_spte, new_spte,
-					    iter->gfn, iter->level,
-					    record_acc_track, record_dirty_log);
-}
-
-static inline void tdp_mmu_set_spte(struct kvm *kvm, struct tdp_iter *iter,
-				    u64 new_spte)
-{
-	_tdp_mmu_set_spte(kvm, iter, new_spte, true, true);
-}
-
-static inline void tdp_mmu_set_spte_no_acc_track(struct kvm *kvm,
-						 struct tdp_iter *iter,
-						 u64 new_spte)
-{
-	_tdp_mmu_set_spte(kvm, iter, new_spte, false, true);
-}
-
-static inline void tdp_mmu_set_spte_no_dirty_log(struct kvm *kvm,
-						 struct tdp_iter *iter,
-						 u64 new_spte)
-{
-	_tdp_mmu_set_spte(kvm, iter, new_spte, true, false);
+					    iter->gfn, iter->level);
 }
 
 #define tdp_root_for_each_pte(_iter, _root, _start, _end) \
@@ -925,7 +880,7 @@ bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
 		return false;
 
 	__tdp_mmu_set_spte(kvm, kvm_mmu_page_as_id(sp), sp->ptep, old_spte, 0,
-			   sp->gfn, sp->role.level + 1, true, true);
+			   sp->gfn, sp->role.level + 1);
 
 	return true;
 }
@@ -1289,8 +1244,7 @@ static bool age_gfn_range(struct kvm *kvm, struct tdp_iter *iter,
 		new_spte = mark_spte_for_access_track(new_spte);
 	}
 
-	tdp_mmu_set_spte_no_acc_track(kvm, iter, new_spte);
-
+	kvm_tdp_mmu_write_spte(iter->sptep, iter->old_spte, new_spte, iter->level);
 	return true;
 }
 
@@ -1326,7 +1280,7 @@ static bool set_spte_gfn(struct kvm *kvm, struct tdp_iter *iter,
 	 * Note, when changing a read-only SPTE, it's not strictly necessary to
 	 * zero the SPTE before setting the new PFN, but doing so preserves the
 	 * invariant that the PFN of a present * leaf SPTE can never change.
-	 * See __handle_changed_spte().
+	 * See handle_changed_spte().
 	 */
 	tdp_mmu_set_spte(kvm, iter, 0);
 
@@ -1351,7 +1305,7 @@ bool kvm_tdp_mmu_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
 	/*
 	 * No need to handle the remote TLB flush under RCU protection, the
 	 * target SPTE _must_ be a leaf SPTE, i.e. cannot result in freeing a
-	 * shadow page.  See the WARN on pfn_changed in __handle_changed_spte().
+	 * shadow page.  See the WARN on pfn_changed in handle_changed_spte().
 	 */
 	return kvm_tdp_mmu_handle_gfn(kvm, range, set_spte_gfn);
 }
@@ -1703,9 +1657,11 @@ static void clear_dirty_pt_masked(struct kvm *kvm, struct kvm_mmu_page *root,
 				new_spte = iter.old_spte & ~shadow_dirty_mask;
 			else
 				continue;
+
+			kvm_set_pfn_dirty(spte_to_pfn(iter.old_spte));
 		}
 
-		tdp_mmu_set_spte_no_dirty_log(kvm, &iter, new_spte);
+		kvm_tdp_mmu_write_spte(iter.sptep, iter.old_spte, new_spte, iter.level);
 	}
 
 	rcu_read_unlock();

base-commit: f15a87c006901e02727bf8ac75b0251cdf8e0ecc
-- 

