kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: David Matlack <dmatlack@google.com>
Cc: Mingwei Zhang <mizhang@google.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	Jim Mattson <jmattson@google.com>, Joerg Roedel <joro@8bytes.org>,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	Ben Gardon <bgardon@google.com>
Subject: Re: [PATCH 2/2] KVM: mmu/x86: optimize zapping by retaining non-leaf SPTEs and avoid rcu stall
Date: Tue, 30 Nov 2021 19:40:45 +0000	[thread overview]
Message-ID: <YaZ+PSA9kOBYtpIz@google.com> (raw)
In-Reply-To: <YaV02MdGYlfGs35T@google.com>

On Tue, Nov 30, 2021, David Matlack wrote:
> > +		/*
> > +		 * In zap_all case, ignore the checking of present since we have
> > +		 * to zap everything.
> > +		 */
> > +		if (!zap_all && !is_shadow_present_pte(iter.old_spte))
> >  			continue;
> 
> I don't believe there's any reason to attempt to zap a non-present spte,
> even in the zap_all case. In any case, this change deserves its own
> patch and a commit message that describes why the old logic is incorrect
> and how this fixes it.

Yep, at best it's wasted cycles, at worst it will trigger WARN/BUG due to using
accessors that require the caller to check for a shadow present SPTE.

> >  		 * If this is a non-last-level SPTE that covers a larger range
> >  		 * than should be zapped, continue, and zap the mappings at a
> > -		 * lower level, except when zapping all SPTEs.
> > +		 * lower level. Actual zapping started at proper granularity
> > +		 * that is not so large as to cause a soft lockup when handling
> > +		 * the changed pte (which does not yield).
> >  		 */
> >  		if (!zap_all &&
> >  		    (iter.gfn < start ||
> > -		     iter.gfn + KVM_PAGES_PER_HPAGE(iter.level) > end) &&
> > +		     iter.gfn + KVM_PAGES_PER_HPAGE(iter.level) > end ||
> > +		     iter.level > PG_LEVEL_1G) &&
> >  		    !is_last_spte(iter.old_spte, iter.level))
> >  			continue;
> 
> This if statement is getting a bit long. I'd suggest breaking out the
> level check and also using KVM_MAX_HUGEPAGE_LEVEL.
> 
> e.g.
> 
>         /*
>          * If not doing zap_all, only zap up to the huge page level to
>          * avoid doing too much work in the recursive tdp_mmu_set_spte*
>          * call below, since it does not yield.
>          *
>          * This will potentially leave behind some childless page tables
>          * but that's ok because ...
>          */
>          if (!zap_all && iter.level > KVM_MAX_HUGEPAGE_LEVEL)
>                 continue;
> 
> And on that note, what is the reasoning for why it's ok to leave behind
> childless page tables? I assume it's because most of the time we'll use
> that page table again in the future, and at worst we leave the page
> table allocated until the VM is cleaned up?

Yes.  If zap_all=false, KVM is zapping a gfn range but not dropping or invalidating
the root.  The non-leaf paging structures are perfectly valid and can be reused.
There's certainly no guarantees that a structure will be reused, but keeping the
pages is ok because the memory consumed scales with the size of the VM, and the
number of shadow pages used for TDP is quite small, e.g. an order of magnitude less
than what's needed for shadow paging.

If we're ok waiting until my zap n' flush rework[*] lands, this is much easier to
handle, e.g. I think this will do it.

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 62cb357b1dff..8d3df03024b7 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -765,7 +765,7 @@ static void tdp_mmu_zap_root(struct kvm *kvm, struct kvm_mmu_page *root,
                             bool shared, bool root_is_unreachable)
 {
        struct tdp_iter iter;
-
+       bool leafs_only;
        gfn_t end = tdp_mmu_max_gfn_host();
        gfn_t start = 0;

@@ -773,12 +773,11 @@ static void tdp_mmu_zap_root(struct kvm *kvm, struct kvm_mmu_page *root,

        rcu_read_lock();

-       /*
-        * No need to try to step down in the iterator when zapping an entire
-        * root, zapping an upper-level SPTE will recurse on its children.
-        */
+again:
+       /* Add comment here. */
        for_each_tdp_pte_min_level(iter, root->spt, root->role.level,
-                                  root->role.level, start, end) {
+                                  leafs_only ? PG_LEVEL_4K : root->role.level,
+                                  start, end) {
 retry:
                if (tdp_mmu_iter_cond_resched(kvm, &iter, false, shared))
                        continue;
@@ -786,6 +785,9 @@ static void tdp_mmu_zap_root(struct kvm *kvm, struct kvm_mmu_page *root,
                if (!is_shadow_present_pte(iter.old_spte))
                        continue;

+               if (leafs_only && !is_last_spte(iter.old_spte, iter.level))
+                       continue;
+
                if (!shared) {
                        tdp_mmu_set_spte(kvm, &iter, 0);
                } else if (!tdp_mmu_set_spte_atomic(kvm, &iter, 0)) {
@@ -807,6 +809,11 @@ static void tdp_mmu_zap_root(struct kvm *kvm, struct kvm_mmu_page *root,
                WARN_ON_ONCE(root_is_unreachable && root->role.invalid);
        }

+       if (leafs_only) {
+               leafs_only = false;
+               goto again;
+       }
+
        rcu_read_unlock();
 }

[*] https://lore.kernel.org/all/20211120045046.3940942-1-seanjc@google.com/

      parent reply	other threads:[~2021-11-30 19:41 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-24 21:44 [PATCH 0/2] optimize spte zapping in zap_gfn_range() Mingwei Zhang
2021-11-24 21:44 ` [PATCH 1/2] Revert "KVM: x86/mmu: Don't step down in the TDP iterator when zapping all SPTEs" Mingwei Zhang
2021-11-30  0:50   ` David Matlack
2021-11-24 21:44 ` [PATCH 2/2] KVM: mmu/x86: optimize zapping by retaining non-leaf SPTEs and avoid rcu stall Mingwei Zhang
2021-11-30  0:48   ` David Matlack
2021-11-30  0:50     ` David Matlack
2021-11-30 19:40     ` Sean Christopherson [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YaZ+PSA9kOBYtpIz@google.com \
    --to=seanjc@google.com \
    --cc=bgardon@google.com \
    --cc=dmatlack@google.com \
    --cc=jmattson@google.com \
    --cc=joro@8bytes.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mizhang@google.com \
    --cc=pbonzini@redhat.com \
    --cc=vkuznets@redhat.com \
    --cc=wanpengli@tencent.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).