linux-kselftest.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Ben Gardon <bgardon@google.com>
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	linux-kselftest@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>, Peter Xu <peterx@redhat.com>,
	Sean Christopherson <sean.j.christopherson@intel.com>,
	Peter Shier <pshier@google.com>, Ben Gardon <bgardon@google.com>
Subject: [PATCH 1/1] kvm: mmu: zap pages when zapping only parent
Date: Mon, 27 Jul 2020 13:33:24 -0700	[thread overview]
Message-ID: <20200727203324.2614917-1-bgardon@google.com> (raw)

When the KVM MMU zaps a page, it will recursively zap the unsynced child
pages, but not the synced ones. This can create problems over time when
running many nested guests because it leaves unlinked pages which will not
be freed until the page quota is hit. With the default page quota of 20
shadow pages per 1000 guest pages, this looks like a memory leak and can
degrade MMU performance.

In a recent benchmark, substantial performance degradation was observed:
An L1 guest was booted with 64G memory.
2G nested Windows guests were booted, 10 at a time for 20
iterations. (200 total boots)
Windows was used in this benchmark because they touch all of their
memory on startup.
By the end of the benchmark, the nested guests were taking ~10% longer
to boot. With this patch there is no degradation in boot time.
Without this patch the benchmark ends with hundreds of thousands of
stale EPT02 pages cluttering up rmaps and the page hash map. As a
result, VM shutdown is also much slower: deleting memslot 0 was
observed to take over a minute. With this patch it takes just a
few miliseconds.

If TDP is enabled, zap child shadow pages when zapping the only parent
shadow page.

Tested by running the kvm-unit-tests suite on an Intel Haswell machine.
No regressions versus
commit c34b26b98cac ("KVM: MIPS: clean up redundant 'kvm_run' parameters"),
or warnings.

Reviewed-by: Peter Shier <pshier@google.com>
Signed-off-by: Ben Gardon <bgardon@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 49 +++++++++++++++++++++++++++++++++++++-----
 1 file changed, 44 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index fa506aaaf0194..c550bc3831dcc 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2626,13 +2626,52 @@ static bool mmu_page_zap_pte(struct kvm *kvm, struct kvm_mmu_page *sp,
 	return false;
 }
 
-static void kvm_mmu_page_unlink_children(struct kvm *kvm,
-					 struct kvm_mmu_page *sp)
+static int kvm_mmu_page_unlink_children(struct kvm *kvm,
+					struct kvm_mmu_page *sp,
+					struct list_head *invalid_list)
 {
 	unsigned i;
+	int zapped = 0;
+
+	for (i = 0; i < PT64_ENT_PER_PAGE; ++i) {
+		u64 *sptep = sp->spt + i;
+		u64 spte = *sptep;
+		struct kvm_mmu_page *child_sp;
+
+		/*
+		 * Zap the page table entry, unlinking any potential child
+		 * page
+		 */
+		mmu_page_zap_pte(kvm, sp, sptep);
+
+		/* If there is no child page for this spte, continue */
+		if (!is_shadow_present_pte(spte) ||
+		    is_last_spte(spte, sp->role.level))
+			continue;
+
+		/*
+		 * If TDP is enabled, then any shadow pages are part of either
+		 * the EPT01 or an EPT02. In either case, do not expect the
+		 * same pattern of page reuse seen in x86 PTs for
+		 * copy-on-write  and similar techniques. In this case, it is
+		 * unlikely that a parentless shadow PT will be used again in
+		 * the near future. Zap it to keep the rmaps and page hash
+		 * maps from filling up with stale EPT02 pages.
+		 */
+		if (!tdp_enabled)
+			continue;
+
+		child_sp = to_shadow_page(spte & PT64_BASE_ADDR_MASK);
+		if (WARN_ON_ONCE(!child_sp))
+			continue;
+
+		/* Zap the page if it has no remaining parent pages */
+		if (!child_sp->parent_ptes.val)
+			zapped += kvm_mmu_prepare_zap_page(kvm, child_sp,
+							   invalid_list);
+	}
 
-	for (i = 0; i < PT64_ENT_PER_PAGE; ++i)
-		mmu_page_zap_pte(kvm, sp, sp->spt + i);
+	return zapped;
 }
 
 static void kvm_mmu_unlink_parents(struct kvm *kvm, struct kvm_mmu_page *sp)
@@ -2678,7 +2717,7 @@ static bool __kvm_mmu_prepare_zap_page(struct kvm *kvm,
 	trace_kvm_mmu_prepare_zap_page(sp);
 	++kvm->stat.mmu_shadow_zapped;
 	*nr_zapped = mmu_zap_unsync_children(kvm, sp, invalid_list);
-	kvm_mmu_page_unlink_children(kvm, sp);
+	*nr_zapped += kvm_mmu_page_unlink_children(kvm, sp, invalid_list);
 	kvm_mmu_unlink_parents(kvm, sp);
 
 	/* Zapping children means active_mmu_pages has become unstable. */
-- 
2.28.0.rc0.142.g3c755180ce-goog


             reply	other threads:[~2020-07-27 20:33 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-27 20:33 Ben Gardon [this message]
2020-08-04 21:14 ` [PATCH 1/1] kvm: mmu: zap pages when zapping only parent Sean Christopherson
2020-08-05 17:10   ` Ben Gardon
2020-08-05 18:48     ` Paolo Bonzini
2020-08-08  5:06       ` Sean Christopherson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200727203324.2614917-1-bgardon@google.com \
    --to=bgardon@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=peterx@redhat.com \
    --cc=pshier@google.com \
    --cc=sean.j.christopherson@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).