linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Sean Christopherson <sean.j.christopherson@intel.com>
To: "Paolo Bonzini" <pbonzini@redhat.com>,
	"Radim Krčmář" <rkrcmar@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	Jim Mattson <jmattson@google.com>, Joerg Roedel <joro@8bytes.org>,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH 09/16] KVM: x86/mmu: Persist gfn_lpage_is_disallowed() to max_level
Date: Fri,  6 Dec 2019 15:57:22 -0800	[thread overview]
Message-ID: <20191206235729.29263-10-sean.j.christopherson@intel.com> (raw)
In-Reply-To: <20191206235729.29263-1-sean.j.christopherson@intel.com>

Persist the max page level calculated via gfn_lpage_is_disallowed() to
the max level "returned" by mapping_level() so that its naturally taken
into account by the max level check that conditions calling
transparent_hugepage_adjust().

Drop the gfn_lpage_is_disallowed() check in thp_adjust() as it's now
handled by mapping_level() and its callers.

Add a comment to document the behavior of host_mapping_level() and its
interaction with max level and transparent huge pages.

Note, transferring the gfn_lpage_is_disallowed() from thp_adjust() to
mapping_level() superficially affects how changes to a memslot's
disallow_lpage count will be handled due to thp_adjust() being run while
holding mmu_lock.

In the more common case where a different vCPU increments the count via
account_shadowed(), gfn_lpage_is_disallowed() is rechecked by set_spte()
to ensure a writable large page isn't created.

In the less common case where the count is decremented to zero due to
all shadow pages in the memslot being zapped, THP behavior now matches
hugetlbfs behavior in the sense that a small page will be created when a
large page could be used if the count reaches zero in the miniscule
window between mapping_level() and acquiring mmu_lock.

Lastly, the new THP behavior also follows hugetlbfs behavior in the
absurdly unlikely scenario of a memslot being moved such that the
memslot's compatibility with respect to large pages changes, but without
changing the validity of the gpf->pfn walk.  I.e. if a memslot is moved
between mapping_level() and snapshotting mmu_seq, it's theoretically
possible to consume a stale disallow_lpage count.  But, since KVM zaps
all shadow pages when moving a memslot and forces all vCPUs to reload a
new MMU, the inserted spte will always be thrown away prior to
completing the memslot move, i.e. whether or not the spte accurately
reflects disallow_lpage is irrelevant.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/mmu/mmu.c | 28 ++++++++++++++++++----------
 1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 877924cbb75b..8782a70abe78 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1326,7 +1326,7 @@ gfn_to_memslot_dirty_bitmap(struct kvm_vcpu *vcpu, gfn_t gfn,
 static int mapping_level(struct kvm_vcpu *vcpu, gfn_t large_gfn,
 			 int *max_levelp)
 {
-	int host_level, max_level = *max_levelp;
+	int max_level = *max_levelp;
 	struct kvm_memory_slot *slot;
 
 	if (unlikely(max_level == PT_PAGE_TABLE_LEVEL))
@@ -1338,18 +1338,27 @@ static int mapping_level(struct kvm_vcpu *vcpu, gfn_t large_gfn,
 		return PT_PAGE_TABLE_LEVEL;
 	}
 
-	host_level = host_mapping_level(vcpu->kvm, large_gfn);
-
-	if (host_level == PT_PAGE_TABLE_LEVEL)
-		return host_level;
-
-	max_level = min(kvm_x86_ops->get_lpage_level(), host_level);
+	max_level = min(max_level, kvm_x86_ops->get_lpage_level());
 	for ( ; max_level > PT_PAGE_TABLE_LEVEL; max_level--) {
 		if (!__mmu_gfn_lpage_is_disallowed(large_gfn, max_level, slot))
 			break;
 	}
 
-	return max_level;
+	*max_levelp = max_level;
+
+	if (max_level == PT_PAGE_TABLE_LEVEL)
+		return PT_PAGE_TABLE_LEVEL;
+
+	/*
+	 * Note, host_mapping_level() does *not* handle transparent huge pages.
+	 * As suggested by "mapping", it reflects the page size established by
+	 * the associated vma, if there is one, i.e. host_mapping_level() will
+	 * return a huge page level if and only if a vma exists and the backing
+	 * implementation for the vma uses huge pages, e.g. hugetlbfs and dax.
+	 * So, do not propagate host_mapping_level() to max_level as KVM can
+	 * still promote the guest mapping to a huge page in the THP case.
+	 */
+	return host_mapping_level(vcpu->kvm, large_gfn);
 }
 
 /*
@@ -3420,8 +3429,7 @@ static void transparent_hugepage_adjust(struct kvm_vcpu *vcpu,
 	 */
 	if (!is_error_noslot_pfn(pfn) && !kvm_is_reserved_pfn(pfn) &&
 	    !kvm_is_zone_device_pfn(pfn) && level == PT_PAGE_TABLE_LEVEL &&
-	    PageTransCompoundMap(pfn_to_page(pfn)) &&
-	    !mmu_gfn_lpage_is_disallowed(vcpu, gfn, PT_DIRECTORY_LEVEL)) {
+	    PageTransCompoundMap(pfn_to_page(pfn))) {
 		unsigned long mask;
 		/*
 		 * mmu_notifier_retry was successful and we hold the
-- 
2.24.0


  parent reply	other threads:[~2019-12-06 23:58 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-12-06 23:57 [PATCH 00/16] KVM: x86: MMU page fault clean-up Sean Christopherson
2019-12-06 23:57 ` [PATCH 01/16] KVM: x86: Use gpa_t for cr2/gpa to fix TDP support on 32-bit KVM Sean Christopherson
2019-12-06 23:57 ` [PATCH 02/16] KVM: x86/mmu: Move definition of make_mmu_pages_available() up Sean Christopherson
2019-12-06 23:57 ` [PATCH 03/16] KVM: x86/mmu: Fold nonpaging_map() into nonpaging_page_fault() Sean Christopherson
2019-12-06 23:57 ` [PATCH 04/16] KVM: x86/mmu: Move nonpaging_page_fault() below try_async_pf() Sean Christopherson
2019-12-06 23:57 ` [PATCH 05/16] KVM: x86/mmu: Refactor handling of cache consistency with TDP Sean Christopherson
2019-12-06 23:57 ` [PATCH 06/16] KVM: x86/mmu: Refactor the per-slot level calculation in mapping_level() Sean Christopherson
2019-12-06 23:57 ` [PATCH 07/16] KVM: x86/mmu: Refactor handling of forced 4k pages in page faults Sean Christopherson
2019-12-06 23:57 ` [PATCH 08/16] KVM: x86/mmu: Incorporate guest's page level into max level for shadow MMU Sean Christopherson
2019-12-06 23:57 ` Sean Christopherson [this message]
2019-12-06 23:57 ` [PATCH 10/16] KVM: x86/mmu: Rename lpage_disallowed to account_disallowed_nx_lpage Sean Christopherson
2019-12-06 23:57 ` [PATCH 11/16] KVM: x86/mmu: Consolidate tdp_page_fault() and nonpaging_page_fault() Sean Christopherson
2019-12-06 23:57 ` [PATCH 12/16] KVM: x86/mmu: Move transparent_hugepage_adjust() above __direct_map() Sean Christopherson
2019-12-06 23:57 ` [PATCH 13/16] KVM: x86/mmu: Move calls to thp_adjust() down a level Sean Christopherson
2019-12-06 23:57 ` [PATCH 14/16] KVM: x86/mmu: Move root_hpa validity checks to top of page fault handler Sean Christopherson
2019-12-06 23:57 ` [PATCH 15/16] KVM: x86/mmu: WARN on an invalid root_hpa Sean Christopherson
2019-12-06 23:57 ` [PATCH 16/16] KVM: x86/mmu: WARN if root_hpa is invalid when handling a page fault Sean Christopherson
2019-12-09 15:31 ` [PATCH 00/16] KVM: x86: MMU page fault clean-up Paolo Bonzini

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191206235729.29263-10-sean.j.christopherson@intel.com \
    --to=sean.j.christopherson@intel.com \
    --cc=jmattson@google.com \
    --cc=joro@8bytes.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=rkrcmar@redhat.com \
    --cc=vkuznets@redhat.com \
    --cc=wanpengli@tencent.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).