* [PATCH 00/10] KVM: Clean up 'struct page' / pfn helpers
From: Sean Christopherson @ 2022-04-29  1:04 UTC
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel

Clean up KVM's struct page / pfn helpers to reduce the number of
pfn_to_page() and page_to_pfn() conversions.  E.g. kvm_release_pfn_dirty()
makes 6 (if I counted right) calls to pfn_to_page() when releasing a dirty
pfn that is backed by a vanilla struct page.  That is easily trimmed down to
a single call.
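
For reference, the pre-series release path looks roughly like this; each
helper re-derives the 'struct page' from the pfn instead of passing the
page along (a paraphrase of the old code, not the verbatim source):

	void kvm_release_pfn_dirty(kvm_pfn_t pfn)
	{
		/*
		 * Converts via pfn_to_page() in the reserved-pfn and
		 * ZONE_DEVICE checks, and again for SetPageDirty().
		 */
		kvm_set_pfn_dirty(pfn);

		/*
		 * Repeats the reserved-pfn checks, then converts once
		 * more for put_page().
		 */
		kvm_release_pfn_clean(pfn);
	}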

And perhaps more importantly, rename and refactor kvm_is_reserved_pfn() to
try and better reflect what it actually queries, which at this point is
effectively whether or not the pfn is backed by a refcounted page.

Sean Christopherson (10):
  KVM: Do not zero initialize 'pfn' in hva_to_pfn()
  KVM: Drop bogus "pfn != 0" guard from kvm_release_pfn()
  KVM: Don't set Accessed/Dirty bits for ZERO_PAGE
  KVM: Avoid pfn_to_page() and vice versa when releasing pages
  KVM: nVMX: Use kvm_vcpu_map() to get/pin vmcs12's APIC-access page
  KVM: Don't WARN if kvm_pfn_to_page() encounters a "reserved" pfn
  KVM: Remove kvm_vcpu_gfn_to_page() and kvm_vcpu_gpa_to_page()
  KVM: Take a 'struct page', not a pfn in kvm_is_zone_device_page()
  KVM: Rename/refactor kvm_is_reserved_pfn() to
    kvm_pfn_to_refcounted_page()
  KVM: x86/mmu: Shove refcounted page dependency into
    host_pfn_mapping_level()

 arch/x86/kvm/mmu/mmu.c     |  26 +++++--
 arch/x86/kvm/mmu/tdp_mmu.c |   3 +-
 arch/x86/kvm/vmx/nested.c  |  39 ++++-------
 arch/x86/kvm/vmx/vmx.h     |   2 +-
 include/linux/kvm_host.h   |  12 +---
 virt/kvm/kvm_main.c        | 140 +++++++++++++++++++++++++------------
 6 files changed, 131 insertions(+), 91 deletions(-)


base-commit: 2a39d8b39bffdaf1a4223d0d22f07baee154c8f3
-- 
2.36.0.464.gb9c8b46e94-goog


* [PATCH 01/10] KVM: Do not zero initialize 'pfn' in hva_to_pfn()
From: Sean Christopherson @ 2022-04-29  1:04 UTC
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel

Drop the unnecessary initialization of the local 'pfn' variable in
hva_to_pfn().  First and foremost, '0' is not an invalid pfn; it's a
perfectly valid pfn on most architectures.  I.e. if hva_to_pfn() were to
return an "uninitialized" pfn, it would actually be interpreted as a legal
pfn by most callers.

Second, hva_to_pfn() can't return an uninitialized pfn as hva_to_pfn()
explicitly sets pfn to an error value (or returns an error value directly)
if a helper returns failure, and all helpers set the pfn on success.

Note, the zeroing of 'pfn' was introduced by commit 2fc843117d64 ("KVM:
reorganize hva_to_pfn"), and was unnecessary and misguided paranoia even
then.
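
For illustration, the tail of hva_to_pfn() is shaped roughly as follows
(paraphrased from the kvm_main.c of this era; exact argument lists may
differ slightly):

	if (hva_to_pfn_fast(addr, write_fault, writable, &pfn))
		return pfn;			/* assigned by the helper */

	if (atomic)
		return KVM_PFN_ERR_FAULT;	/* explicit error value */

	npages = hva_to_pfn_slow(addr, async, write_fault, writable, &pfn);
	if (npages == 1)
		return pfn;			/* assigned by the helper */

	/*
	 * All remaining paths either assign 'pfn' via hva_to_pfn_remapped()
	 * or explicitly fall back to KVM_PFN_ERR_FAULT before returning.
	 */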

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 virt/kvm/kvm_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 0848430f36c6..04ed4334473c 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2567,7 +2567,7 @@ kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async,
 		     bool write_fault, bool *writable)
 {
 	struct vm_area_struct *vma;
-	kvm_pfn_t pfn = 0;
+	kvm_pfn_t pfn;
 	int npages, r;
 
 	/* we can do it either atomically or asynchronously, not both */
-- 
2.36.0.464.gb9c8b46e94-goog


* [PATCH 02/10] KVM: Drop bogus "pfn != 0" guard from kvm_release_pfn()
From: Sean Christopherson @ 2022-04-29  1:04 UTC
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel

Remove a check from kvm_release_pfn() to bail if the provided @pfn is
zero.  Zero is a perfectly valid pfn on most architectures, and should
not be used to indicate an error or an invalid pfn.  The bogus check was
added by commit 917248144db5 ("x86/kvm: Cache gfn to pfn translation"),
which also did the bad thing of zeroing the pfn and gfn to mark a cache
invalid.  Thankfully, that bad behavior was axed by commit 357a18ad230f
("KVM: Kill kvm_map_gfn() / kvm_unmap_gfn() and gfn_to_pfn_cache").

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 virt/kvm/kvm_main.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 04ed4334473c..154c3dda7010 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2723,9 +2723,6 @@ EXPORT_SYMBOL_GPL(gfn_to_page);
 
 void kvm_release_pfn(kvm_pfn_t pfn, bool dirty)
 {
-	if (pfn == 0)
-		return;
-
 	if (dirty)
 		kvm_release_pfn_dirty(pfn);
 	else
-- 
2.36.0.464.gb9c8b46e94-goog


* [PATCH 03/10] KVM: Don't set Accessed/Dirty bits for ZERO_PAGE
From: Sean Christopherson @ 2022-04-29  1:04 UTC
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel

Don't set Accessed/Dirty bits for a struct page with PG_reserved set,
i.e. don't set A/D bits for the ZERO_PAGE.  The ZERO_PAGE (or pages
depending on the architecture) should obviously never be written, and
similarly there's no point in marking it accessed as the page will never
be swapped out or reclaimed.  The comment in page-flags.h is quite clear
that PG_reserved pages should be managed only by their owner, and
strictly following that mandate also simplifies KVM's logic.

Fixes: 7df003c85218 ("KVM: fix overflow of zero page refcount with ksm running")
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 virt/kvm/kvm_main.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 154c3dda7010..46d12998732e 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2828,16 +2828,28 @@ void kvm_release_pfn_dirty(kvm_pfn_t pfn)
 }
 EXPORT_SYMBOL_GPL(kvm_release_pfn_dirty);
 
+static bool kvm_is_ad_tracked_pfn(kvm_pfn_t pfn)
+{
+	if (!pfn_valid(pfn))
+		return false;
+
+	/*
+	 * Per page-flags.h, pages tagged PG_reserved "should in general not be
+	 * touched (e.g. set dirty) except by its owner".
+	 */
+	return !PageReserved(pfn_to_page(pfn));
+}
+
 void kvm_set_pfn_dirty(kvm_pfn_t pfn)
 {
-	if (!kvm_is_reserved_pfn(pfn) && !kvm_is_zone_device_pfn(pfn))
+	if (kvm_is_ad_tracked_pfn(pfn))
 		SetPageDirty(pfn_to_page(pfn));
 }
 EXPORT_SYMBOL_GPL(kvm_set_pfn_dirty);
 
 void kvm_set_pfn_accessed(kvm_pfn_t pfn)
 {
-	if (!kvm_is_reserved_pfn(pfn) && !kvm_is_zone_device_pfn(pfn))
+	if (kvm_is_ad_tracked_pfn(pfn))
 		mark_page_accessed(pfn_to_page(pfn));
 }
 EXPORT_SYMBOL_GPL(kvm_set_pfn_accessed);
-- 
2.36.0.464.gb9c8b46e94-goog


* [PATCH 04/10] KVM: Avoid pfn_to_page() and vice versa when releasing pages
From: Sean Christopherson @ 2022-04-29  1:04 UTC
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel

Invert the order of KVM's page/pfn release helpers so that the "inner"
helper operates on a page instead of a pfn.  As pointed out by Linus[*],
converting between struct page and a pfn isn't necessarily cheap, and
that's not even counting the overhead of is_error_noslot_pfn() and
kvm_is_reserved_pfn().  Even if the checks were dirt cheap, there's no
reason to convert from a page to a pfn and back to a page, just to mark
the page dirty/accessed or to put a reference to the page.

Opportunistically drop a stale declaration of kvm_set_page_accessed()
from kvm_host.h (there was no implementation).

No functional change intended.

[*] https://lore.kernel.org/all/CAHk-=wifQimj2d6npq-wCi5onYPjzQg4vyO4tFcPJJZr268cRw@mail.gmail.com

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 include/linux/kvm_host.h |  1 -
 virt/kvm/kvm_main.c      | 58 +++++++++++++++++++++++++---------------
 2 files changed, 37 insertions(+), 22 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 252ee4a61b58..e32fbde79298 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1116,7 +1116,6 @@ unsigned long gfn_to_hva_memslot_prot(struct kvm_memory_slot *slot, gfn_t gfn,
 				      bool *writable);
 void kvm_release_page_clean(struct page *page);
 void kvm_release_page_dirty(struct page *page);
-void kvm_set_page_accessed(struct page *page);
 
 kvm_pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn);
 kvm_pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault,
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 46d12998732e..ab7549195c68 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2798,18 +2798,40 @@ struct page *kvm_vcpu_gfn_to_page(struct kvm_vcpu *vcpu, gfn_t gfn)
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_gfn_to_page);
 
+static bool kvm_is_ad_tracked_page(struct page *page)
+{
+	/*
+	 * Per page-flags.h, pages tagged PG_reserved "should in general not be
+	 * touched (e.g. set dirty) except by its owner".
+	 */
+	return !PageReserved(page);
+}
+
+static void kvm_set_page_dirty(struct page *page)
+{
+	if (kvm_is_ad_tracked_page(page))
+		SetPageDirty(page);
+}
+
+static void kvm_set_page_accessed(struct page *page)
+{
+	if (kvm_is_ad_tracked_page(page))
+		mark_page_accessed(page);
+}
+
 void kvm_release_page_clean(struct page *page)
 {
 	WARN_ON(is_error_page(page));
 
-	kvm_release_pfn_clean(page_to_pfn(page));
+	kvm_set_page_accessed(page);
+	put_page(page);
 }
 EXPORT_SYMBOL_GPL(kvm_release_page_clean);
 
 void kvm_release_pfn_clean(kvm_pfn_t pfn)
 {
 	if (!is_error_noslot_pfn(pfn) && !kvm_is_reserved_pfn(pfn))
-		put_page(pfn_to_page(pfn));
+		kvm_release_page_clean(pfn_to_page(pfn));
 }
 EXPORT_SYMBOL_GPL(kvm_release_pfn_clean);
 
@@ -2817,40 +2839,34 @@ void kvm_release_page_dirty(struct page *page)
 {
 	WARN_ON(is_error_page(page));
 
-	kvm_release_pfn_dirty(page_to_pfn(page));
+	kvm_set_page_dirty(page);
+	kvm_release_page_clean(page);
 }
 EXPORT_SYMBOL_GPL(kvm_release_page_dirty);
 
 void kvm_release_pfn_dirty(kvm_pfn_t pfn)
 {
-	kvm_set_pfn_dirty(pfn);
-	kvm_release_pfn_clean(pfn);
+	if (!is_error_noslot_pfn(pfn) && !kvm_is_reserved_pfn(pfn))
+		kvm_release_page_dirty(pfn_to_page(pfn));
 }
 EXPORT_SYMBOL_GPL(kvm_release_pfn_dirty);
 
-static bool kvm_is_ad_tracked_pfn(kvm_pfn_t pfn)
-{
-	if (!pfn_valid(pfn))
-		return false;
-
-	/*
-	 * Per page-flags.h, pages tagged PG_reserved "should in general not be
-	 * touched (e.g. set dirty) except by its owner".
-	 */
-	return !PageReserved(pfn_to_page(pfn));
-}
-
+/*
+ * Note, checking for an error/noslot pfn is the caller's responsibility when
+ * directly marking a page dirty/accessed.  Unlike the "release" helpers, the
+ * "set" helpers are not to be unused when the pfn might point at garbage.
+ */
 void kvm_set_pfn_dirty(kvm_pfn_t pfn)
 {
-	if (kvm_is_ad_tracked_pfn(pfn))
-		SetPageDirty(pfn_to_page(pfn));
+	if (pfn_valid(pfn))
+		kvm_set_page_dirty(pfn_to_page(pfn));
 }
 EXPORT_SYMBOL_GPL(kvm_set_pfn_dirty);
 
 void kvm_set_pfn_accessed(kvm_pfn_t pfn)
 {
-	if (kvm_is_ad_tracked_pfn(pfn))
-		mark_page_accessed(pfn_to_page(pfn));
+	if (pfn_valid(pfn))
+		kvm_set_page_accessed(pfn_to_page(pfn));
 }
 EXPORT_SYMBOL_GPL(kvm_set_pfn_accessed);
 
-- 
2.36.0.464.gb9c8b46e94-goog


* [PATCH 05/10] KVM: nVMX: Use kvm_vcpu_map() to get/pin vmcs12's APIC-access page
From: Sean Christopherson @ 2022-04-29  1:04 UTC
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel

Use kvm_vcpu_map() to get/pin the backing for vmcs12's APIC-access page,
there's no reason it has to be restricted to 'struct page' backing.  The
APIC-access page actually doesn't need to be backed by anything, which is
ironically why it got left behind by the series which introduced
kvm_vcpu_map()[1]; the plan was to shove a dummy pfn into vmcs02[2], but
that code never got merged.

Switching the APIC-access page to kvm_vcpu_map() doesn't preclude using a
magic pfn in the future, and will allow a future patch to drop
kvm_vcpu_gpa_to_page().

[1] https://lore.kernel.org/all/1547026933-31226-1-git-send-email-karahmed@amazon.de
[2] https://lore.kernel.org/lkml/1543845551-4403-1-git-send-email-karahmed@amazon.de
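
For context, the kvm_vcpu_map() API being switched to is used along these
lines (a schematic sketch; the use_mapping() consumer is made up purely
for illustration):

	struct kvm_host_map map;

	/* Map and pin whatever backs the guest physical address. */
	if (kvm_vcpu_map(vcpu, gpa_to_gfn(gpa), &map))
		return -EFAULT;			/* no usable backing */

	use_mapping(map.hva, map.pfn);		/* kernel VA + host pfn */

	/* Unpin; the last param says whether to mark the page dirty. */
	kvm_vcpu_unmap(vcpu, &map, false);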

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/nested.c | 39 ++++++++++++---------------------------
 arch/x86/kvm/vmx/vmx.h    |  2 +-
 2 files changed, 13 insertions(+), 28 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index a6688663da4d..cc1c7836f172 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -311,11 +311,12 @@ static void free_nested(struct kvm_vcpu *vcpu)
 	vmx->nested.cached_vmcs12 = NULL;
 	kfree(vmx->nested.cached_shadow_vmcs12);
 	vmx->nested.cached_shadow_vmcs12 = NULL;
-	/* Unpin physical memory we referred to in the vmcs02 */
-	if (vmx->nested.apic_access_page) {
-		kvm_release_page_clean(vmx->nested.apic_access_page);
-		vmx->nested.apic_access_page = NULL;
-	}
+	/*
+	 * Unpin physical memory we referred to in the vmcs02.  The APIC access
+	 * page's backing page (yeah, confusing) shouldn't actually be accessed,
+	 * and if it is written, the contents are irrelevant.
+	 */
+	kvm_vcpu_unmap(vcpu, &vmx->nested.apic_access_page_map, false);
 	kvm_vcpu_unmap(vcpu, &vmx->nested.virtual_apic_map, true);
 	kvm_vcpu_unmap(vcpu, &vmx->nested.pi_desc_map, true);
 	vmx->nested.pi_desc = NULL;
@@ -3159,8 +3160,6 @@ static bool nested_get_vmcs12_pages(struct kvm_vcpu *vcpu)
 	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	struct kvm_host_map *map;
-	struct page *page;
-	u64 hpa;
 
 	if (!vcpu->arch.pdptrs_from_userspace &&
 	    !nested_cpu_has_ept(vmcs12) && is_pae_paging(vcpu)) {
@@ -3175,23 +3174,12 @@ static bool nested_get_vmcs12_pages(struct kvm_vcpu *vcpu)
 
 
 	if (nested_cpu_has2(vmcs12, SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES)) {
-		/*
-		 * Translate L1 physical address to host physical
-		 * address for vmcs02. Keep the page pinned, so this
-		 * physical address remains valid. We keep a reference
-		 * to it so we can release it later.
-		 */
-		if (vmx->nested.apic_access_page) { /* shouldn't happen */
-			kvm_release_page_clean(vmx->nested.apic_access_page);
-			vmx->nested.apic_access_page = NULL;
-		}
-		page = kvm_vcpu_gpa_to_page(vcpu, vmcs12->apic_access_addr);
-		if (!is_error_page(page)) {
-			vmx->nested.apic_access_page = page;
-			hpa = page_to_phys(vmx->nested.apic_access_page);
-			vmcs_write64(APIC_ACCESS_ADDR, hpa);
+		map = &vmx->nested.apic_access_page_map;
+
+		if (!kvm_vcpu_map(vcpu, gpa_to_gfn(vmcs12->apic_access_addr), map)) {
+			vmcs_write64(APIC_ACCESS_ADDR, pfn_to_hpa(map->pfn));
 		} else {
-			pr_debug_ratelimited("%s: no backing 'struct page' for APIC-access address in vmcs12\n",
+			pr_debug_ratelimited("%s: no backing for APIC-access address in vmcs12\n",
 					     __func__);
 			vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
 			vcpu->run->internal.suberror =
@@ -4627,10 +4615,7 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason,
 	}
 
 	/* Unpin physical memory we referred to in vmcs02 */
-	if (vmx->nested.apic_access_page) {
-		kvm_release_page_clean(vmx->nested.apic_access_page);
-		vmx->nested.apic_access_page = NULL;
-	}
+	kvm_vcpu_unmap(vcpu, &vmx->nested.apic_access_page_map, false);
 	kvm_vcpu_unmap(vcpu, &vmx->nested.virtual_apic_map, true);
 	kvm_vcpu_unmap(vcpu, &vmx->nested.pi_desc_map, true);
 	vmx->nested.pi_desc = NULL;
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 9c6bfcd84008..2498774f36b2 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -203,7 +203,7 @@ struct nested_vmx {
 	 * Guest pages referred to in the vmcs02 with host-physical
 	 * pointers, so we must keep them pinned while L2 runs.
 	 */
-	struct page *apic_access_page;
+	struct kvm_host_map apic_access_page_map;
 	struct kvm_host_map virtual_apic_map;
 	struct kvm_host_map pi_desc_map;
 
-- 
2.36.0.464.gb9c8b46e94-goog


* [PATCH 06/10] KVM: Don't WARN if kvm_pfn_to_page() encounters a "reserved" pfn
From: Sean Christopherson @ 2022-04-29  1:04 UTC
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel

Drop a WARN_ON() if kvm_pfn_to_page() encounters a "reserved" pfn, which
in this context means a struct page that has PG_reserved but is not a/the
ZERO_PAGE and is not a ZONE_DEVICE page.  The usage, via gfn_to_page(),
in x86 is safe as gfn_to_page() is used only to retrieve a page from a
KVM-controlled memslot, but the usage in PPC and s390 operates on
arbitrary gfns and thus memslots that can be backed by incompatible
memory.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 virt/kvm/kvm_main.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index ab7549195c68..a987188a426f 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2703,10 +2703,8 @@ static struct page *kvm_pfn_to_page(kvm_pfn_t pfn)
 	if (is_error_noslot_pfn(pfn))
 		return KVM_ERR_PTR_BAD_PAGE;
 
-	if (kvm_is_reserved_pfn(pfn)) {
-		WARN_ON(1);
+	if (kvm_is_reserved_pfn(pfn))
 		return KVM_ERR_PTR_BAD_PAGE;
-	}
 
 	return pfn_to_page(pfn);
 }
-- 
2.36.0.464.gb9c8b46e94-goog


* [PATCH 07/10] KVM: Remove kvm_vcpu_gfn_to_page() and kvm_vcpu_gpa_to_page()
From: Sean Christopherson @ 2022-04-29  1:04 UTC
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel

Drop helpers to convert a gfn/gpa to a 'struct page' in the context of a
vCPU.  KVM doesn't require that guests be backed by 'struct page' memory,
thus any use of helpers that assume 'struct page' is bound to be flawed,
as was the case for the recently removed last user in x86's nested VMX.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 include/linux/kvm_host.h |  7 -------
 virt/kvm/kvm_main.c      | 35 +++++++++++++----------------------
 2 files changed, 13 insertions(+), 29 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index e32fbde79298..7e59bc5ec8c7 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1207,7 +1207,6 @@ struct kvm_memory_slot *kvm_vcpu_gfn_to_memslot(struct kvm_vcpu *vcpu, gfn_t gfn
 kvm_pfn_t kvm_vcpu_gfn_to_pfn_atomic(struct kvm_vcpu *vcpu, gfn_t gfn);
 kvm_pfn_t kvm_vcpu_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn);
 int kvm_vcpu_map(struct kvm_vcpu *vcpu, gpa_t gpa, struct kvm_host_map *map);
-struct page *kvm_vcpu_gfn_to_page(struct kvm_vcpu *vcpu, gfn_t gfn);
 void kvm_vcpu_unmap(struct kvm_vcpu *vcpu, struct kvm_host_map *map, bool dirty);
 unsigned long kvm_vcpu_gfn_to_hva(struct kvm_vcpu *vcpu, gfn_t gfn);
 unsigned long kvm_vcpu_gfn_to_hva_prot(struct kvm_vcpu *vcpu, gfn_t gfn, bool *writable);
@@ -1695,12 +1694,6 @@ static inline hpa_t pfn_to_hpa(kvm_pfn_t pfn)
 	return (hpa_t)pfn << PAGE_SHIFT;
 }
 
-static inline struct page *kvm_vcpu_gpa_to_page(struct kvm_vcpu *vcpu,
-						gpa_t gpa)
-{
-	return kvm_vcpu_gfn_to_page(vcpu, gpa_to_gfn(gpa));
-}
-
 static inline bool kvm_is_error_gpa(struct kvm *kvm, gpa_t gpa)
 {
 	unsigned long hva = gfn_to_hva(kvm, gpa_to_gfn(gpa));
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index a987188a426f..661390243b9e 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2698,24 +2698,25 @@ int gfn_to_page_many_atomic(struct kvm_memory_slot *slot, gfn_t gfn,
 }
 EXPORT_SYMBOL_GPL(gfn_to_page_many_atomic);
 
-static struct page *kvm_pfn_to_page(kvm_pfn_t pfn)
-{
-	if (is_error_noslot_pfn(pfn))
-		return KVM_ERR_PTR_BAD_PAGE;
-
-	if (kvm_is_reserved_pfn(pfn))
-		return KVM_ERR_PTR_BAD_PAGE;
-
-	return pfn_to_page(pfn);
-}
-
+/*
+ * Do not use this helper unless you are absolutely certain the gfn _must_ be
+ * backed by 'struct page'.  A valid example is if the backing memslot is
+ * controlled by KVM.  Note, if the returned page is valid, its refcount has
+ * been elevated by gfn_to_pfn().
+ */
 struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn)
 {
 	kvm_pfn_t pfn;
 
 	pfn = gfn_to_pfn(kvm, gfn);
 
-	return kvm_pfn_to_page(pfn);
+	if (is_error_noslot_pfn(pfn))
+		return KVM_ERR_PTR_BAD_PAGE;
+
+	if (kvm_is_reserved_pfn(pfn))
+		return KVM_ERR_PTR_BAD_PAGE;
+
+	return pfn_to_page(pfn);
 }
 EXPORT_SYMBOL_GPL(gfn_to_page);
 
@@ -2786,16 +2787,6 @@ void kvm_vcpu_unmap(struct kvm_vcpu *vcpu, struct kvm_host_map *map, bool dirty)
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_unmap);
 
-struct page *kvm_vcpu_gfn_to_page(struct kvm_vcpu *vcpu, gfn_t gfn)
-{
-	kvm_pfn_t pfn;
-
-	pfn = kvm_vcpu_gfn_to_pfn(vcpu, gfn);
-
-	return kvm_pfn_to_page(pfn);
-}
-EXPORT_SYMBOL_GPL(kvm_vcpu_gfn_to_page);
-
 static bool kvm_is_ad_tracked_page(struct page *page)
 {
 	/*
-- 
2.36.0.464.gb9c8b46e94-goog


* [PATCH 08/10] KVM: Take a 'struct page', not a pfn in kvm_is_zone_device_page()
From: Sean Christopherson @ 2022-04-29  1:04 UTC
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel

Operate on a 'struct page' instead of a pfn when checking if a page is a
ZONE_DEVICE page, and rename the helper accordingly.  Generally speaking,
KVM doesn't actually care about ZONE_DEVICE memory, i.e. shouldn't do
anything special for ZONE_DEVICE memory.  Rather, KVM wants to treat
ZONE_DEVICE memory like regular memory, and the need to identify
ZONE_DEVICE memory only arises as an exception to PG_reserved pages. In
other words, KVM should only ever check for ZONE_DEVICE memory after KVM
has already verified that there is a struct page associated with the pfn.
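
Concretely, the intended calling pattern after this change is (as the
kvm_is_reserved_pfn() hunk below also shows):

	/* Resolve the pfn to a page first, then ask about ZONE_DEVICE. */
	if (pfn_valid(pfn) && kvm_is_zone_device_page(pfn_to_page(pfn)))
		/* treat as "normal" refcounted memory */;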

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c   | 3 ++-
 include/linux/kvm_host.h | 2 +-
 virt/kvm/kvm_main.c      | 8 ++++----
 3 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 904f0faff218..5cf1436adecd 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2821,11 +2821,12 @@ static void direct_pte_prefetch(struct kvm_vcpu *vcpu, u64 *sptep)
 static int host_pfn_mapping_level(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn,
 				  const struct kvm_memory_slot *slot)
 {
+	struct page *page = pfn_to_page(pfn);
 	unsigned long hva;
 	pte_t *pte;
 	int level;
 
-	if (!PageCompound(pfn_to_page(pfn)) && !kvm_is_zone_device_pfn(pfn))
+	if (!PageCompound(page) && !kvm_is_zone_device_page(page))
 		return PG_LEVEL_4K;
 
 	/*
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 7e59bc5ec8c7..4ccc309a43f2 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1548,7 +1548,7 @@ void kvm_arch_sync_events(struct kvm *kvm);
 int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu);
 
 bool kvm_is_reserved_pfn(kvm_pfn_t pfn);
-bool kvm_is_zone_device_pfn(kvm_pfn_t pfn);
+bool kvm_is_zone_device_page(struct page *page);
 
 struct kvm_irq_ack_notifier {
 	struct hlist_node link;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 661390243b9e..cbc6d58081d4 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -164,7 +164,7 @@ __weak void kvm_arch_mmu_notifier_invalidate_range(struct kvm *kvm,
 {
 }
 
-bool kvm_is_zone_device_pfn(kvm_pfn_t pfn)
+bool kvm_is_zone_device_page(struct page *page)
 {
 	/*
 	 * The metadata used by is_zone_device_page() to determine whether or
@@ -172,10 +172,10 @@ bool kvm_is_zone_device_pfn(kvm_pfn_t pfn)
 	 * the device has been pinned, e.g. by get_user_pages().  WARN if the
 	 * page_count() is zero to help detect bad usage of this helper.
 	 */
-	if (!pfn_valid(pfn) || WARN_ON_ONCE(!page_count(pfn_to_page(pfn))))
+	if (WARN_ON_ONCE(!page_count(page)))
 		return false;
 
-	return is_zone_device_page(pfn_to_page(pfn));
+	return is_zone_device_page(page);
 }
 
 bool kvm_is_reserved_pfn(kvm_pfn_t pfn)
@@ -188,7 +188,7 @@ bool kvm_is_reserved_pfn(kvm_pfn_t pfn)
 	if (pfn_valid(pfn))
 		return PageReserved(pfn_to_page(pfn)) &&
 		       !is_zero_pfn(pfn) &&
-		       !kvm_is_zone_device_pfn(pfn);
+		       !kvm_is_zone_device_page(pfn_to_page(pfn));
 
 	return true;
 }
-- 
2.36.0.464.gb9c8b46e94-goog


* [PATCH 09/10] KVM: Rename/refactor kvm_is_reserved_pfn() to kvm_pfn_to_refcounted_page()
From: Sean Christopherson @ 2022-04-29  1:04 UTC
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel

Rename and refactor kvm_is_reserved_pfn() to kvm_pfn_to_refcounted_page()
to better reflect what KVM is actually checking, and to eliminate extra
pfn_to_page() lookups.  The kvm_release_pfn_*() and kvm_try_get_pfn()
helpers in particular benefit from "refcounted" nomenclature, as it's not
all that obvious why KVM needs to get/put refcounts for some PG_reserved
pages (ZERO_PAGE and ZONE_DEVICE).

Add a comment to call out that the list of exceptions to PG_reserved is
all but guaranteed to be incomplete.  The list has mostly been compiled
by people throwing noodles at KVM and finding out they stick a little too
well, e.g. the ZERO_PAGE's refcount overflowed and ZONE_DEVICE pages
didn't get freed.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c     | 15 +++++----
 arch/x86/kvm/mmu/tdp_mmu.c |  2 +-
 include/linux/kvm_host.h   |  2 +-
 virt/kvm/kvm_main.c        | 66 ++++++++++++++++++++++++++++++--------
 4 files changed, 63 insertions(+), 22 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 5cf1436adecd..7da6741d6ea7 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -587,6 +587,7 @@ static int mmu_spte_clear_track_bits(struct kvm *kvm, u64 *sptep)
 	kvm_pfn_t pfn;
 	u64 old_spte = *sptep;
 	int level = sptep_to_sp(sptep)->role.level;
+	struct page *page;
 
 	if (!spte_has_volatile_bits(old_spte))
 		__update_clear_spte_fast(sptep, 0ull);
@@ -601,11 +602,13 @@ static int mmu_spte_clear_track_bits(struct kvm *kvm, u64 *sptep)
 	pfn = spte_to_pfn(old_spte);
 
 	/*
-	 * KVM does not hold the refcount of the page used by
-	 * kvm mmu, before reclaiming the page, we should
-	 * unmap it from mmu first.
+	 * KVM doesn't hold a reference to any pages mapped into the guest, and
+	 * instead uses the mmu_notifier to ensure that KVM unmaps any pages
+	 * before they are reclaimed.  Sanity check that, if the pfn is backed
+	 * by a refcounted page, the refcount is elevated.
 	 */
-	WARN_ON(!kvm_is_reserved_pfn(pfn) && !page_count(pfn_to_page(pfn)));
+	page = kvm_pfn_to_refcounted_page(pfn);
+	WARN_ON(page && !page_count(page));
 
 	if (is_accessed_spte(old_spte))
 		kvm_set_pfn_accessed(pfn);
@@ -2877,7 +2880,7 @@ void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	if (unlikely(fault->max_level == PG_LEVEL_4K))
 		return;
 
-	if (is_error_noslot_pfn(fault->pfn) || kvm_is_reserved_pfn(fault->pfn))
+	if (is_error_noslot_pfn(fault->pfn) || !kvm_pfn_to_refcounted_page(fault->pfn))
 		return;
 
 	if (kvm_slot_dirty_track_enabled(slot))
@@ -5947,7 +5950,7 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,
 		 * the guest, and the guest page table is using 4K page size
 		 * mapping if the indirect sp has level = 1.
 		 */
-		if (sp->role.direct && !kvm_is_reserved_pfn(pfn) &&
+		if (sp->role.direct && kvm_pfn_to_refcounted_page(pfn) &&
 		    sp->role.level < kvm_mmu_max_mapping_level(kvm, slot, sp->gfn,
 							       pfn, PG_LEVEL_NUM)) {
 			pte_list_remove(kvm, rmap_head, sptep);
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 566548a3efa7..de2cc963dbec 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1737,7 +1737,7 @@ static void zap_collapsible_spte_range(struct kvm *kvm,
 			continue;
 
 		pfn = spte_to_pfn(iter.old_spte);
-		if (kvm_is_reserved_pfn(pfn) ||
+		if (!kvm_pfn_to_refcounted_page(pfn) ||
 		    iter.level >= kvm_mmu_max_mapping_level(kvm, slot, iter.gfn,
 							    pfn, PG_LEVEL_NUM))
 			continue;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 4ccc309a43f2..9d5818b782f9 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1547,7 +1547,7 @@ void kvm_arch_sync_events(struct kvm *kvm);
 
 int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu);
 
-bool kvm_is_reserved_pfn(kvm_pfn_t pfn);
+struct page *kvm_pfn_to_refcounted_page(kvm_pfn_t pfn);
 bool kvm_is_zone_device_page(struct page *page);
 
 struct kvm_irq_ack_notifier {
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index cbc6d58081d4..656c47037eea 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -178,19 +178,36 @@ bool kvm_is_zone_device_page(struct page *page)
 	return is_zone_device_page(page);
 }
 
-bool kvm_is_reserved_pfn(kvm_pfn_t pfn)
+/*
+ * Returns a 'struct page' if the pfn is "valid" and backed by a refcounted
+ * page, NULL otherwise.  Note, the list of refcounted PG_reserved page types
+ * is likely incomplete; it has been compiled purely through people wanting to
+ * back guests with a certain type of memory and encountering issues.
+ */
+struct page *kvm_pfn_to_refcounted_page(kvm_pfn_t pfn)
 {
+	struct page *page;
+
+	if (!pfn_valid(pfn))
+		return NULL;
+
+	page = pfn_to_page(pfn);
+	if (!PageReserved(page))
+		return page;
+
+	/* The ZERO_PAGE(s) is marked PG_reserved, but is refcounted. */
+	if (is_zero_pfn(pfn))
+		return page;
+
 	/*
 	 * ZONE_DEVICE pages currently set PG_reserved, but from a refcounting
 	 * perspective they are "normal" pages, albeit with slightly different
 	 * usage rules.
 	 */
-	if (pfn_valid(pfn))
-		return PageReserved(pfn_to_page(pfn)) &&
-		       !is_zero_pfn(pfn) &&
-		       !kvm_is_zone_device_page(pfn_to_page(pfn));
+	if (kvm_is_zone_device_page(page))
+		return page;
 
-	return true;
+	return NULL;
 }
 
 /*
@@ -2479,9 +2496,12 @@ static bool vma_is_valid(struct vm_area_struct *vma, bool write_fault)
 
 static int kvm_try_get_pfn(kvm_pfn_t pfn)
 {
-	if (kvm_is_reserved_pfn(pfn))
+	struct page *page = kvm_pfn_to_refcounted_page(pfn);
+
+	if (!page)
 		return 1;
-	return get_page_unless_zero(pfn_to_page(pfn));
+
+	return get_page_unless_zero(page);
 }
 
 static int hva_to_pfn_remapped(struct vm_area_struct *vma,
@@ -2706,6 +2726,7 @@ EXPORT_SYMBOL_GPL(gfn_to_page_many_atomic);
  */
 struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn)
 {
+	struct page *page;
 	kvm_pfn_t pfn;
 
 	pfn = gfn_to_pfn(kvm, gfn);
@@ -2713,10 +2734,11 @@ struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn)
 	if (is_error_noslot_pfn(pfn))
 		return KVM_ERR_PTR_BAD_PAGE;
 
-	if (kvm_is_reserved_pfn(pfn))
+	page = kvm_pfn_to_refcounted_page(pfn);
+	if (!page)
 		return KVM_ERR_PTR_BAD_PAGE;
 
-	return pfn_to_page(pfn);
+	return page;
 }
 EXPORT_SYMBOL_GPL(gfn_to_page);
 
@@ -2819,8 +2841,16 @@ EXPORT_SYMBOL_GPL(kvm_release_page_clean);
 
 void kvm_release_pfn_clean(kvm_pfn_t pfn)
 {
-	if (!is_error_noslot_pfn(pfn) && !kvm_is_reserved_pfn(pfn))
-		kvm_release_page_clean(pfn_to_page(pfn));
+	struct page *page;
+
+	if (is_error_noslot_pfn(pfn))
+		return;
+
+	page = kvm_pfn_to_refcounted_page(pfn);
+	if (!page)
+		return;
+
+	kvm_release_page_clean(page);
 }
 EXPORT_SYMBOL_GPL(kvm_release_pfn_clean);
 
@@ -2835,8 +2865,16 @@ EXPORT_SYMBOL_GPL(kvm_release_page_dirty);
 
 void kvm_release_pfn_dirty(kvm_pfn_t pfn)
 {
-	if (!is_error_noslot_pfn(pfn) && !kvm_is_reserved_pfn(pfn))
-		kvm_release_page_dirty(pfn_to_page(pfn));
+	struct page *page;
+
+	if (is_error_noslot_pfn(pfn))
+		return;
+
+	page = kvm_pfn_to_refcounted_page(pfn);
+	if (!page)
+		return;
+
+	kvm_release_page_dirty(page);
 }
 EXPORT_SYMBOL_GPL(kvm_release_pfn_dirty);
 
-- 
2.36.0.464.gb9c8b46e94-goog


* [PATCH 10/10] KVM: x86/mmu: Shove refcounted page dependency into host_pfn_mapping_level()
From: Sean Christopherson @ 2022-04-29  1:04 UTC
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel

Move the check that restricts mapping huge pages into the guest to pfns
that are backed by refcounted 'struct page' memory into the helper that
actually "requires" a 'struct page', host_pfn_mapping_level().  In
addition to deduplicating code, moving the check to the helper eliminates
the subtle requirement that the caller check that the incoming pfn is
backed by a refcounted struct page, and as an added bonus avoids an extra
pfn_to_page() lookup.

Note, the is_error_noslot_pfn() check in kvm_mmu_hugepage_adjust() needs
to stay where it is, as it guards against dereferencing a NULL memslot in
the kvm_slot_dirty_track_enabled() call that follows.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c     | 14 +++++++++++---
 arch/x86/kvm/mmu/tdp_mmu.c |  3 +--
 2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 7da6741d6ea7..20c8f3cb6b4d 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2824,11 +2824,19 @@ static void direct_pte_prefetch(struct kvm_vcpu *vcpu, u64 *sptep)
 static int host_pfn_mapping_level(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn,
 				  const struct kvm_memory_slot *slot)
 {
-	struct page *page = pfn_to_page(pfn);
 	unsigned long hva;
+	struct page *page;
 	pte_t *pte;
 	int level;
 
+	/*
+	 * Note, @slot must be non-NULL, i.e. the caller is responsible for
+	 * ensuring @pfn isn't garbage and is backed by a memslot.
+	 */
+	page = kvm_pfn_to_refcounted_page(pfn);
+	if (!page)
+		return PG_LEVEL_4K;
+
 	if (!PageCompound(page) && !kvm_is_zone_device_page(page))
 		return PG_LEVEL_4K;
 
@@ -2880,7 +2888,7 @@ void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	if (unlikely(fault->max_level == PG_LEVEL_4K))
 		return;
 
-	if (is_error_noslot_pfn(fault->pfn) || !kvm_pfn_to_refcounted_page(fault->pfn))
+	if (is_error_noslot_pfn(fault->pfn))
 		return;
 
 	if (kvm_slot_dirty_track_enabled(slot))
@@ -5950,7 +5958,7 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,
 		 * the guest, and the guest page table is using 4K page size
 		 * mapping if the indirect sp has level = 1.
 		 */
-		if (sp->role.direct && kvm_pfn_to_refcounted_page(pfn) &&
+		if (sp->role.direct &&
 		    sp->role.level < kvm_mmu_max_mapping_level(kvm, slot, sp->gfn,
 							       pfn, PG_LEVEL_NUM)) {
 			pte_list_remove(kvm, rmap_head, sptep);
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index de2cc963dbec..25efaf7da91f 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1737,8 +1737,7 @@ static void zap_collapsible_spte_range(struct kvm *kvm,
 			continue;
 
 		pfn = spte_to_pfn(iter.old_spte);
-		if (!kvm_pfn_to_refcounted_page(pfn) ||
-		    iter.level >= kvm_mmu_max_mapping_level(kvm, slot, iter.gfn,
+		if (iter.level >= kvm_mmu_max_mapping_level(kvm, slot, iter.gfn,
 							    pfn, PG_LEVEL_NUM))
 			continue;
 
-- 
2.36.0.464.gb9c8b46e94-goog


* Re: [PATCH 01/10] KVM: Do not zero initialize 'pfn' in hva_to_pfn()
From: Paolo Bonzini @ 2022-06-16 14:54 UTC
  To: Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel

On 4/29/22 03:04, Sean Christopherson wrote:
> Drop the unnecessary initialization of the local 'pfn' variable in
> hva_to_pfn().  First and foremost, '0' is not an invalid pfn; it's a
> perfectly valid pfn on most architectures.  I.e. if hva_to_pfn() were to
> return an "uninitialized" pfn, it would actually be interpreted as a legal
> pfn by most callers.
> 
> Second, hva_to_pfn() can't return an uninitialized pfn as hva_to_pfn()
> explicitly sets pfn to an error value (or returns an error value directly)
> if a helper returns failure, and all helpers set the pfn on success.
> 
> Note, the zeroing of 'pfn' was introduced by commit 2fc843117d64 ("KVM:
> reorganize hva_to_pfn"), and was unnecessary and misguided paranoia even
> then.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>   virt/kvm/kvm_main.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 0848430f36c6..04ed4334473c 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -2567,7 +2567,7 @@ kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async,
>   		     bool write_fault, bool *writable)
>   {
>   	struct vm_area_struct *vma;
> -	kvm_pfn_t pfn = 0;
> +	kvm_pfn_t pfn;
>   	int npages, r;
>   
>   	/* we can do it either atomically or asynchronously, not both */

I wonder if it was needed to avoid uninitialized variable warnings on 
"return pfn;"...

Paolo


* Re: [PATCH 04/10] KVM: Avoid pfn_to_page() and vice versa when releasing pages
From: Paolo Bonzini @ 2022-06-16 15:00 UTC
  To: Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel

On 4/29/22 03:04, Sean Christopherson wrote:
> -
> +/*
> + * Note, checking for an error/noslot pfn is the caller's responsibility when
> + * directly marking a page dirty/accessed.  Unlike the "release" helpers, the
> + * "set" helpers are not to be unused when the pfn might point at garbage.
> + */

s/unused/unused/

But while at it, I'd rather add a WARN_ON(is_error_noslot_pfn(pfn)).

Paolo

>   void kvm_set_pfn_dirty(kvm_pfn_t pfn)
>   {
> -	if (kvm_is_ad_tracked_pfn(pfn))
> -		SetPageDirty(pfn_to_page(pfn));
> +	if (pfn_valid(pfn))
> +		kvm_set_page_dirty(pfn_to_page(pfn));
>   }
>   EXPORT_SYMBOL_GPL(kvm_set_pfn_dirty);
>   
>   void kvm_set_pfn_accessed(kvm_pfn_t pfn)
>   {
> -	if (kvm_is_ad_tracked_pfn(pfn))
> -		mark_page_accessed(pfn_to_page(pfn));
> +	if (pfn_valid(pfn))
> +		kvm_set_page_accessed(pfn_to_page(pfn));
>   }
>   EXPORT_SYMBOL_GPL(kvm_set_pfn_accessed);


* Re: [PATCH 00/10] KVM: Clean up 'struct page' / pfn helpers
From: Paolo Bonzini @ 2022-06-16 15:21 UTC
  To: Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel

On 4/29/22 03:04, Sean Christopherson wrote:
> Clean up KVM's struct page / pfn helpers to reduce the number of
> pfn_to_page() and page_to_pfn() conversions.  E.g. kvm_release_pfn_dirty()
> makes 6 (if I counted right) calls to pfn_to_page() when releasing a dirty
> pfn that is backed by a vanilla struct page.  That is easily trimmed down to
> a single call.
> 
> And perhaps more importantly, rename and refactor kvm_is_reserved_pfn() to
> try and better reflect what it actually queries, which at this point is
> effectively whether or not the pfn is backed by a refcounted page.
> 
> Sean Christopherson (10):
>    KVM: Do not zero initialize 'pfn' in hva_to_pfn()
>    KVM: Drop bogus "pfn != 0" guard from kvm_release_pfn()
>    KVM: Don't set Accessed/Dirty bits for ZERO_PAGE
>    KVM: Avoid pfn_to_page() and vice versa when releasing pages
>    KVM: nVMX: Use kvm_vcpu_map() to get/pin vmcs12's APIC-access page
>    KVM: Don't WARN if kvm_pfn_to_page() encounters a "reserved" pfn
>    KVM: Remove kvm_vcpu_gfn_to_page() and kvm_vcpu_gpa_to_page()
>    KVM: Take a 'struct page', not a pfn in kvm_is_zone_device_page()
>    KVM: Rename/refactor kvm_is_reserved_pfn() to
>      kvm_pfn_to_refcounted_page()
>    KVM: x86/mmu: Shove refcounted page dependency into
>      host_pfn_mapping_level()
> 
>   arch/x86/kvm/mmu/mmu.c     |  26 +++++--
>   arch/x86/kvm/mmu/tdp_mmu.c |   3 +-
>   arch/x86/kvm/vmx/nested.c  |  39 ++++-------
>   arch/x86/kvm/vmx/vmx.h     |   2 +-
>   include/linux/kvm_host.h   |  12 +---
>   virt/kvm/kvm_main.c        | 140 +++++++++++++++++++++++++------------
>   6 files changed, 131 insertions(+), 91 deletions(-)
> 
> 
> base-commit: 2a39d8b39bffdaf1a4223d0d22f07baee154c8f3

Queued, thanks.

Paolo


* Re: [PATCH 01/10] KVM: Do not zero initialize 'pfn' in hva_to_pfn()
From: Sean Christopherson @ 2022-06-16 16:51 UTC
  To: Paolo Bonzini
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel

On Thu, Jun 16, 2022, Paolo Bonzini wrote:
> On 4/29/22 03:04, Sean Christopherson wrote:
> > Drop the unnecessary initialization of the local 'pfn' variable in
> > hva_to_pfn().  First and foremost, '0' is not an invalid pfn; it's a
> > perfectly valid pfn on most architectures.  I.e. if hva_to_pfn() were to
> > return an "uninitialized" pfn, it would actually be interpreted as a legal
> > pfn by most callers.
> > 
> > Second, hva_to_pfn() can't return an uninitialized pfn as hva_to_pfn()
> > explicitly sets pfn to an error value (or returns an error value directly)
> > if a helper returns failure, and all helpers set the pfn on success.
> > 
> > Note, the zeroing of 'pfn' was introduced by commit 2fc843117d64 ("KVM:
> > reorganize hva_to_pfn"), and was unnecessary and misguided paranoia even
> > then.
> > 
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> >   virt/kvm/kvm_main.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index 0848430f36c6..04ed4334473c 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -2567,7 +2567,7 @@ kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async,
> >   		     bool write_fault, bool *writable)
> >   {
> >   	struct vm_area_struct *vma;
> > -	kvm_pfn_t pfn = 0;
> > +	kvm_pfn_t pfn;
> >   	int npages, r;
> >   	/* we can do it either atomically or asynchronously, not both */
> 
> I wonder if it was needed to avoid uninitialized variable warnings on
> "return pfn;"...

That was my guess too, but IIRC I tried the old code with older compilers (gcc-7)
and couldn't trigger any warning.  So AFAICT, it was pure paranoia.
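
For comparison, the pattern that does trip -Wmaybe-uninitialized is one
where some path can reach the use without an assignment, e.g. this
made-up toy:

	kvm_pfn_t toy(bool ok)
	{
		kvm_pfn_t pfn;

		if (ok)
			pfn = 42;
		return pfn;	/* may be used uninitialized when !ok */
	}

hva_to_pfn() assigns 'pfn' on every path before the final return, so
there's nothing for the compiler to flag.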

* Re: [PATCH 04/10] KVM: Avoid pfn_to_page() and vice versa when releasing pages
From: Sean Christopherson @ 2022-06-16 16:54 UTC
  To: Paolo Bonzini
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel

On Thu, Jun 16, 2022, Paolo Bonzini wrote:
> On 4/29/22 03:04, Sean Christopherson wrote:
> > -
> > +/*
> > + * Note, checking for an error/noslot pfn is the caller's responsibility when
> > + * directly marking a page dirty/accessed.  Unlike the "release" helpers, the
> > + * "set" helpers are not to be unused when the pfn might point at garbage.
> > + */
> 
> s/unused/unused/

LOL, s/unused/used?  :-)

> But while at it, I'd rather add a WARN_ON(is_error_noslot_pfn(pfn)).

I've no objection to that.  IIRC, I almost added it myself, but my mental coin
flip came up wrong.
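
For the record, the suggested change would look something like this (a
sketch of the proposal, not a merged patch):

	void kvm_set_pfn_dirty(kvm_pfn_t pfn)
	{
		/* Catch callers that feed error/noslot pfns to a "set" helper. */
		WARN_ON(is_error_noslot_pfn(pfn));

		if (pfn_valid(pfn))
			kvm_set_page_dirty(pfn_to_page(pfn));
	}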
