* [PATCH 0/6] KVM: Remove uses of struct page from x86 and arm64 MMU
From: David Stevens @ 2021-06-24  3:57 UTC
  To: Marc Zyngier, Huacai Chen, Aleksandar Markovic, Paul Mackerras,
	Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: James Morse, Alexandru Elisei, Suzuki K Poulose, Will Deacon,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, linux-arm-kernel, kvmarm, linux-kernel, linux-mips,
	kvm, kvm-ppc, linuxppc-dev, intel-gvt-dev, intel-gfx, dri-devel,
	David Stevens

KVM supports mapping VM_IO and VM_PFNMAP memory into the guest by using
follow_pte in gfn_to_pfn. However, the resolved pfns may not have
associated struct pages, so they should not be passed to pfn_to_page.
This series removes such calls from the x86 and arm64 secondary MMU. To
do this, this series modifies gfn_to_pfn to return a struct page in
addition to a pfn, if the hva was resolved by gup. This allows the
caller to call put_page only when necessitated by gup.

This series provides a helper function that unwraps the new return type
of gfn_to_pfn and preserves the old behavior. As I have no hardware to
test the powerpc/mips changes, the helper is used there to keep those
changes minimally invasive. Additionally, as gfn_to_page and
gfn_to_pfn_cache are not integrated with the MMU notifiers, they cannot
easily be changed over to use only pfns.

This addresses CVE-2021-22543 on x86 and arm64.

David Stevens (6):
  KVM: x86/mmu: release audited pfns
  KVM: mmu: also return page from gfn_to_pfn
  KVM: x86/mmu: avoid struct page in MMU
  KVM: arm64/mmu: avoid struct page in MMU
  KVM: mmu: remove over-aggressive warnings
  drm/i915/gvt: use gfn_to_pfn's page instead of pfn

 arch/arm64/kvm/mmu.c                   |  42 +++++----
 arch/mips/kvm/mmu.c                    |   3 +-
 arch/powerpc/kvm/book3s.c              |   3 +-
 arch/powerpc/kvm/book3s_64_mmu_hv.c    |   5 +-
 arch/powerpc/kvm/book3s_64_mmu_radix.c |   5 +-
 arch/powerpc/kvm/book3s_hv_uvmem.c     |   4 +-
 arch/powerpc/kvm/e500_mmu_host.c       |   2 +-
 arch/x86/kvm/mmu/mmu.c                 |  60 ++++++------
 arch/x86/kvm/mmu/mmu_audit.c           |  13 ++-
 arch/x86/kvm/mmu/mmu_internal.h        |   3 +-
 arch/x86/kvm/mmu/paging_tmpl.h         |  36 +++++---
 arch/x86/kvm/mmu/tdp_mmu.c             |   7 +-
 arch/x86/kvm/mmu/tdp_mmu.h             |   4 +-
 arch/x86/kvm/x86.c                     |   9 +-
 drivers/gpu/drm/i915/gvt/gtt.c         |  12 ++-
 drivers/gpu/drm/i915/gvt/hypercall.h   |   3 +-
 drivers/gpu/drm/i915/gvt/kvmgt.c       |  12 +--
 drivers/gpu/drm/i915/gvt/mpt.h         |   8 +-
 include/linux/kvm_host.h               |  27 ++++--
 include/linux/kvm_types.h              |   5 +
 virt/kvm/kvm_main.c                    | 123 +++++++++++++------------
 21 files changed, 212 insertions(+), 174 deletions(-)

-- 
2.32.0.93.g670b81a890-goog


* [PATCH 1/6] KVM: x86/mmu: release audited pfns
From: David Stevens @ 2021-06-24  3:57 UTC
  To: Marc Zyngier, Huacai Chen, Aleksandar Markovic, Paul Mackerras,
	Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: James Morse, Alexandru Elisei, Suzuki K Poulose, Will Deacon,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, linux-arm-kernel, kvmarm, linux-kernel, linux-mips,
	kvm, kvm-ppc, linuxppc-dev, intel-gvt-dev, intel-gfx, dri-devel,
	David Stevens

From: David Stevens <stevensd@chromium.org>

Signed-off-by: David Stevens <stevensd@chromium.org>
---
 arch/x86/kvm/mmu/mmu_audit.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/kvm/mmu/mmu_audit.c b/arch/x86/kvm/mmu/mmu_audit.c
index cedc17b2f60e..97ff184084b4 100644
--- a/arch/x86/kvm/mmu/mmu_audit.c
+++ b/arch/x86/kvm/mmu/mmu_audit.c
@@ -121,6 +121,8 @@ static void audit_mappings(struct kvm_vcpu *vcpu, u64 *sptep, int level)
 		audit_printk(vcpu->kvm, "levels %d pfn %llx hpa %llx "
 			     "ent %llxn", vcpu->arch.mmu->root_level, pfn,
 			     hpa, *sptep);
+
+	kvm_release_pfn_clean(pfn);
 }
 
 static void inspect_spte_has_rmap(struct kvm *kvm, u64 *sptep)
-- 
2.32.0.93.g670b81a890-goog
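
For context: kvm_vcpu_gfn_to_pfn_atomic() takes a reference when the pfn it
resolves is backed by a refcounted struct page, and the audit path never
dropped it. The resulting flow in audit_mappings() after this patch,
paraphrased rather than quoted from mmu_audit.c:

        gfn = kvm_mmu_page_get_gfn(sp, sptep - sp->spt);
        pfn = kvm_vcpu_gfn_to_pfn_atomic(vcpu, gfn);    /* may take a page reference */
        if (is_error_pfn(pfn))
                return;

        /* ... compare pfn against the hpa recorded in the spte ... */

        kvm_release_pfn_clean(pfn);     /* added by this patch: drop that reference */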


* [PATCH 2/6] KVM: mmu: also return page from gfn_to_pfn
From: David Stevens @ 2021-06-24  3:57 UTC
  To: Marc Zyngier, Huacai Chen, Aleksandar Markovic, Paul Mackerras,
	Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: James Morse, Alexandru Elisei, Suzuki K Poulose, Will Deacon,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, linux-arm-kernel, kvmarm, linux-kernel, linux-mips,
	kvm, kvm-ppc, linuxppc-dev, intel-gvt-dev, intel-gfx, dri-devel,
	David Stevens

From: David Stevens <stevensd@chromium.org>

Return a struct kvm_pfn_page containing both a pfn and an optional
struct page from the gfn_to_pfn family of functions. This differentiates
the gup and follow_pte cases, which allows callers that only need
a pfn to avoid touching the page struct in the latter case. For callers
that need a struct page, introduce a helper function that unwraps a
struct kvm_pfn_page back into a bare pfn. This helper makes the call to
kvm_get_pfn that had previously been made in hva_to_pfn_remapped.

For now, wrap all calls to gfn_to_pfn functions in the new helper
function. Callers which don't need the page struct will be updated in
follow-up patches.

Signed-off-by: David Stevens <stevensd@chromium.org>
---
 arch/arm64/kvm/mmu.c                   |   5 +-
 arch/mips/kvm/mmu.c                    |   3 +-
 arch/powerpc/kvm/book3s.c              |   3 +-
 arch/powerpc/kvm/book3s_64_mmu_hv.c    |   5 +-
 arch/powerpc/kvm/book3s_64_mmu_radix.c |   5 +-
 arch/powerpc/kvm/book3s_hv_uvmem.c     |   4 +-
 arch/powerpc/kvm/e500_mmu_host.c       |   2 +-
 arch/x86/kvm/mmu/mmu.c                 |  11 ++-
 arch/x86/kvm/mmu/mmu_audit.c           |   2 +-
 arch/x86/kvm/x86.c                     |   2 +-
 drivers/gpu/drm/i915/gvt/kvmgt.c       |   2 +-
 include/linux/kvm_host.h               |  27 ++++--
 include/linux/kvm_types.h              |   5 +
 virt/kvm/kvm_main.c                    | 121 +++++++++++++------------
 14 files changed, 109 insertions(+), 88 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index c10207fed2f3..896b3644b36f 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -933,8 +933,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 */
 	smp_rmb();
 
-	pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL,
-				   write_fault, &writable, NULL);
+	pfn = kvm_pfn_page_unwrap(__gfn_to_pfn_memslot(memslot, gfn, false,
+						       NULL, write_fault,
+						       &writable, NULL));
 	if (pfn == KVM_PFN_ERR_HWPOISON) {
 		kvm_send_hwpoison_signal(hva, vma_shift);
 		return 0;
diff --git a/arch/mips/kvm/mmu.c b/arch/mips/kvm/mmu.c
index 6d1f68cf4edf..f4e5e48bc6bf 100644
--- a/arch/mips/kvm/mmu.c
+++ b/arch/mips/kvm/mmu.c
@@ -630,7 +630,8 @@ static int kvm_mips_map_page(struct kvm_vcpu *vcpu, unsigned long gpa,
 	smp_rmb();
 
 	/* Slow path - ask KVM core whether we can access this GPA */
-	pfn = gfn_to_pfn_prot(kvm, gfn, write_fault, &writeable);
+	pfn = kvm_pfn_page_unwrap(gfn_to_pfn_prot(kvm, gfn,
+						  write_fault, &writeable));
 	if (is_error_noslot_pfn(pfn)) {
 		err = -EFAULT;
 		goto out;
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 2b691f4d1f26..2dff01d0632a 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -417,7 +417,8 @@ kvm_pfn_t kvmppc_gpa_to_pfn(struct kvm_vcpu *vcpu, gpa_t gpa, bool writing,
 		return pfn;
 	}
 
-	return gfn_to_pfn_prot(vcpu->kvm, gfn, writing, writable);
+	return kvm_pfn_page_unwrap(gfn_to_pfn_prot(vcpu->kvm, gfn,
+						   writing, writable));
 }
 EXPORT_SYMBOL_GPL(kvmppc_gpa_to_pfn);
 
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 2d9193cd73be..ba094b9f87a9 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -590,8 +590,9 @@ int kvmppc_book3s_hv_page_fault(struct kvm_vcpu *vcpu,
 		write_ok = true;
 	} else {
 		/* Call KVM generic code to do the slow-path check */
-		pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL,
-					   writing, &write_ok, NULL);
+		pfn = kvm_pfn_page_unwrap(
+				__gfn_to_pfn_memslot(memslot, gfn, false, NULL,
+						     writing, &write_ok, NULL));
 		if (is_error_noslot_pfn(pfn))
 			return -EFAULT;
 		page = NULL;
diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index d909c069363e..e7892f148222 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -821,8 +821,9 @@ int kvmppc_book3s_instantiate_page(struct kvm_vcpu *vcpu,
 		unsigned long pfn;
 
 		/* Call KVM generic code to do the slow-path check */
-		pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL,
-					   writing, upgrade_p, NULL);
+		pfn = kvm_pfn_page_unwrap(
+				__gfn_to_pfn_memslot(memslot, gfn, false, NULL,
+						     writing, upgrade_p, NULL));
 		if (is_error_noslot_pfn(pfn))
 			return -EFAULT;
 		page = NULL;
diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
index 84e5a2dc8be5..0eee0d98b055 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -890,7 +890,7 @@ static unsigned long kvmppc_share_page(struct kvm *kvm, unsigned long gpa,
 
 retry:
 	mutex_unlock(&kvm->arch.uvmem_lock);
-	pfn = gfn_to_pfn(kvm, gfn);
+	pfn = kvm_pfn_page_unwrap(gfn_to_pfn(kvm, gfn));
 	if (is_error_noslot_pfn(pfn))
 		goto out;
 
@@ -1075,7 +1075,7 @@ int kvmppc_send_page_to_uv(struct kvm *kvm, unsigned long gfn)
 	unsigned long pfn;
 	int ret = U_SUCCESS;
 
-	pfn = gfn_to_pfn(kvm, gfn);
+	pfn = kvm_pfn_page_unwrap(gfn_to_pfn(kvm, gfn));
 	if (is_error_noslot_pfn(pfn))
 		return -EFAULT;
 
diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
index 7f16afc331ef..be7ca5788a05 100644
--- a/arch/powerpc/kvm/e500_mmu_host.c
+++ b/arch/powerpc/kvm/e500_mmu_host.c
@@ -446,7 +446,7 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
 
 	if (likely(!pfnmap)) {
 		tsize_pages = 1UL << (tsize + 10 - PAGE_SHIFT);
-		pfn = gfn_to_pfn_memslot(slot, gfn);
+		pfn = kvm_pfn_page_unwrap(gfn_to_pfn_memslot(slot, gfn));
 		if (is_error_noslot_pfn(pfn)) {
 			if (printk_ratelimit())
 				pr_err("%s: real page not found for gfn %lx\n",
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 8d5876dfc6b7..84913677c404 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2619,7 +2619,7 @@ static kvm_pfn_t pte_prefetch_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn,
 	if (!slot)
 		return KVM_PFN_ERR_FAULT;
 
-	return gfn_to_pfn_memslot_atomic(slot, gfn);
+	return kvm_pfn_page_unwrap(gfn_to_pfn_memslot_atomic(slot, gfn));
 }
 
 static int direct_pte_prefetch_many(struct kvm_vcpu *vcpu,
@@ -3694,8 +3694,9 @@ static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn,
 	}
 
 	async = false;
-	*pfn = __gfn_to_pfn_memslot(slot, gfn, false, &async,
-				    write, writable, hva);
+	*pfn = kvm_pfn_page_unwrap(__gfn_to_pfn_memslot(slot, gfn, false,
+							&async, write,
+							writable, hva));
 	if (!async)
 		return false; /* *pfn has correct page already */
 
@@ -3709,8 +3710,8 @@ static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn,
 			return true;
 	}
 
-	*pfn = __gfn_to_pfn_memslot(slot, gfn, false, NULL,
-				    write, writable, hva);
+	*pfn = kvm_pfn_page_unwrap(__gfn_to_pfn_memslot(slot, gfn, false, NULL,
+							write, writable, hva));
 	return false;
 }
 
diff --git a/arch/x86/kvm/mmu/mmu_audit.c b/arch/x86/kvm/mmu/mmu_audit.c
index 97ff184084b4..3f983dc6e0f1 100644
--- a/arch/x86/kvm/mmu/mmu_audit.c
+++ b/arch/x86/kvm/mmu/mmu_audit.c
@@ -111,7 +111,7 @@ static void audit_mappings(struct kvm_vcpu *vcpu, u64 *sptep, int level)
 		return;
 
 	gfn = kvm_mmu_page_get_gfn(sp, sptep - sp->spt);
-	pfn = kvm_vcpu_gfn_to_pfn_atomic(vcpu, gfn);
+	pfn = kvm_pfn_page_unwrap(kvm_vcpu_gfn_to_pfn_atomic(vcpu, gfn));
 
 	if (is_error_pfn(pfn))
 		return;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6d425310054b..d31797e0cb6e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7341,7 +7341,7 @@ static bool reexecute_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 	 * retry instruction -> write #PF -> emulation fail -> retry
 	 * instruction -> ...
 	 */
-	pfn = gfn_to_pfn(vcpu->kvm, gpa_to_gfn(gpa));
+	pfn = kvm_pfn_page_unwrap(gfn_to_pfn(vcpu->kvm, gpa_to_gfn(gpa)));
 
 	/*
 	 * If the instruction failed on the error pfn, it can not be fixed,
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 65ff43cfc0f7..b829ff67e3d9 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -1938,7 +1938,7 @@ static unsigned long kvmgt_gfn_to_pfn(unsigned long handle, unsigned long gfn)
 
 	info = (struct kvmgt_guest_info *)handle;
 
-	pfn = gfn_to_pfn(info->kvm, gfn);
+	pfn = kvm_pfn_page_unwrap(gfn_to_pfn(info->kvm, gfn));
 	if (is_error_noslot_pfn(pfn))
 		return INTEL_GVT_INVALID_ADDR;
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 8583ed3ff344..d3731790a967 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -138,6 +138,8 @@ static inline bool is_error_page(struct page *page)
 	return IS_ERR(page);
 }
 
+#define KVM_PFN_PAGE_ERR(e) ((struct kvm_pfn_page) { .pfn = (e), .page = NULL })
+
 #define KVM_REQUEST_MASK           GENMASK(7,0)
 #define KVM_REQUEST_NO_WAKEUP      BIT(8)
 #define KVM_REQUEST_WAIT           BIT(9)
@@ -795,14 +797,19 @@ void kvm_release_page_clean(struct page *page);
 void kvm_release_page_dirty(struct page *page);
 void kvm_set_page_accessed(struct page *page);
 
-kvm_pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn);
-kvm_pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault,
-		      bool *writable);
-kvm_pfn_t gfn_to_pfn_memslot(struct kvm_memory_slot *slot, gfn_t gfn);
-kvm_pfn_t gfn_to_pfn_memslot_atomic(struct kvm_memory_slot *slot, gfn_t gfn);
-kvm_pfn_t __gfn_to_pfn_memslot(struct kvm_memory_slot *slot, gfn_t gfn,
-			       bool atomic, bool *async, bool write_fault,
-			       bool *writable, hva_t *hva);
+struct kvm_pfn_page gfn_to_pfn(struct kvm *kvm, gfn_t gfn);
+struct kvm_pfn_page gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn,
+				    bool write_fault, bool *writable);
+struct kvm_pfn_page  gfn_to_pfn_memslot(struct kvm_memory_slot *slot,
+					gfn_t gfn);
+struct kvm_pfn_page  gfn_to_pfn_memslot_atomic(struct kvm_memory_slot *slot,
+					       gfn_t gfn);
+struct kvm_pfn_page  __gfn_to_pfn_memslot(struct kvm_memory_slot *slot,
+					  gfn_t gfn, bool atomic, bool *async,
+					  bool write_fault, bool *writable,
+					  hva_t *hva);
+
+kvm_pfn_t kvm_pfn_page_unwrap(struct kvm_pfn_page pfnpg);
 
 void kvm_release_pfn_clean(kvm_pfn_t pfn);
 void kvm_release_pfn_dirty(kvm_pfn_t pfn);
@@ -883,8 +890,8 @@ void mark_page_dirty(struct kvm *kvm, gfn_t gfn);
 
 struct kvm_memslots *kvm_vcpu_memslots(struct kvm_vcpu *vcpu);
 struct kvm_memory_slot *kvm_vcpu_gfn_to_memslot(struct kvm_vcpu *vcpu, gfn_t gfn);
-kvm_pfn_t kvm_vcpu_gfn_to_pfn_atomic(struct kvm_vcpu *vcpu, gfn_t gfn);
-kvm_pfn_t kvm_vcpu_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn);
+struct kvm_pfn_page kvm_vcpu_gfn_to_pfn_atomic(struct kvm_vcpu *vcpu, gfn_t gfn);
+struct kvm_pfn_page kvm_vcpu_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn);
 int kvm_vcpu_map(struct kvm_vcpu *vcpu, gpa_t gpa, struct kvm_host_map *map);
 int kvm_map_gfn(struct kvm_vcpu *vcpu, gfn_t gfn, struct kvm_host_map *map,
 		struct gfn_to_pfn_cache *cache, bool atomic);
diff --git a/include/linux/kvm_types.h b/include/linux/kvm_types.h
index a7580f69dda0..471f4b329f46 100644
--- a/include/linux/kvm_types.h
+++ b/include/linux/kvm_types.h
@@ -45,6 +45,11 @@ typedef u64            hfn_t;
 
 typedef hfn_t kvm_pfn_t;
 
+struct kvm_pfn_page {
+	kvm_pfn_t pfn;
+	struct page *page;
+};
+
 struct gfn_to_hva_cache {
 	u64 generation;
 	gpa_t gpa;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 6a6bc7af0e28..898e90be4d0e 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1982,9 +1982,8 @@ static inline int check_user_page_hwpoison(unsigned long addr)
  * only part that runs if we can in atomic context.
  */
 static bool hva_to_pfn_fast(unsigned long addr, bool write_fault,
-			    bool *writable, kvm_pfn_t *pfn)
+			    bool *writable, struct kvm_pfn_page *pfnpg)
 {
-	struct page *page[1];
 
 	/*
 	 * Fast pin a writable pfn only if it is a write fault request
@@ -1994,8 +1993,8 @@ static bool hva_to_pfn_fast(unsigned long addr, bool write_fault,
 	if (!(write_fault || writable))
 		return false;
 
-	if (get_user_page_fast_only(addr, FOLL_WRITE, page)) {
-		*pfn = page_to_pfn(page[0]);
+	if (get_user_page_fast_only(addr, FOLL_WRITE, &pfnpg->page)) {
+		pfnpg->pfn = page_to_pfn(pfnpg->page);
 
 		if (writable)
 			*writable = true;
@@ -2010,10 +2009,9 @@ static bool hva_to_pfn_fast(unsigned long addr, bool write_fault,
  * 1 indicates success, -errno is returned if error is detected.
  */
 static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault,
-			   bool *writable, kvm_pfn_t *pfn)
+			   bool *writable, struct kvm_pfn_page *pfnpg)
 {
 	unsigned int flags = FOLL_HWPOISON;
-	struct page *page;
 	int npages = 0;
 
 	might_sleep();
@@ -2026,7 +2024,7 @@ static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault,
 	if (async)
 		flags |= FOLL_NOWAIT;
 
-	npages = get_user_pages_unlocked(addr, 1, &page, flags);
+	npages = get_user_pages_unlocked(addr, 1, &pfnpg->page, flags);
 	if (npages != 1)
 		return npages;
 
@@ -2036,11 +2034,11 @@ static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault,
 
 		if (get_user_page_fast_only(addr, FOLL_WRITE, &wpage)) {
 			*writable = true;
-			put_page(page);
-			page = wpage;
+			put_page(pfnpg->page);
+			pfnpg->page = wpage;
 		}
 	}
-	*pfn = page_to_pfn(page);
+	pfnpg->pfn = page_to_pfn(pfnpg->page);
 	return npages;
 }
 
@@ -2058,7 +2056,7 @@ static bool vma_is_valid(struct vm_area_struct *vma, bool write_fault)
 static int hva_to_pfn_remapped(struct vm_area_struct *vma,
 			       unsigned long addr, bool *async,
 			       bool write_fault, bool *writable,
-			       kvm_pfn_t *p_pfn)
+			       struct kvm_pfn_page *p_pfn)
 {
 	kvm_pfn_t pfn;
 	pte_t *ptep;
@@ -2094,22 +2092,12 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
 		*writable = pte_write(*ptep);
 	pfn = pte_pfn(*ptep);
 
-	/*
-	 * Get a reference here because callers of *hva_to_pfn* and
-	 * *gfn_to_pfn* ultimately call kvm_release_pfn_clean on the
-	 * returned pfn.  This is only needed if the VMA has VM_MIXEDMAP
-	 * set, but the kvm_get_pfn/kvm_release_pfn_clean pair will
-	 * simply do nothing for reserved pfns.
-	 *
-	 * Whoever called remap_pfn_range is also going to call e.g.
-	 * unmap_mapping_range before the underlying pages are freed,
-	 * causing a call to our MMU notifier.
-	 */ 
-	kvm_get_pfn(pfn);
-
 out:
 	pte_unmap_unlock(ptep, ptl);
-	*p_pfn = pfn;
+
+	p_pfn->pfn = pfn;
+	p_pfn->page = NULL;
+
 	return 0;
 }
 
@@ -2127,30 +2115,30 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
  * 2): @write_fault = false && @writable, @writable will tell the caller
  *     whether the mapping is writable.
  */
-static kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async,
-			bool write_fault, bool *writable)
+static struct kvm_pfn_page hva_to_pfn(unsigned long addr, bool atomic,
+			bool *async, bool write_fault, bool *writable)
 {
 	struct vm_area_struct *vma;
-	kvm_pfn_t pfn = 0;
+	struct kvm_pfn_page pfnpg = {};
 	int npages, r;
 
 	/* we can do it either atomically or asynchronously, not both */
 	BUG_ON(atomic && async);
 
-	if (hva_to_pfn_fast(addr, write_fault, writable, &pfn))
-		return pfn;
+	if (hva_to_pfn_fast(addr, write_fault, writable, &pfnpg))
+		return pfnpg;
 
 	if (atomic)
-		return KVM_PFN_ERR_FAULT;
+		return KVM_PFN_PAGE_ERR(KVM_PFN_ERR_FAULT);
 
-	npages = hva_to_pfn_slow(addr, async, write_fault, writable, &pfn);
+	npages = hva_to_pfn_slow(addr, async, write_fault, writable, &pfnpg);
 	if (npages == 1)
-		return pfn;
+		return pfnpg;
 
 	mmap_read_lock(current->mm);
 	if (npages == -EHWPOISON ||
 	      (!async && check_user_page_hwpoison(addr))) {
-		pfn = KVM_PFN_ERR_HWPOISON;
+		pfnpg.pfn = KVM_PFN_ERR_HWPOISON;
 		goto exit;
 	}
 
@@ -2158,26 +2146,27 @@ static kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async,
 	vma = find_vma_intersection(current->mm, addr, addr + 1);
 
 	if (vma == NULL)
-		pfn = KVM_PFN_ERR_FAULT;
+		pfnpg.pfn = KVM_PFN_ERR_FAULT;
 	else if (vma->vm_flags & (VM_IO | VM_PFNMAP)) {
-		r = hva_to_pfn_remapped(vma, addr, async, write_fault, writable, &pfn);
+		r = hva_to_pfn_remapped(vma, addr, async, write_fault, writable, &pfnpg);
 		if (r == -EAGAIN)
 			goto retry;
 		if (r < 0)
-			pfn = KVM_PFN_ERR_FAULT;
+			pfnpg.pfn = KVM_PFN_ERR_FAULT;
 	} else {
 		if (async && vma_is_valid(vma, write_fault))
 			*async = true;
-		pfn = KVM_PFN_ERR_FAULT;
+		pfnpg.pfn = KVM_PFN_ERR_FAULT;
 	}
 exit:
 	mmap_read_unlock(current->mm);
-	return pfn;
+	return pfnpg;
 }
 
-kvm_pfn_t __gfn_to_pfn_memslot(struct kvm_memory_slot *slot, gfn_t gfn,
-			       bool atomic, bool *async, bool write_fault,
-			       bool *writable, hva_t *hva)
+struct kvm_pfn_page __gfn_to_pfn_memslot(struct kvm_memory_slot *slot,
+					 gfn_t gfn, bool atomic, bool *async,
+					 bool write_fault, bool *writable,
+					 hva_t *hva)
 {
 	unsigned long addr = __gfn_to_hva_many(slot, gfn, NULL, write_fault);
 
@@ -2187,13 +2176,13 @@ kvm_pfn_t __gfn_to_pfn_memslot(struct kvm_memory_slot *slot, gfn_t gfn,
 	if (addr == KVM_HVA_ERR_RO_BAD) {
 		if (writable)
 			*writable = false;
-		return KVM_PFN_ERR_RO_FAULT;
+		return KVM_PFN_PAGE_ERR(KVM_PFN_ERR_RO_FAULT);
 	}
 
 	if (kvm_is_error_hva(addr)) {
 		if (writable)
 			*writable = false;
-		return KVM_PFN_NOSLOT;
+		return KVM_PFN_PAGE_ERR(KVM_PFN_NOSLOT);
 	}
 
 	/* Do not map writable pfn in the readonly memslot. */
@@ -2207,44 +2196,55 @@ kvm_pfn_t __gfn_to_pfn_memslot(struct kvm_memory_slot *slot, gfn_t gfn,
 }
 EXPORT_SYMBOL_GPL(__gfn_to_pfn_memslot);
 
-kvm_pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault,
-		      bool *writable)
+struct kvm_pfn_page gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn,
+				    bool write_fault, bool *writable)
 {
 	return __gfn_to_pfn_memslot(gfn_to_memslot(kvm, gfn), gfn, false, NULL,
 				    write_fault, writable, NULL);
 }
 EXPORT_SYMBOL_GPL(gfn_to_pfn_prot);
 
-kvm_pfn_t gfn_to_pfn_memslot(struct kvm_memory_slot *slot, gfn_t gfn)
+struct kvm_pfn_page gfn_to_pfn_memslot(struct kvm_memory_slot *slot, gfn_t gfn)
 {
 	return __gfn_to_pfn_memslot(slot, gfn, false, NULL, true, NULL, NULL);
 }
 EXPORT_SYMBOL_GPL(gfn_to_pfn_memslot);
 
-kvm_pfn_t gfn_to_pfn_memslot_atomic(struct kvm_memory_slot *slot, gfn_t gfn)
+struct kvm_pfn_page gfn_to_pfn_memslot_atomic(struct kvm_memory_slot *slot,
+					      gfn_t gfn)
 {
 	return __gfn_to_pfn_memslot(slot, gfn, true, NULL, true, NULL, NULL);
 }
 EXPORT_SYMBOL_GPL(gfn_to_pfn_memslot_atomic);
 
-kvm_pfn_t kvm_vcpu_gfn_to_pfn_atomic(struct kvm_vcpu *vcpu, gfn_t gfn)
+struct kvm_pfn_page kvm_vcpu_gfn_to_pfn_atomic(struct kvm_vcpu *vcpu, gfn_t gfn)
 {
 	return gfn_to_pfn_memslot_atomic(kvm_vcpu_gfn_to_memslot(vcpu, gfn), gfn);
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_gfn_to_pfn_atomic);
 
-kvm_pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn)
+struct kvm_pfn_page gfn_to_pfn(struct kvm *kvm, gfn_t gfn)
 {
 	return gfn_to_pfn_memslot(gfn_to_memslot(kvm, gfn), gfn);
 }
 EXPORT_SYMBOL_GPL(gfn_to_pfn);
 
-kvm_pfn_t kvm_vcpu_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn)
+struct kvm_pfn_page kvm_vcpu_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn)
 {
 	return gfn_to_pfn_memslot(kvm_vcpu_gfn_to_memslot(vcpu, gfn), gfn);
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_gfn_to_pfn);
 
+kvm_pfn_t kvm_pfn_page_unwrap(struct kvm_pfn_page pfnpg)
+{
+	if (pfnpg.page)
+		return pfnpg.pfn;
+
+	kvm_get_pfn(pfnpg.pfn);
+	return pfnpg.pfn;
+}
+EXPORT_SYMBOL_GPL(kvm_pfn_page_unwrap);
+
 int gfn_to_page_many_atomic(struct kvm_memory_slot *slot, gfn_t gfn,
 			    struct page **pages, int nr_pages)
 {
@@ -2277,11 +2277,14 @@ static struct page *kvm_pfn_to_page(kvm_pfn_t pfn)
 
 struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn)
 {
-	kvm_pfn_t pfn;
+	struct kvm_pfn_page pfnpg;
+
+	pfnpg = gfn_to_pfn(kvm, gfn);
 
-	pfn = gfn_to_pfn(kvm, gfn);
+	if (pfnpg.page)
+		return pfnpg.page;
 
-	return kvm_pfn_to_page(pfn);
+	return kvm_pfn_to_page(kvm_pfn_page_unwrap(pfnpg));
 }
 EXPORT_SYMBOL_GPL(gfn_to_page);
 
@@ -2304,7 +2307,7 @@ static void kvm_cache_gfn_to_pfn(struct kvm_memory_slot *slot, gfn_t gfn,
 {
 	kvm_release_pfn(cache->pfn, cache->dirty, cache);
 
-	cache->pfn = gfn_to_pfn_memslot(slot, gfn);
+	cache->pfn = kvm_pfn_page_unwrap(gfn_to_pfn_memslot(slot, gfn));
 	cache->gfn = gfn;
 	cache->dirty = false;
 	cache->generation = gen;
@@ -2335,7 +2338,7 @@ static int __kvm_map_gfn(struct kvm_memslots *slots, gfn_t gfn,
 	} else {
 		if (atomic)
 			return -EAGAIN;
-		pfn = gfn_to_pfn_memslot(slot, gfn);
+		pfn = kvm_pfn_page_unwrap(gfn_to_pfn_memslot(slot, gfn));
 	}
 	if (is_error_noslot_pfn(pfn))
 		return -EINVAL;
@@ -2435,11 +2438,11 @@ EXPORT_SYMBOL_GPL(kvm_vcpu_unmap);
 
 struct page *kvm_vcpu_gfn_to_page(struct kvm_vcpu *vcpu, gfn_t gfn)
 {
-	kvm_pfn_t pfn;
+	struct kvm_pfn_page pfnpg;
 
-	pfn = kvm_vcpu_gfn_to_pfn(vcpu, gfn);
+	pfnpg = kvm_vcpu_gfn_to_pfn(vcpu, gfn);
 
-	return kvm_pfn_to_page(pfn);
+	return kvm_pfn_to_page(kvm_pfn_page_unwrap(pfnpg));
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_gfn_to_page);
 
-- 
2.32.0.93.g670b81a890-goog
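
Patches 3 and 4 update the x86 and arm64 MMUs so that they no longer need
kvm_pfn_page_unwrap(). A hypothetical converted caller (illustrative only,
not taken from patches 3-6) would follow this pattern:

static int example_map_gfn(struct kvm_vcpu *vcpu, gfn_t gfn)
{
        struct kvm_pfn_page pfnpg;

        pfnpg = kvm_vcpu_gfn_to_pfn(vcpu, gfn);
        if (is_error_noslot_pfn(pfnpg.pfn))
                return -EFAULT;

        /* ... install pfnpg.pfn into the stage-2 / shadow page tables ... */

        /*
         * Only gup-backed translations hold a page reference; VM_IO and
         * VM_PFNMAP translations have pfnpg.page == NULL and nothing to put.
         */
        if (pfnpg.page)
                kvm_release_page_clean(pfnpg.page);

        return 0;
}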


-kvm_pfn_t kvm_vcpu_gfn_to_pfn_atomic(struct kvm_vcpu *vcpu, gfn_t gfn);
-kvm_pfn_t kvm_vcpu_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn);
+struct kvm_pfn_page kvm_vcpu_gfn_to_pfn_atomic(struct kvm_vcpu *vcpu, gfn_t gfn);
+struct kvm_pfn_page kvm_vcpu_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn);
 int kvm_vcpu_map(struct kvm_vcpu *vcpu, gpa_t gpa, struct kvm_host_map *map);
 int kvm_map_gfn(struct kvm_vcpu *vcpu, gfn_t gfn, struct kvm_host_map *map,
 		struct gfn_to_pfn_cache *cache, bool atomic);
diff --git a/include/linux/kvm_types.h b/include/linux/kvm_types.h
index a7580f69dda0..471f4b329f46 100644
--- a/include/linux/kvm_types.h
+++ b/include/linux/kvm_types.h
@@ -45,6 +45,11 @@ typedef u64            hfn_t;
 
 typedef hfn_t kvm_pfn_t;
 
+struct kvm_pfn_page {
+	kvm_pfn_t pfn;
+	struct page *page;
+};
+
 struct gfn_to_hva_cache {
 	u64 generation;
 	gpa_t gpa;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 6a6bc7af0e28..898e90be4d0e 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1982,9 +1982,8 @@ static inline int check_user_page_hwpoison(unsigned long addr)
  * only part that runs if we can in atomic context.
  */
 static bool hva_to_pfn_fast(unsigned long addr, bool write_fault,
-			    bool *writable, kvm_pfn_t *pfn)
+			    bool *writable, struct kvm_pfn_page *pfnpg)
 {
-	struct page *page[1];
 
 	/*
 	 * Fast pin a writable pfn only if it is a write fault request
@@ -1994,8 +1993,8 @@ static bool hva_to_pfn_fast(unsigned long addr, bool write_fault,
 	if (!(write_fault || writable))
 		return false;
 
-	if (get_user_page_fast_only(addr, FOLL_WRITE, page)) {
-		*pfn = page_to_pfn(page[0]);
+	if (get_user_page_fast_only(addr, FOLL_WRITE, &pfnpg->page)) {
+		pfnpg->pfn = page_to_pfn(pfnpg->page);
 
 		if (writable)
 			*writable = true;
@@ -2010,10 +2009,9 @@ static bool hva_to_pfn_fast(unsigned long addr, bool write_fault,
  * 1 indicates success, -errno is returned if error is detected.
  */
 static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault,
-			   bool *writable, kvm_pfn_t *pfn)
+			   bool *writable, struct kvm_pfn_page *pfnpg)
 {
 	unsigned int flags = FOLL_HWPOISON;
-	struct page *page;
 	int npages = 0;
 
 	might_sleep();
@@ -2026,7 +2024,7 @@ static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault,
 	if (async)
 		flags |= FOLL_NOWAIT;
 
-	npages = get_user_pages_unlocked(addr, 1, &page, flags);
+	npages = get_user_pages_unlocked(addr, 1, &pfnpg->page, flags);
 	if (npages != 1)
 		return npages;
 
@@ -2036,11 +2034,11 @@ static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault,
 
 		if (get_user_page_fast_only(addr, FOLL_WRITE, &wpage)) {
 			*writable = true;
-			put_page(page);
-			page = wpage;
+			put_page(pfnpg->page);
+			pfnpg->page = wpage;
 		}
 	}
-	*pfn = page_to_pfn(page);
+	pfnpg->pfn = page_to_pfn(pfnpg->page);
 	return npages;
 }
 
@@ -2058,7 +2056,7 @@ static bool vma_is_valid(struct vm_area_struct *vma, bool write_fault)
 static int hva_to_pfn_remapped(struct vm_area_struct *vma,
 			       unsigned long addr, bool *async,
 			       bool write_fault, bool *writable,
-			       kvm_pfn_t *p_pfn)
+			       struct kvm_pfn_page *p_pfn)
 {
 	kvm_pfn_t pfn;
 	pte_t *ptep;
@@ -2094,22 +2092,12 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
 		*writable = pte_write(*ptep);
 	pfn = pte_pfn(*ptep);
 
-	/*
-	 * Get a reference here because callers of *hva_to_pfn* and
-	 * *gfn_to_pfn* ultimately call kvm_release_pfn_clean on the
-	 * returned pfn.  This is only needed if the VMA has VM_MIXEDMAP
-	 * set, but the kvm_get_pfn/kvm_release_pfn_clean pair will
-	 * simply do nothing for reserved pfns.
-	 *
-	 * Whoever called remap_pfn_range is also going to call e.g.
-	 * unmap_mapping_range before the underlying pages are freed,
-	 * causing a call to our MMU notifier.
-	 */ 
-	kvm_get_pfn(pfn);
-
 out:
 	pte_unmap_unlock(ptep, ptl);
-	*p_pfn = pfn;
+
+	p_pfn->pfn = pfn;
+	p_pfn->page = NULL;
+
 	return 0;
 }
 
@@ -2127,30 +2115,30 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
  * 2): @write_fault = false && @writable, @writable will tell the caller
  *     whether the mapping is writable.
  */
-static kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async,
-			bool write_fault, bool *writable)
+static struct kvm_pfn_page hva_to_pfn(unsigned long addr, bool atomic,
+			bool *async, bool write_fault, bool *writable)
 {
 	struct vm_area_struct *vma;
-	kvm_pfn_t pfn = 0;
+	struct kvm_pfn_page pfnpg = {};
 	int npages, r;
 
 	/* we can do it either atomically or asynchronously, not both */
 	BUG_ON(atomic && async);
 
-	if (hva_to_pfn_fast(addr, write_fault, writable, &pfn))
-		return pfn;
+	if (hva_to_pfn_fast(addr, write_fault, writable, &pfnpg))
+		return pfnpg;
 
 	if (atomic)
-		return KVM_PFN_ERR_FAULT;
+		return KVM_PFN_PAGE_ERR(KVM_PFN_ERR_FAULT);
 
-	npages = hva_to_pfn_slow(addr, async, write_fault, writable, &pfn);
+	npages = hva_to_pfn_slow(addr, async, write_fault, writable, &pfnpg);
 	if (npages == 1)
-		return pfn;
+		return pfnpg;
 
 	mmap_read_lock(current->mm);
 	if (npages == -EHWPOISON ||
 	      (!async && check_user_page_hwpoison(addr))) {
-		pfn = KVM_PFN_ERR_HWPOISON;
+		pfnpg.pfn = KVM_PFN_ERR_HWPOISON;
 		goto exit;
 	}
 
@@ -2158,26 +2146,27 @@ static kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async,
 	vma = find_vma_intersection(current->mm, addr, addr + 1);
 
 	if (vma == NULL)
-		pfn = KVM_PFN_ERR_FAULT;
+		pfnpg.pfn = KVM_PFN_ERR_FAULT;
 	else if (vma->vm_flags & (VM_IO | VM_PFNMAP)) {
-		r = hva_to_pfn_remapped(vma, addr, async, write_fault, writable, &pfn);
+		r = hva_to_pfn_remapped(vma, addr, async, write_fault, writable, &pfnpg);
 		if (r == -EAGAIN)
 			goto retry;
 		if (r < 0)
-			pfn = KVM_PFN_ERR_FAULT;
+			pfnpg.pfn = KVM_PFN_ERR_FAULT;
 	} else {
 		if (async && vma_is_valid(vma, write_fault))
 			*async = true;
-		pfn = KVM_PFN_ERR_FAULT;
+		pfnpg.pfn = KVM_PFN_ERR_FAULT;
 	}
 exit:
 	mmap_read_unlock(current->mm);
-	return pfn;
+	return pfnpg;
 }
 
-kvm_pfn_t __gfn_to_pfn_memslot(struct kvm_memory_slot *slot, gfn_t gfn,
-			       bool atomic, bool *async, bool write_fault,
-			       bool *writable, hva_t *hva)
+struct kvm_pfn_page __gfn_to_pfn_memslot(struct kvm_memory_slot *slot,
+					 gfn_t gfn, bool atomic, bool *async,
+					 bool write_fault, bool *writable,
+					 hva_t *hva)
 {
 	unsigned long addr = __gfn_to_hva_many(slot, gfn, NULL, write_fault);
 
@@ -2187,13 +2176,13 @@ kvm_pfn_t __gfn_to_pfn_memslot(struct kvm_memory_slot *slot, gfn_t gfn,
 	if (addr == KVM_HVA_ERR_RO_BAD) {
 		if (writable)
 			*writable = false;
-		return KVM_PFN_ERR_RO_FAULT;
+		return KVM_PFN_PAGE_ERR(KVM_PFN_ERR_RO_FAULT);
 	}
 
 	if (kvm_is_error_hva(addr)) {
 		if (writable)
 			*writable = false;
-		return KVM_PFN_NOSLOT;
+		return KVM_PFN_PAGE_ERR(KVM_PFN_NOSLOT);
 	}
 
 	/* Do not map writable pfn in the readonly memslot. */
@@ -2207,44 +2196,55 @@ kvm_pfn_t __gfn_to_pfn_memslot(struct kvm_memory_slot *slot, gfn_t gfn,
 }
 EXPORT_SYMBOL_GPL(__gfn_to_pfn_memslot);
 
-kvm_pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault,
-		      bool *writable)
+struct kvm_pfn_page gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn,
+				    bool write_fault, bool *writable)
 {
 	return __gfn_to_pfn_memslot(gfn_to_memslot(kvm, gfn), gfn, false, NULL,
 				    write_fault, writable, NULL);
 }
 EXPORT_SYMBOL_GPL(gfn_to_pfn_prot);
 
-kvm_pfn_t gfn_to_pfn_memslot(struct kvm_memory_slot *slot, gfn_t gfn)
+struct kvm_pfn_page gfn_to_pfn_memslot(struct kvm_memory_slot *slot, gfn_t gfn)
 {
 	return __gfn_to_pfn_memslot(slot, gfn, false, NULL, true, NULL, NULL);
 }
 EXPORT_SYMBOL_GPL(gfn_to_pfn_memslot);
 
-kvm_pfn_t gfn_to_pfn_memslot_atomic(struct kvm_memory_slot *slot, gfn_t gfn)
+struct kvm_pfn_page gfn_to_pfn_memslot_atomic(struct kvm_memory_slot *slot,
+					      gfn_t gfn)
 {
 	return __gfn_to_pfn_memslot(slot, gfn, true, NULL, true, NULL, NULL);
 }
 EXPORT_SYMBOL_GPL(gfn_to_pfn_memslot_atomic);
 
-kvm_pfn_t kvm_vcpu_gfn_to_pfn_atomic(struct kvm_vcpu *vcpu, gfn_t gfn)
+struct kvm_pfn_page kvm_vcpu_gfn_to_pfn_atomic(struct kvm_vcpu *vcpu, gfn_t gfn)
 {
 	return gfn_to_pfn_memslot_atomic(kvm_vcpu_gfn_to_memslot(vcpu, gfn), gfn);
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_gfn_to_pfn_atomic);
 
-kvm_pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn)
+struct kvm_pfn_page gfn_to_pfn(struct kvm *kvm, gfn_t gfn)
 {
 	return gfn_to_pfn_memslot(gfn_to_memslot(kvm, gfn), gfn);
 }
 EXPORT_SYMBOL_GPL(gfn_to_pfn);
 
-kvm_pfn_t kvm_vcpu_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn)
+struct kvm_pfn_page kvm_vcpu_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn)
 {
 	return gfn_to_pfn_memslot(kvm_vcpu_gfn_to_memslot(vcpu, gfn), gfn);
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_gfn_to_pfn);
 
+kvm_pfn_t kvm_pfn_page_unwrap(struct kvm_pfn_page pfnpg)
+{
+	if (pfnpg.page)
+		return pfnpg.pfn;
+
+	kvm_get_pfn(pfnpg.pfn);
+	return pfnpg.pfn;
+}
+EXPORT_SYMBOL_GPL(kvm_pfn_page_unwrap);
+
 int gfn_to_page_many_atomic(struct kvm_memory_slot *slot, gfn_t gfn,
 			    struct page **pages, int nr_pages)
 {
@@ -2277,11 +2277,14 @@ static struct page *kvm_pfn_to_page(kvm_pfn_t pfn)
 
 struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn)
 {
-	kvm_pfn_t pfn;
+	struct kvm_pfn_page pfnpg;
+
+	pfnpg = gfn_to_pfn(kvm, gfn);
 
-	pfn = gfn_to_pfn(kvm, gfn);
+	if (pfnpg.page)
+		return pfnpg.page;
 
-	return kvm_pfn_to_page(pfn);
+	return kvm_pfn_to_page(kvm_pfn_page_unwrap(pfnpg));
 }
 EXPORT_SYMBOL_GPL(gfn_to_page);
 
@@ -2304,7 +2307,7 @@ static void kvm_cache_gfn_to_pfn(struct kvm_memory_slot *slot, gfn_t gfn,
 {
 	kvm_release_pfn(cache->pfn, cache->dirty, cache);
 
-	cache->pfn = gfn_to_pfn_memslot(slot, gfn);
+	cache->pfn = kvm_pfn_page_unwrap(gfn_to_pfn_memslot(slot, gfn));
 	cache->gfn = gfn;
 	cache->dirty = false;
 	cache->generation = gen;
@@ -2335,7 +2338,7 @@ static int __kvm_map_gfn(struct kvm_memslots *slots, gfn_t gfn,
 	} else {
 		if (atomic)
 			return -EAGAIN;
-		pfn = gfn_to_pfn_memslot(slot, gfn);
+		pfn = kvm_pfn_page_unwrap(gfn_to_pfn_memslot(slot, gfn));
 	}
 	if (is_error_noslot_pfn(pfn))
 		return -EINVAL;
@@ -2435,11 +2438,11 @@ EXPORT_SYMBOL_GPL(kvm_vcpu_unmap);
 
 struct page *kvm_vcpu_gfn_to_page(struct kvm_vcpu *vcpu, gfn_t gfn)
 {
-	kvm_pfn_t pfn;
+	struct kvm_pfn_page pfnpg;
 
-	pfn = kvm_vcpu_gfn_to_pfn(vcpu, gfn);
+	pfnpg = kvm_vcpu_gfn_to_pfn(vcpu, gfn);
 
-	return kvm_pfn_to_page(pfn);
+	return kvm_pfn_to_page(kvm_pfn_page_unwrap(pfnpg));
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_gfn_to_page);
 
-- 
2.32.0.93.g670b81a890-goog


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 2/6] KVM: mmu: also return page from gfn_to_pfn
@ 2021-06-24  3:57   ` David Stevens
  0 siblings, 0 replies; 213+ messages in thread
From: David Stevens @ 2021-06-24  3:57 UTC (permalink / raw)
  To: Marc Zyngier, Huacai Chen, Aleksandar Markovic, Paul Mackerras,
	Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: intel-gvt-dev, Wanpeng Li, kvm, Will Deacon, Sean Christopherson,
	Joerg Roedel, linuxppc-dev, linux-kernel, kvm-ppc, linux-mips,
	David Stevens, dri-devel, intel-gfx, Vitaly Kuznetsov, kvmarm,
	linux-arm-kernel, Jim Mattson

From: David Stevens <stevensd@chromium.org>

Return a struct kvm_pfn_page containing both a pfn and an optional
struct page from the gfn_to_pfn family of functions. This
differentiates the gup case from the follow_pte case handled by
hva_to_pfn_remapped, which allows callers that only need a pfn to
avoid touching the page struct in the latter case. For callers that
rely on the old behaviour, introduce a helper function that unwraps a
struct kvm_pfn_page into a plain kvm_pfn_t. The helper makes the call
to kvm_get_pfn, previously made in hva_to_pfn_remapped, whenever the
pfn has no associated struct page.

For now, wrap all calls to gfn_to_pfn functions in the new helper
function. Callers which don't need the page struct will be updated in
follow-up patches.
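
As a rough illustration only (not part of this patch, and assuming a
kvm pointer and a gfn are in scope), a caller can consume the new
return type in either of two ways, using only helpers that exist
after this change:

	/*
	 * Old behaviour via the unwrap helper: a reference is taken
	 * even when no struct page was resolved, so the existing
	 * kvm_release_pfn_clean() pairing keeps working.
	 */
	kvm_pfn_t pfn = kvm_pfn_page_unwrap(gfn_to_pfn(kvm, gfn));

	if (is_error_noslot_pfn(pfn))
		return -EFAULT;
	/* ... use pfn ... */
	kvm_release_pfn_clean(pfn);

	/*
	 * New style (roughly what later patches move callers to):
	 * only touch the struct page when gup actually pinned one.
	 */
	struct kvm_pfn_page pfnpg = gfn_to_pfn(kvm, gfn);

	if (is_error_noslot_pfn(pfnpg.pfn))
		return -EFAULT;
	/* ... use pfnpg.pfn ... */
	if (pfnpg.page)
		put_page(pfnpg.page);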

Signed-off-by: David Stevens <stevensd@chromium.org>
---
 arch/arm64/kvm/mmu.c                   |   5 +-
 arch/mips/kvm/mmu.c                    |   3 +-
 arch/powerpc/kvm/book3s.c              |   3 +-
 arch/powerpc/kvm/book3s_64_mmu_hv.c    |   5 +-
 arch/powerpc/kvm/book3s_64_mmu_radix.c |   5 +-
 arch/powerpc/kvm/book3s_hv_uvmem.c     |   4 +-
 arch/powerpc/kvm/e500_mmu_host.c       |   2 +-
 arch/x86/kvm/mmu/mmu.c                 |  11 ++-
 arch/x86/kvm/mmu/mmu_audit.c           |   2 +-
 arch/x86/kvm/x86.c                     |   2 +-
 drivers/gpu/drm/i915/gvt/kvmgt.c       |   2 +-
 include/linux/kvm_host.h               |  27 ++++--
 include/linux/kvm_types.h              |   5 +
 virt/kvm/kvm_main.c                    | 121 +++++++++++++------------
 14 files changed, 109 insertions(+), 88 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index c10207fed2f3..896b3644b36f 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -933,8 +933,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 */
 	smp_rmb();
 
-	pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL,
-				   write_fault, &writable, NULL);
+	pfn = kvm_pfn_page_unwrap(__gfn_to_pfn_memslot(memslot, gfn, false,
+						       NULL, write_fault,
+						       &writable, NULL));
 	if (pfn == KVM_PFN_ERR_HWPOISON) {
 		kvm_send_hwpoison_signal(hva, vma_shift);
 		return 0;
diff --git a/arch/mips/kvm/mmu.c b/arch/mips/kvm/mmu.c
index 6d1f68cf4edf..f4e5e48bc6bf 100644
--- a/arch/mips/kvm/mmu.c
+++ b/arch/mips/kvm/mmu.c
@@ -630,7 +630,8 @@ static int kvm_mips_map_page(struct kvm_vcpu *vcpu, unsigned long gpa,
 	smp_rmb();
 
 	/* Slow path - ask KVM core whether we can access this GPA */
-	pfn = gfn_to_pfn_prot(kvm, gfn, write_fault, &writeable);
+	pfn = kvm_pfn_page_unwrap(gfn_to_pfn_prot(kvm, gfn,
+						  write_fault, &writeable));
 	if (is_error_noslot_pfn(pfn)) {
 		err = -EFAULT;
 		goto out;
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 2b691f4d1f26..2dff01d0632a 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -417,7 +417,8 @@ kvm_pfn_t kvmppc_gpa_to_pfn(struct kvm_vcpu *vcpu, gpa_t gpa, bool writing,
 		return pfn;
 	}
 
-	return gfn_to_pfn_prot(vcpu->kvm, gfn, writing, writable);
+	return kvm_pfn_page_unwrap(gfn_to_pfn_prot(vcpu->kvm, gfn,
+						   writing, writable));
 }
 EXPORT_SYMBOL_GPL(kvmppc_gpa_to_pfn);
 
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 2d9193cd73be..ba094b9f87a9 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -590,8 +590,9 @@ int kvmppc_book3s_hv_page_fault(struct kvm_vcpu *vcpu,
 		write_ok = true;
 	} else {
 		/* Call KVM generic code to do the slow-path check */
-		pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL,
-					   writing, &write_ok, NULL);
+		pfn = kvm_pfn_page_unwrap(
+				__gfn_to_pfn_memslot(memslot, gfn, false, NULL,
+						     writing, &write_ok, NULL));
 		if (is_error_noslot_pfn(pfn))
 			return -EFAULT;
 		page = NULL;
diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index d909c069363e..e7892f148222 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -821,8 +821,9 @@ int kvmppc_book3s_instantiate_page(struct kvm_vcpu *vcpu,
 		unsigned long pfn;
 
 		/* Call KVM generic code to do the slow-path check */
-		pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL,
-					   writing, upgrade_p, NULL);
+		pfn = kvm_pfn_page_unwrap(
+				__gfn_to_pfn_memslot(memslot, gfn, false, NULL,
+						     writing, upgrade_p, NULL));
 		if (is_error_noslot_pfn(pfn))
 			return -EFAULT;
 		page = NULL;
diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
index 84e5a2dc8be5..0eee0d98b055 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -890,7 +890,7 @@ static unsigned long kvmppc_share_page(struct kvm *kvm, unsigned long gpa,
 
 retry:
 	mutex_unlock(&kvm->arch.uvmem_lock);
-	pfn = gfn_to_pfn(kvm, gfn);
+	pfn = kvm_pfn_page_unwrap(gfn_to_pfn(kvm, gfn));
 	if (is_error_noslot_pfn(pfn))
 		goto out;
 
@@ -1075,7 +1075,7 @@ int kvmppc_send_page_to_uv(struct kvm *kvm, unsigned long gfn)
 	unsigned long pfn;
 	int ret = U_SUCCESS;
 
-	pfn = gfn_to_pfn(kvm, gfn);
+	pfn = kvm_pfn_page_unwrap(gfn_to_pfn(kvm, gfn));
 	if (is_error_noslot_pfn(pfn))
 		return -EFAULT;
 
diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
index 7f16afc331ef..be7ca5788a05 100644
--- a/arch/powerpc/kvm/e500_mmu_host.c
+++ b/arch/powerpc/kvm/e500_mmu_host.c
@@ -446,7 +446,7 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
 
 	if (likely(!pfnmap)) {
 		tsize_pages = 1UL << (tsize + 10 - PAGE_SHIFT);
-		pfn = gfn_to_pfn_memslot(slot, gfn);
+		pfn = kvm_pfn_page_unwrap(gfn_to_pfn_memslot(slot, gfn));
 		if (is_error_noslot_pfn(pfn)) {
 			if (printk_ratelimit())
 				pr_err("%s: real page not found for gfn %lx\n",
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 8d5876dfc6b7..84913677c404 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2619,7 +2619,7 @@ static kvm_pfn_t pte_prefetch_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn,
 	if (!slot)
 		return KVM_PFN_ERR_FAULT;
 
-	return gfn_to_pfn_memslot_atomic(slot, gfn);
+	return kvm_pfn_page_unwrap(gfn_to_pfn_memslot_atomic(slot, gfn));
 }
 
 static int direct_pte_prefetch_many(struct kvm_vcpu *vcpu,
@@ -3694,8 +3694,9 @@ static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn,
 	}
 
 	async = false;
-	*pfn = __gfn_to_pfn_memslot(slot, gfn, false, &async,
-				    write, writable, hva);
+	*pfn = kvm_pfn_page_unwrap(__gfn_to_pfn_memslot(slot, gfn, false,
+							&async, write,
+							writable, hva));
 	if (!async)
 		return false; /* *pfn has correct page already */
 
@@ -3709,8 +3710,8 @@ static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn,
 			return true;
 	}
 
-	*pfn = __gfn_to_pfn_memslot(slot, gfn, false, NULL,
-				    write, writable, hva);
+	*pfn = kvm_pfn_page_unwrap(__gfn_to_pfn_memslot(slot, gfn, false, NULL,
+							write, writable, hva));
 	return false;
 }
 
diff --git a/arch/x86/kvm/mmu/mmu_audit.c b/arch/x86/kvm/mmu/mmu_audit.c
index 97ff184084b4..3f983dc6e0f1 100644
--- a/arch/x86/kvm/mmu/mmu_audit.c
+++ b/arch/x86/kvm/mmu/mmu_audit.c
@@ -111,7 +111,7 @@ static void audit_mappings(struct kvm_vcpu *vcpu, u64 *sptep, int level)
 		return;
 
 	gfn = kvm_mmu_page_get_gfn(sp, sptep - sp->spt);
-	pfn = kvm_vcpu_gfn_to_pfn_atomic(vcpu, gfn);
+	pfn = kvm_pfn_page_unwrap(kvm_vcpu_gfn_to_pfn_atomic(vcpu, gfn));
 
 	if (is_error_pfn(pfn))
 		return;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6d425310054b..d31797e0cb6e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7341,7 +7341,7 @@ static bool reexecute_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 	 * retry instruction -> write #PF -> emulation fail -> retry
 	 * instruction -> ...
 	 */
-	pfn = gfn_to_pfn(vcpu->kvm, gpa_to_gfn(gpa));
+	pfn = kvm_pfn_page_unwrap(gfn_to_pfn(vcpu->kvm, gpa_to_gfn(gpa)));
 
 	/*
 	 * If the instruction failed on the error pfn, it can not be fixed,
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 65ff43cfc0f7..b829ff67e3d9 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -1938,7 +1938,7 @@ static unsigned long kvmgt_gfn_to_pfn(unsigned long handle, unsigned long gfn)
 
 	info = (struct kvmgt_guest_info *)handle;
 
-	pfn = gfn_to_pfn(info->kvm, gfn);
+	pfn = kvm_pfn_page_unwrap(gfn_to_pfn(info->kvm, gfn));
 	if (is_error_noslot_pfn(pfn))
 		return INTEL_GVT_INVALID_ADDR;
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 8583ed3ff344..d3731790a967 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -138,6 +138,8 @@ static inline bool is_error_page(struct page *page)
 	return IS_ERR(page);
 }
 
+#define KVM_PFN_PAGE_ERR(e) ((struct kvm_pfn_page) { .pfn = (e), .page = NULL })
+
 #define KVM_REQUEST_MASK           GENMASK(7,0)
 #define KVM_REQUEST_NO_WAKEUP      BIT(8)
 #define KVM_REQUEST_WAIT           BIT(9)
@@ -795,14 +797,19 @@ void kvm_release_page_clean(struct page *page);
 void kvm_release_page_dirty(struct page *page);
 void kvm_set_page_accessed(struct page *page);
 
-kvm_pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn);
-kvm_pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault,
-		      bool *writable);
-kvm_pfn_t gfn_to_pfn_memslot(struct kvm_memory_slot *slot, gfn_t gfn);
-kvm_pfn_t gfn_to_pfn_memslot_atomic(struct kvm_memory_slot *slot, gfn_t gfn);
-kvm_pfn_t __gfn_to_pfn_memslot(struct kvm_memory_slot *slot, gfn_t gfn,
-			       bool atomic, bool *async, bool write_fault,
-			       bool *writable, hva_t *hva);
+struct kvm_pfn_page gfn_to_pfn(struct kvm *kvm, gfn_t gfn);
+struct kvm_pfn_page gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn,
+				    bool write_fault, bool *writable);
+struct kvm_pfn_page  gfn_to_pfn_memslot(struct kvm_memory_slot *slot,
+					gfn_t gfn);
+struct kvm_pfn_page  gfn_to_pfn_memslot_atomic(struct kvm_memory_slot *slot,
+					       gfn_t gfn);
+struct kvm_pfn_page  __gfn_to_pfn_memslot(struct kvm_memory_slot *slot,
+					  gfn_t gfn, bool atomic, bool *async,
+					  bool write_fault, bool *writable,
+					  hva_t *hva);
+
+kvm_pfn_t kvm_pfn_page_unwrap(struct kvm_pfn_page pfnpg);
 
 void kvm_release_pfn_clean(kvm_pfn_t pfn);
 void kvm_release_pfn_dirty(kvm_pfn_t pfn);
@@ -883,8 +890,8 @@ void mark_page_dirty(struct kvm *kvm, gfn_t gfn);
 
 struct kvm_memslots *kvm_vcpu_memslots(struct kvm_vcpu *vcpu);
 struct kvm_memory_slot *kvm_vcpu_gfn_to_memslot(struct kvm_vcpu *vcpu, gfn_t gfn);
-kvm_pfn_t kvm_vcpu_gfn_to_pfn_atomic(struct kvm_vcpu *vcpu, gfn_t gfn);
-kvm_pfn_t kvm_vcpu_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn);
+struct kvm_pfn_page kvm_vcpu_gfn_to_pfn_atomic(struct kvm_vcpu *vcpu, gfn_t gfn);
+struct kvm_pfn_page kvm_vcpu_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn);
 int kvm_vcpu_map(struct kvm_vcpu *vcpu, gpa_t gpa, struct kvm_host_map *map);
 int kvm_map_gfn(struct kvm_vcpu *vcpu, gfn_t gfn, struct kvm_host_map *map,
 		struct gfn_to_pfn_cache *cache, bool atomic);
diff --git a/include/linux/kvm_types.h b/include/linux/kvm_types.h
index a7580f69dda0..471f4b329f46 100644
--- a/include/linux/kvm_types.h
+++ b/include/linux/kvm_types.h
@@ -45,6 +45,11 @@ typedef u64            hfn_t;
 
 typedef hfn_t kvm_pfn_t;
 
+struct kvm_pfn_page {
+	kvm_pfn_t pfn;
+	struct page *page;
+};
+
 struct gfn_to_hva_cache {
 	u64 generation;
 	gpa_t gpa;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 6a6bc7af0e28..898e90be4d0e 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1982,9 +1982,8 @@ static inline int check_user_page_hwpoison(unsigned long addr)
  * only part that runs if we can in atomic context.
  */
 static bool hva_to_pfn_fast(unsigned long addr, bool write_fault,
-			    bool *writable, kvm_pfn_t *pfn)
+			    bool *writable, struct kvm_pfn_page *pfnpg)
 {
-	struct page *page[1];
 
 	/*
 	 * Fast pin a writable pfn only if it is a write fault request
@@ -1994,8 +1993,8 @@ static bool hva_to_pfn_fast(unsigned long addr, bool write_fault,
 	if (!(write_fault || writable))
 		return false;
 
-	if (get_user_page_fast_only(addr, FOLL_WRITE, page)) {
-		*pfn = page_to_pfn(page[0]);
+	if (get_user_page_fast_only(addr, FOLL_WRITE, &pfnpg->page)) {
+		pfnpg->pfn = page_to_pfn(pfnpg->page);
 
 		if (writable)
 			*writable = true;
@@ -2010,10 +2009,9 @@ static bool hva_to_pfn_fast(unsigned long addr, bool write_fault,
  * 1 indicates success, -errno is returned if error is detected.
  */
 static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault,
-			   bool *writable, kvm_pfn_t *pfn)
+			   bool *writable, struct kvm_pfn_page *pfnpg)
 {
 	unsigned int flags = FOLL_HWPOISON;
-	struct page *page;
 	int npages = 0;
 
 	might_sleep();
@@ -2026,7 +2024,7 @@ static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault,
 	if (async)
 		flags |= FOLL_NOWAIT;
 
-	npages = get_user_pages_unlocked(addr, 1, &page, flags);
+	npages = get_user_pages_unlocked(addr, 1, &pfnpg->page, flags);
 	if (npages != 1)
 		return npages;
 
@@ -2036,11 +2034,11 @@ static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault,
 
 		if (get_user_page_fast_only(addr, FOLL_WRITE, &wpage)) {
 			*writable = true;
-			put_page(page);
-			page = wpage;
+			put_page(pfnpg->page);
+			pfnpg->page = wpage;
 		}
 	}
-	*pfn = page_to_pfn(page);
+	pfnpg->pfn = page_to_pfn(pfnpg->page);
 	return npages;
 }
 
@@ -2058,7 +2056,7 @@ static bool vma_is_valid(struct vm_area_struct *vma, bool write_fault)
 static int hva_to_pfn_remapped(struct vm_area_struct *vma,
 			       unsigned long addr, bool *async,
 			       bool write_fault, bool *writable,
-			       kvm_pfn_t *p_pfn)
+			       struct kvm_pfn_page *p_pfn)
 {
 	kvm_pfn_t pfn;
 	pte_t *ptep;
@@ -2094,22 +2092,12 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
 		*writable = pte_write(*ptep);
 	pfn = pte_pfn(*ptep);
 
-	/*
-	 * Get a reference here because callers of *hva_to_pfn* and
-	 * *gfn_to_pfn* ultimately call kvm_release_pfn_clean on the
-	 * returned pfn.  This is only needed if the VMA has VM_MIXEDMAP
-	 * set, but the kvm_get_pfn/kvm_release_pfn_clean pair will
-	 * simply do nothing for reserved pfns.
-	 *
-	 * Whoever called remap_pfn_range is also going to call e.g.
-	 * unmap_mapping_range before the underlying pages are freed,
-	 * causing a call to our MMU notifier.
-	 */ 
-	kvm_get_pfn(pfn);
-
 out:
 	pte_unmap_unlock(ptep, ptl);
-	*p_pfn = pfn;
+
+	p_pfn->pfn = pfn;
+	p_pfn->page = NULL;
+
 	return 0;
 }
 
@@ -2127,30 +2115,30 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
  * 2): @write_fault = false && @writable, @writable will tell the caller
  *     whether the mapping is writable.
  */
-static kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async,
-			bool write_fault, bool *writable)
+static struct kvm_pfn_page hva_to_pfn(unsigned long addr, bool atomic,
+			bool *async, bool write_fault, bool *writable)
 {
 	struct vm_area_struct *vma;
-	kvm_pfn_t pfn = 0;
+	struct kvm_pfn_page pfnpg = {};
 	int npages, r;
 
 	/* we can do it either atomically or asynchronously, not both */
 	BUG_ON(atomic && async);
 
-	if (hva_to_pfn_fast(addr, write_fault, writable, &pfn))
-		return pfn;
+	if (hva_to_pfn_fast(addr, write_fault, writable, &pfnpg))
+		return pfnpg;
 
 	if (atomic)
-		return KVM_PFN_ERR_FAULT;
+		return KVM_PFN_PAGE_ERR(KVM_PFN_ERR_FAULT);
 
-	npages = hva_to_pfn_slow(addr, async, write_fault, writable, &pfn);
+	npages = hva_to_pfn_slow(addr, async, write_fault, writable, &pfnpg);
 	if (npages == 1)
-		return pfn;
+		return pfnpg;
 
 	mmap_read_lock(current->mm);
 	if (npages == -EHWPOISON ||
 	      (!async && check_user_page_hwpoison(addr))) {
-		pfn = KVM_PFN_ERR_HWPOISON;
+		pfnpg.pfn = KVM_PFN_ERR_HWPOISON;
 		goto exit;
 	}
 
@@ -2158,26 +2146,27 @@ static kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async,
 	vma = find_vma_intersection(current->mm, addr, addr + 1);
 
 	if (vma == NULL)
-		pfn = KVM_PFN_ERR_FAULT;
+		pfnpg.pfn = KVM_PFN_ERR_FAULT;
 	else if (vma->vm_flags & (VM_IO | VM_PFNMAP)) {
-		r = hva_to_pfn_remapped(vma, addr, async, write_fault, writable, &pfn);
+		r = hva_to_pfn_remapped(vma, addr, async, write_fault, writable, &pfnpg);
 		if (r == -EAGAIN)
 			goto retry;
 		if (r < 0)
-			pfn = KVM_PFN_ERR_FAULT;
+			pfnpg.pfn = KVM_PFN_ERR_FAULT;
 	} else {
 		if (async && vma_is_valid(vma, write_fault))
 			*async = true;
-		pfn = KVM_PFN_ERR_FAULT;
+		pfnpg.pfn = KVM_PFN_ERR_FAULT;
 	}
 exit:
 	mmap_read_unlock(current->mm);
-	return pfn;
+	return pfnpg;
 }
 
-kvm_pfn_t __gfn_to_pfn_memslot(struct kvm_memory_slot *slot, gfn_t gfn,
-			       bool atomic, bool *async, bool write_fault,
-			       bool *writable, hva_t *hva)
+struct kvm_pfn_page __gfn_to_pfn_memslot(struct kvm_memory_slot *slot,
+					 gfn_t gfn, bool atomic, bool *async,
+					 bool write_fault, bool *writable,
+					 hva_t *hva)
 {
 	unsigned long addr = __gfn_to_hva_many(slot, gfn, NULL, write_fault);
 
@@ -2187,13 +2176,13 @@ kvm_pfn_t __gfn_to_pfn_memslot(struct kvm_memory_slot *slot, gfn_t gfn,
 	if (addr == KVM_HVA_ERR_RO_BAD) {
 		if (writable)
 			*writable = false;
-		return KVM_PFN_ERR_RO_FAULT;
+		return KVM_PFN_PAGE_ERR(KVM_PFN_ERR_RO_FAULT);
 	}
 
 	if (kvm_is_error_hva(addr)) {
 		if (writable)
 			*writable = false;
-		return KVM_PFN_NOSLOT;
+		return KVM_PFN_PAGE_ERR(KVM_PFN_NOSLOT);
 	}
 
 	/* Do not map writable pfn in the readonly memslot. */
@@ -2207,44 +2196,55 @@ kvm_pfn_t __gfn_to_pfn_memslot(struct kvm_memory_slot *slot, gfn_t gfn,
 }
 EXPORT_SYMBOL_GPL(__gfn_to_pfn_memslot);
 
-kvm_pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault,
-		      bool *writable)
+struct kvm_pfn_page gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn,
+				    bool write_fault, bool *writable)
 {
 	return __gfn_to_pfn_memslot(gfn_to_memslot(kvm, gfn), gfn, false, NULL,
 				    write_fault, writable, NULL);
 }
 EXPORT_SYMBOL_GPL(gfn_to_pfn_prot);
 
-kvm_pfn_t gfn_to_pfn_memslot(struct kvm_memory_slot *slot, gfn_t gfn)
+struct kvm_pfn_page gfn_to_pfn_memslot(struct kvm_memory_slot *slot, gfn_t gfn)
 {
 	return __gfn_to_pfn_memslot(slot, gfn, false, NULL, true, NULL, NULL);
 }
 EXPORT_SYMBOL_GPL(gfn_to_pfn_memslot);
 
-kvm_pfn_t gfn_to_pfn_memslot_atomic(struct kvm_memory_slot *slot, gfn_t gfn)
+struct kvm_pfn_page gfn_to_pfn_memslot_atomic(struct kvm_memory_slot *slot,
+					      gfn_t gfn)
 {
 	return __gfn_to_pfn_memslot(slot, gfn, true, NULL, true, NULL, NULL);
 }
 EXPORT_SYMBOL_GPL(gfn_to_pfn_memslot_atomic);
 
-kvm_pfn_t kvm_vcpu_gfn_to_pfn_atomic(struct kvm_vcpu *vcpu, gfn_t gfn)
+struct kvm_pfn_page kvm_vcpu_gfn_to_pfn_atomic(struct kvm_vcpu *vcpu, gfn_t gfn)
 {
 	return gfn_to_pfn_memslot_atomic(kvm_vcpu_gfn_to_memslot(vcpu, gfn), gfn);
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_gfn_to_pfn_atomic);
 
-kvm_pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn)
+struct kvm_pfn_page gfn_to_pfn(struct kvm *kvm, gfn_t gfn)
 {
 	return gfn_to_pfn_memslot(gfn_to_memslot(kvm, gfn), gfn);
 }
 EXPORT_SYMBOL_GPL(gfn_to_pfn);
 
-kvm_pfn_t kvm_vcpu_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn)
+struct kvm_pfn_page kvm_vcpu_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn)
 {
 	return gfn_to_pfn_memslot(kvm_vcpu_gfn_to_memslot(vcpu, gfn), gfn);
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_gfn_to_pfn);
 
+kvm_pfn_t kvm_pfn_page_unwrap(struct kvm_pfn_page pfnpg)
+{
+	if (pfnpg.page)
+		return pfnpg.pfn;
+
+	kvm_get_pfn(pfnpg.pfn);
+	return pfnpg.pfn;
+}
+EXPORT_SYMBOL_GPL(kvm_pfn_page_unwrap);
+
 int gfn_to_page_many_atomic(struct kvm_memory_slot *slot, gfn_t gfn,
 			    struct page **pages, int nr_pages)
 {
@@ -2277,11 +2277,14 @@ static struct page *kvm_pfn_to_page(kvm_pfn_t pfn)
 
 struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn)
 {
-	kvm_pfn_t pfn;
+	struct kvm_pfn_page pfnpg;
+
+	pfnpg = gfn_to_pfn(kvm, gfn);
 
-	pfn = gfn_to_pfn(kvm, gfn);
+	if (pfnpg.page)
+		return pfnpg.page;
 
-	return kvm_pfn_to_page(pfn);
+	return kvm_pfn_to_page(kvm_pfn_page_unwrap(pfnpg));
 }
 EXPORT_SYMBOL_GPL(gfn_to_page);
 
@@ -2304,7 +2307,7 @@ static void kvm_cache_gfn_to_pfn(struct kvm_memory_slot *slot, gfn_t gfn,
 {
 	kvm_release_pfn(cache->pfn, cache->dirty, cache);
 
-	cache->pfn = gfn_to_pfn_memslot(slot, gfn);
+	cache->pfn = kvm_pfn_page_unwrap(gfn_to_pfn_memslot(slot, gfn));
 	cache->gfn = gfn;
 	cache->dirty = false;
 	cache->generation = gen;
@@ -2335,7 +2338,7 @@ static int __kvm_map_gfn(struct kvm_memslots *slots, gfn_t gfn,
 	} else {
 		if (atomic)
 			return -EAGAIN;
-		pfn = gfn_to_pfn_memslot(slot, gfn);
+		pfn = kvm_pfn_page_unwrap(gfn_to_pfn_memslot(slot, gfn));
 	}
 	if (is_error_noslot_pfn(pfn))
 		return -EINVAL;
@@ -2435,11 +2438,11 @@ EXPORT_SYMBOL_GPL(kvm_vcpu_unmap);
 
 struct page *kvm_vcpu_gfn_to_page(struct kvm_vcpu *vcpu, gfn_t gfn)
 {
-	kvm_pfn_t pfn;
+	struct kvm_pfn_page pfnpg;
 
-	pfn = kvm_vcpu_gfn_to_pfn(vcpu, gfn);
+	pfnpg = kvm_vcpu_gfn_to_pfn(vcpu, gfn);
 
-	return kvm_pfn_to_page(pfn);
+	return kvm_pfn_to_page(kvm_pfn_page_unwrap(pfnpg));
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_gfn_to_page);
 
-- 
2.32.0.93.g670b81a890-goog


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 2/6] KVM: mmu: also return page from gfn_to_pfn
@ 2021-06-24  3:57   ` David Stevens
  0 siblings, 0 replies; 213+ messages in thread
From: David Stevens @ 2021-06-24  3:57 UTC (permalink / raw)
  To: Marc Zyngier, Huacai Chen, Aleksandar Markovic, Paul Mackerras,
	Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: James Morse, Alexandru Elisei, Suzuki K Poulose, Will Deacon,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, linux-arm-kernel, kvmarm, linux-kernel, linux-mips,
	kvm, kvm-ppc, linuxppc-dev, intel-gvt-dev, intel-gfx, dri-devel,
	David Stevens

From: David Stevens <stevensd@chromium.org>

Return a struct kvm_pfn_page containing both a pfn and an optional
struct page from the gfn_to_pfn family of functions. This differentiates
the gup and follow_fault_pfn cases, which allows callers that only need
a pfn to avoid touching the page struct in the latter case. For callers
that need a struct page, introduce a helper function that unwraps a
struct kvm_pfn_page into a struct page. This helper makes the call to
kvm_get_pfn which had previously been in hva_to_pfn_remapped.

For now, wrap all calls to gfn_to_pfn functions in the new helper
function. Callers which don't need the page struct will be updated in
follow-up patches.

Signed-off-by: David Stevens <stevensd@chromium.org>
---
 arch/arm64/kvm/mmu.c                   |   5 +-
 arch/mips/kvm/mmu.c                    |   3 +-
 arch/powerpc/kvm/book3s.c              |   3 +-
 arch/powerpc/kvm/book3s_64_mmu_hv.c    |   5 +-
 arch/powerpc/kvm/book3s_64_mmu_radix.c |   5 +-
 arch/powerpc/kvm/book3s_hv_uvmem.c     |   4 +-
 arch/powerpc/kvm/e500_mmu_host.c       |   2 +-
 arch/x86/kvm/mmu/mmu.c                 |  11 ++-
 arch/x86/kvm/mmu/mmu_audit.c           |   2 +-
 arch/x86/kvm/x86.c                     |   2 +-
 drivers/gpu/drm/i915/gvt/kvmgt.c       |   2 +-
 include/linux/kvm_host.h               |  27 ++++--
 include/linux/kvm_types.h              |   5 +
 virt/kvm/kvm_main.c                    | 121 +++++++++++++------------
 14 files changed, 109 insertions(+), 88 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index c10207fed2f3..896b3644b36f 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -933,8 +933,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 */
 	smp_rmb();
 
-	pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL,
-				   write_fault, &writable, NULL);
+	pfn = kvm_pfn_page_unwrap(__gfn_to_pfn_memslot(memslot, gfn, false,
+						       NULL, write_fault,
+						       &writable, NULL));
 	if (pfn == KVM_PFN_ERR_HWPOISON) {
 		kvm_send_hwpoison_signal(hva, vma_shift);
 		return 0;
diff --git a/arch/mips/kvm/mmu.c b/arch/mips/kvm/mmu.c
index 6d1f68cf4edf..f4e5e48bc6bf 100644
--- a/arch/mips/kvm/mmu.c
+++ b/arch/mips/kvm/mmu.c
@@ -630,7 +630,8 @@ static int kvm_mips_map_page(struct kvm_vcpu *vcpu, unsigned long gpa,
 	smp_rmb();
 
 	/* Slow path - ask KVM core whether we can access this GPA */
-	pfn = gfn_to_pfn_prot(kvm, gfn, write_fault, &writeable);
+	pfn = kvm_pfn_page_unwrap(gfn_to_pfn_prot(kvm, gfn,
+						  write_fault, &writeable));
 	if (is_error_noslot_pfn(pfn)) {
 		err = -EFAULT;
 		goto out;
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 2b691f4d1f26..2dff01d0632a 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -417,7 +417,8 @@ kvm_pfn_t kvmppc_gpa_to_pfn(struct kvm_vcpu *vcpu, gpa_t gpa, bool writing,
 		return pfn;
 	}
 
-	return gfn_to_pfn_prot(vcpu->kvm, gfn, writing, writable);
+	return kvm_pfn_page_unwrap(gfn_to_pfn_prot(vcpu->kvm, gfn,
+						   writing, writable));
 }
 EXPORT_SYMBOL_GPL(kvmppc_gpa_to_pfn);
 
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 2d9193cd73be..ba094b9f87a9 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -590,8 +590,9 @@ int kvmppc_book3s_hv_page_fault(struct kvm_vcpu *vcpu,
 		write_ok = true;
 	} else {
 		/* Call KVM generic code to do the slow-path check */
-		pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL,
-					   writing, &write_ok, NULL);
+		pfn = kvm_pfn_page_unwrap(
+				__gfn_to_pfn_memslot(memslot, gfn, false, NULL,
+						     writing, &write_ok, NULL));
 		if (is_error_noslot_pfn(pfn))
 			return -EFAULT;
 		page = NULL;
diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index d909c069363e..e7892f148222 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -821,8 +821,9 @@ int kvmppc_book3s_instantiate_page(struct kvm_vcpu *vcpu,
 		unsigned long pfn;
 
 		/* Call KVM generic code to do the slow-path check */
-		pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL,
-					   writing, upgrade_p, NULL);
+		pfn = kvm_pfn_page_unwrap(
+				__gfn_to_pfn_memslot(memslot, gfn, false, NULL,
+						     writing, upgrade_p, NULL));
 		if (is_error_noslot_pfn(pfn))
 			return -EFAULT;
 		page = NULL;
diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
index 84e5a2dc8be5..0eee0d98b055 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -890,7 +890,7 @@ static unsigned long kvmppc_share_page(struct kvm *kvm, unsigned long gpa,
 
 retry:
 	mutex_unlock(&kvm->arch.uvmem_lock);
-	pfn = gfn_to_pfn(kvm, gfn);
+	pfn = kvm_pfn_page_unwrap(gfn_to_pfn(kvm, gfn));
 	if (is_error_noslot_pfn(pfn))
 		goto out;
 
@@ -1075,7 +1075,7 @@ int kvmppc_send_page_to_uv(struct kvm *kvm, unsigned long gfn)
 	unsigned long pfn;
 	int ret = U_SUCCESS;
 
-	pfn = gfn_to_pfn(kvm, gfn);
+	pfn = kvm_pfn_page_unwrap(gfn_to_pfn(kvm, gfn));
 	if (is_error_noslot_pfn(pfn))
 		return -EFAULT;
 
diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
index 7f16afc331ef..be7ca5788a05 100644
--- a/arch/powerpc/kvm/e500_mmu_host.c
+++ b/arch/powerpc/kvm/e500_mmu_host.c
@@ -446,7 +446,7 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
 
 	if (likely(!pfnmap)) {
 		tsize_pages = 1UL << (tsize + 10 - PAGE_SHIFT);
-		pfn = gfn_to_pfn_memslot(slot, gfn);
+		pfn = kvm_pfn_page_unwrap(gfn_to_pfn_memslot(slot, gfn));
 		if (is_error_noslot_pfn(pfn)) {
 			if (printk_ratelimit())
 				pr_err("%s: real page not found for gfn %lx\n",
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 8d5876dfc6b7..84913677c404 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2619,7 +2619,7 @@ static kvm_pfn_t pte_prefetch_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn,
 	if (!slot)
 		return KVM_PFN_ERR_FAULT;
 
-	return gfn_to_pfn_memslot_atomic(slot, gfn);
+	return kvm_pfn_page_unwrap(gfn_to_pfn_memslot_atomic(slot, gfn));
 }
 
 static int direct_pte_prefetch_many(struct kvm_vcpu *vcpu,
@@ -3694,8 +3694,9 @@ static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn,
 	}
 
 	async = false;
-	*pfn = __gfn_to_pfn_memslot(slot, gfn, false, &async,
-				    write, writable, hva);
+	*pfn = kvm_pfn_page_unwrap(__gfn_to_pfn_memslot(slot, gfn, false,
+							&async, write,
+							writable, hva));
 	if (!async)
 		return false; /* *pfn has correct page already */
 
@@ -3709,8 +3710,8 @@ static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn,
 			return true;
 	}
 
-	*pfn = __gfn_to_pfn_memslot(slot, gfn, false, NULL,
-				    write, writable, hva);
+	*pfn = kvm_pfn_page_unwrap(__gfn_to_pfn_memslot(slot, gfn, false, NULL,
+							write, writable, hva));
 	return false;
 }
 
diff --git a/arch/x86/kvm/mmu/mmu_audit.c b/arch/x86/kvm/mmu/mmu_audit.c
index 97ff184084b4..3f983dc6e0f1 100644
--- a/arch/x86/kvm/mmu/mmu_audit.c
+++ b/arch/x86/kvm/mmu/mmu_audit.c
@@ -111,7 +111,7 @@ static void audit_mappings(struct kvm_vcpu *vcpu, u64 *sptep, int level)
 		return;
 
 	gfn = kvm_mmu_page_get_gfn(sp, sptep - sp->spt);
-	pfn = kvm_vcpu_gfn_to_pfn_atomic(vcpu, gfn);
+	pfn = kvm_pfn_page_unwrap(kvm_vcpu_gfn_to_pfn_atomic(vcpu, gfn));
 
 	if (is_error_pfn(pfn))
 		return;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6d425310054b..d31797e0cb6e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7341,7 +7341,7 @@ static bool reexecute_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 	 * retry instruction -> write #PF -> emulation fail -> retry
 	 * instruction -> ...
 	 */
-	pfn = gfn_to_pfn(vcpu->kvm, gpa_to_gfn(gpa));
+	pfn = kvm_pfn_page_unwrap(gfn_to_pfn(vcpu->kvm, gpa_to_gfn(gpa)));
 
 	/*
 	 * If the instruction failed on the error pfn, it can not be fixed,
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 65ff43cfc0f7..b829ff67e3d9 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -1938,7 +1938,7 @@ static unsigned long kvmgt_gfn_to_pfn(unsigned long handle, unsigned long gfn)
 
 	info = (struct kvmgt_guest_info *)handle;
 
-	pfn = gfn_to_pfn(info->kvm, gfn);
+	pfn = kvm_pfn_page_unwrap(gfn_to_pfn(info->kvm, gfn));
 	if (is_error_noslot_pfn(pfn))
 		return INTEL_GVT_INVALID_ADDR;
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 8583ed3ff344..d3731790a967 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -138,6 +138,8 @@ static inline bool is_error_page(struct page *page)
 	return IS_ERR(page);
 }
 
+#define KVM_PFN_PAGE_ERR(e) ((struct kvm_pfn_page) { .pfn = (e), .page = NULL })
+
 #define KVM_REQUEST_MASK           GENMASK(7,0)
 #define KVM_REQUEST_NO_WAKEUP      BIT(8)
 #define KVM_REQUEST_WAIT           BIT(9)
@@ -795,14 +797,19 @@ void kvm_release_page_clean(struct page *page);
 void kvm_release_page_dirty(struct page *page);
 void kvm_set_page_accessed(struct page *page);
 
-kvm_pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn);
-kvm_pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault,
-		      bool *writable);
-kvm_pfn_t gfn_to_pfn_memslot(struct kvm_memory_slot *slot, gfn_t gfn);
-kvm_pfn_t gfn_to_pfn_memslot_atomic(struct kvm_memory_slot *slot, gfn_t gfn);
-kvm_pfn_t __gfn_to_pfn_memslot(struct kvm_memory_slot *slot, gfn_t gfn,
-			       bool atomic, bool *async, bool write_fault,
-			       bool *writable, hva_t *hva);
+struct kvm_pfn_page gfn_to_pfn(struct kvm *kvm, gfn_t gfn);
+struct kvm_pfn_page gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn,
+				    bool write_fault, bool *writable);
+struct kvm_pfn_page  gfn_to_pfn_memslot(struct kvm_memory_slot *slot,
+					gfn_t gfn);
+struct kvm_pfn_page  gfn_to_pfn_memslot_atomic(struct kvm_memory_slot *slot,
+					       gfn_t gfn);
+struct kvm_pfn_page  __gfn_to_pfn_memslot(struct kvm_memory_slot *slot,
+					  gfn_t gfn, bool atomic, bool *async,
+					  bool write_fault, bool *writable,
+					  hva_t *hva);
+
+kvm_pfn_t kvm_pfn_page_unwrap(struct kvm_pfn_page pfnpg);
 
 void kvm_release_pfn_clean(kvm_pfn_t pfn);
 void kvm_release_pfn_dirty(kvm_pfn_t pfn);
@@ -883,8 +890,8 @@ void mark_page_dirty(struct kvm *kvm, gfn_t gfn);
 
 struct kvm_memslots *kvm_vcpu_memslots(struct kvm_vcpu *vcpu);
 struct kvm_memory_slot *kvm_vcpu_gfn_to_memslot(struct kvm_vcpu *vcpu, gfn_t gfn);
-kvm_pfn_t kvm_vcpu_gfn_to_pfn_atomic(struct kvm_vcpu *vcpu, gfn_t gfn);
-kvm_pfn_t kvm_vcpu_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn);
+struct kvm_pfn_page kvm_vcpu_gfn_to_pfn_atomic(struct kvm_vcpu *vcpu, gfn_t gfn);
+struct kvm_pfn_page kvm_vcpu_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn);
 int kvm_vcpu_map(struct kvm_vcpu *vcpu, gpa_t gpa, struct kvm_host_map *map);
 int kvm_map_gfn(struct kvm_vcpu *vcpu, gfn_t gfn, struct kvm_host_map *map,
 		struct gfn_to_pfn_cache *cache, bool atomic);
diff --git a/include/linux/kvm_types.h b/include/linux/kvm_types.h
index a7580f69dda0..471f4b329f46 100644
--- a/include/linux/kvm_types.h
+++ b/include/linux/kvm_types.h
@@ -45,6 +45,11 @@ typedef u64            hfn_t;
 
 typedef hfn_t kvm_pfn_t;
 
+struct kvm_pfn_page {
+	kvm_pfn_t pfn;
+	struct page *page;
+};
+
 struct gfn_to_hva_cache {
 	u64 generation;
 	gpa_t gpa;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 6a6bc7af0e28..898e90be4d0e 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1982,9 +1982,8 @@ static inline int check_user_page_hwpoison(unsigned long addr)
  * only part that runs if we can in atomic context.
  */
 static bool hva_to_pfn_fast(unsigned long addr, bool write_fault,
-			    bool *writable, kvm_pfn_t *pfn)
+			    bool *writable, struct kvm_pfn_page *pfnpg)
 {
-	struct page *page[1];
 
 	/*
 	 * Fast pin a writable pfn only if it is a write fault request
@@ -1994,8 +1993,8 @@ static bool hva_to_pfn_fast(unsigned long addr, bool write_fault,
 	if (!(write_fault || writable))
 		return false;
 
-	if (get_user_page_fast_only(addr, FOLL_WRITE, page)) {
-		*pfn = page_to_pfn(page[0]);
+	if (get_user_page_fast_only(addr, FOLL_WRITE, &pfnpg->page)) {
+		pfnpg->pfn = page_to_pfn(pfnpg->page);
 
 		if (writable)
 			*writable = true;
@@ -2010,10 +2009,9 @@ static bool hva_to_pfn_fast(unsigned long addr, bool write_fault,
  * 1 indicates success, -errno is returned if error is detected.
  */
 static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault,
-			   bool *writable, kvm_pfn_t *pfn)
+			   bool *writable, struct kvm_pfn_page *pfnpg)
 {
 	unsigned int flags = FOLL_HWPOISON;
-	struct page *page;
 	int npages = 0;
 
 	might_sleep();
@@ -2026,7 +2024,7 @@ static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault,
 	if (async)
 		flags |= FOLL_NOWAIT;
 
-	npages = get_user_pages_unlocked(addr, 1, &page, flags);
+	npages = get_user_pages_unlocked(addr, 1, &pfnpg->page, flags);
 	if (npages != 1)
 		return npages;
 
@@ -2036,11 +2034,11 @@ static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault,
 
 		if (get_user_page_fast_only(addr, FOLL_WRITE, &wpage)) {
 			*writable = true;
-			put_page(page);
-			page = wpage;
+			put_page(pfnpg->page);
+			pfnpg->page = wpage;
 		}
 	}
-	*pfn = page_to_pfn(page);
+	pfnpg->pfn = page_to_pfn(pfnpg->page);
 	return npages;
 }
 
@@ -2058,7 +2056,7 @@ static bool vma_is_valid(struct vm_area_struct *vma, bool write_fault)
 static int hva_to_pfn_remapped(struct vm_area_struct *vma,
 			       unsigned long addr, bool *async,
 			       bool write_fault, bool *writable,
-			       kvm_pfn_t *p_pfn)
+			       struct kvm_pfn_page *p_pfn)
 {
 	kvm_pfn_t pfn;
 	pte_t *ptep;
@@ -2094,22 +2092,12 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
 		*writable = pte_write(*ptep);
 	pfn = pte_pfn(*ptep);
 
-	/*
-	 * Get a reference here because callers of *hva_to_pfn* and
-	 * *gfn_to_pfn* ultimately call kvm_release_pfn_clean on the
-	 * returned pfn.  This is only needed if the VMA has VM_MIXEDMAP
-	 * set, but the kvm_get_pfn/kvm_release_pfn_clean pair will
-	 * simply do nothing for reserved pfns.
-	 *
-	 * Whoever called remap_pfn_range is also going to call e.g.
-	 * unmap_mapping_range before the underlying pages are freed,
-	 * causing a call to our MMU notifier.
-	 */ 
-	kvm_get_pfn(pfn);
-
 out:
 	pte_unmap_unlock(ptep, ptl);
-	*p_pfn = pfn;
+
+	p_pfn->pfn = pfn;
+	p_pfn->page = NULL;
+
 	return 0;
 }
 
@@ -2127,30 +2115,30 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
  * 2): @write_fault = false && @writable, @writable will tell the caller
  *     whether the mapping is writable.
  */
-static kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async,
-			bool write_fault, bool *writable)
+static struct kvm_pfn_page hva_to_pfn(unsigned long addr, bool atomic,
+			bool *async, bool write_fault, bool *writable)
 {
 	struct vm_area_struct *vma;
-	kvm_pfn_t pfn = 0;
+	struct kvm_pfn_page pfnpg = {};
 	int npages, r;
 
 	/* we can do it either atomically or asynchronously, not both */
 	BUG_ON(atomic && async);
 
-	if (hva_to_pfn_fast(addr, write_fault, writable, &pfn))
-		return pfn;
+	if (hva_to_pfn_fast(addr, write_fault, writable, &pfnpg))
+		return pfnpg;
 
 	if (atomic)
-		return KVM_PFN_ERR_FAULT;
+		return KVM_PFN_PAGE_ERR(KVM_PFN_ERR_FAULT);
 
-	npages = hva_to_pfn_slow(addr, async, write_fault, writable, &pfn);
+	npages = hva_to_pfn_slow(addr, async, write_fault, writable, &pfnpg);
 	if (npages == 1)
-		return pfn;
+		return pfnpg;
 
 	mmap_read_lock(current->mm);
 	if (npages == -EHWPOISON ||
 	      (!async && check_user_page_hwpoison(addr))) {
-		pfn = KVM_PFN_ERR_HWPOISON;
+		pfnpg.pfn = KVM_PFN_ERR_HWPOISON;
 		goto exit;
 	}
 
@@ -2158,26 +2146,27 @@ static kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async,
 	vma = find_vma_intersection(current->mm, addr, addr + 1);
 
 	if (vma == NULL)
-		pfn = KVM_PFN_ERR_FAULT;
+		pfnpg.pfn = KVM_PFN_ERR_FAULT;
 	else if (vma->vm_flags & (VM_IO | VM_PFNMAP)) {
-		r = hva_to_pfn_remapped(vma, addr, async, write_fault, writable, &pfn);
+		r = hva_to_pfn_remapped(vma, addr, async, write_fault, writable, &pfnpg);
 		if (r == -EAGAIN)
 			goto retry;
 		if (r < 0)
-			pfn = KVM_PFN_ERR_FAULT;
+			pfnpg.pfn = KVM_PFN_ERR_FAULT;
 	} else {
 		if (async && vma_is_valid(vma, write_fault))
 			*async = true;
-		pfn = KVM_PFN_ERR_FAULT;
+		pfnpg.pfn = KVM_PFN_ERR_FAULT;
 	}
 exit:
 	mmap_read_unlock(current->mm);
-	return pfn;
+	return pfnpg;
 }
 
-kvm_pfn_t __gfn_to_pfn_memslot(struct kvm_memory_slot *slot, gfn_t gfn,
-			       bool atomic, bool *async, bool write_fault,
-			       bool *writable, hva_t *hva)
+struct kvm_pfn_page __gfn_to_pfn_memslot(struct kvm_memory_slot *slot,
+					 gfn_t gfn, bool atomic, bool *async,
+					 bool write_fault, bool *writable,
+					 hva_t *hva)
 {
 	unsigned long addr = __gfn_to_hva_many(slot, gfn, NULL, write_fault);
 
@@ -2187,13 +2176,13 @@ kvm_pfn_t __gfn_to_pfn_memslot(struct kvm_memory_slot *slot, gfn_t gfn,
 	if (addr == KVM_HVA_ERR_RO_BAD) {
 		if (writable)
 			*writable = false;
-		return KVM_PFN_ERR_RO_FAULT;
+		return KVM_PFN_PAGE_ERR(KVM_PFN_ERR_RO_FAULT);
 	}
 
 	if (kvm_is_error_hva(addr)) {
 		if (writable)
 			*writable = false;
-		return KVM_PFN_NOSLOT;
+		return KVM_PFN_PAGE_ERR(KVM_PFN_NOSLOT);
 	}
 
 	/* Do not map writable pfn in the readonly memslot. */
@@ -2207,44 +2196,55 @@ kvm_pfn_t __gfn_to_pfn_memslot(struct kvm_memory_slot *slot, gfn_t gfn,
 }
 EXPORT_SYMBOL_GPL(__gfn_to_pfn_memslot);
 
-kvm_pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault,
-		      bool *writable)
+struct kvm_pfn_page gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn,
+				    bool write_fault, bool *writable)
 {
 	return __gfn_to_pfn_memslot(gfn_to_memslot(kvm, gfn), gfn, false, NULL,
 				    write_fault, writable, NULL);
 }
 EXPORT_SYMBOL_GPL(gfn_to_pfn_prot);
 
-kvm_pfn_t gfn_to_pfn_memslot(struct kvm_memory_slot *slot, gfn_t gfn)
+struct kvm_pfn_page gfn_to_pfn_memslot(struct kvm_memory_slot *slot, gfn_t gfn)
 {
 	return __gfn_to_pfn_memslot(slot, gfn, false, NULL, true, NULL, NULL);
 }
 EXPORT_SYMBOL_GPL(gfn_to_pfn_memslot);
 
-kvm_pfn_t gfn_to_pfn_memslot_atomic(struct kvm_memory_slot *slot, gfn_t gfn)
+struct kvm_pfn_page gfn_to_pfn_memslot_atomic(struct kvm_memory_slot *slot,
+					      gfn_t gfn)
 {
 	return __gfn_to_pfn_memslot(slot, gfn, true, NULL, true, NULL, NULL);
 }
 EXPORT_SYMBOL_GPL(gfn_to_pfn_memslot_atomic);
 
-kvm_pfn_t kvm_vcpu_gfn_to_pfn_atomic(struct kvm_vcpu *vcpu, gfn_t gfn)
+struct kvm_pfn_page kvm_vcpu_gfn_to_pfn_atomic(struct kvm_vcpu *vcpu, gfn_t gfn)
 {
 	return gfn_to_pfn_memslot_atomic(kvm_vcpu_gfn_to_memslot(vcpu, gfn), gfn);
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_gfn_to_pfn_atomic);
 
-kvm_pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn)
+struct kvm_pfn_page gfn_to_pfn(struct kvm *kvm, gfn_t gfn)
 {
 	return gfn_to_pfn_memslot(gfn_to_memslot(kvm, gfn), gfn);
 }
 EXPORT_SYMBOL_GPL(gfn_to_pfn);
 
-kvm_pfn_t kvm_vcpu_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn)
+struct kvm_pfn_page kvm_vcpu_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn)
 {
 	return gfn_to_pfn_memslot(kvm_vcpu_gfn_to_memslot(vcpu, gfn), gfn);
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_gfn_to_pfn);
 
+kvm_pfn_t kvm_pfn_page_unwrap(struct kvm_pfn_page pfnpg)
+{
+	if (pfnpg.page)
+		return pfnpg.pfn;
+
+	kvm_get_pfn(pfnpg.pfn);
+	return pfnpg.pfn;
+}
+EXPORT_SYMBOL_GPL(kvm_pfn_page_unwrap);
+
 int gfn_to_page_many_atomic(struct kvm_memory_slot *slot, gfn_t gfn,
 			    struct page **pages, int nr_pages)
 {
@@ -2277,11 +2277,14 @@ static struct page *kvm_pfn_to_page(kvm_pfn_t pfn)
 
 struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn)
 {
-	kvm_pfn_t pfn;
+	struct kvm_pfn_page pfnpg;
+
+	pfnpg = gfn_to_pfn(kvm, gfn);
 
-	pfn = gfn_to_pfn(kvm, gfn);
+	if (pfnpg.page)
+		return pfnpg.page;
 
-	return kvm_pfn_to_page(pfn);
+	return kvm_pfn_to_page(kvm_pfn_page_unwrap(pfnpg));
 }
 EXPORT_SYMBOL_GPL(gfn_to_page);
 
@@ -2304,7 +2307,7 @@ static void kvm_cache_gfn_to_pfn(struct kvm_memory_slot *slot, gfn_t gfn,
 {
 	kvm_release_pfn(cache->pfn, cache->dirty, cache);
 
-	cache->pfn = gfn_to_pfn_memslot(slot, gfn);
+	cache->pfn = kvm_pfn_page_unwrap(gfn_to_pfn_memslot(slot, gfn));
 	cache->gfn = gfn;
 	cache->dirty = false;
 	cache->generation = gen;
@@ -2335,7 +2338,7 @@ static int __kvm_map_gfn(struct kvm_memslots *slots, gfn_t gfn,
 	} else {
 		if (atomic)
 			return -EAGAIN;
-		pfn = gfn_to_pfn_memslot(slot, gfn);
+		pfn = kvm_pfn_page_unwrap(gfn_to_pfn_memslot(slot, gfn));
 	}
 	if (is_error_noslot_pfn(pfn))
 		return -EINVAL;
@@ -2435,11 +2438,11 @@ EXPORT_SYMBOL_GPL(kvm_vcpu_unmap);
 
 struct page *kvm_vcpu_gfn_to_page(struct kvm_vcpu *vcpu, gfn_t gfn)
 {
-	kvm_pfn_t pfn;
+	struct kvm_pfn_page pfnpg;
 
-	pfn = kvm_vcpu_gfn_to_pfn(vcpu, gfn);
+	pfnpg = kvm_vcpu_gfn_to_pfn(vcpu, gfn);
 
-	return kvm_pfn_to_page(pfn);
+	return kvm_pfn_to_page(kvm_pfn_page_unwrap(pfnpg));
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_gfn_to_page);
 
-- 
2.32.0.93.g670b81a890-goog


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 3/6] KVM: x86/mmu: avoid struct page in MMU
@ 2021-06-24  3:57   ` David Stevens
  -1 siblings, 0 replies; 213+ messages in thread
From: David Stevens @ 2021-06-24  3:57 UTC (permalink / raw)
  To: Marc Zyngier, Huacai Chen, Aleksandar Markovic, Paul Mackerras,
	Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: James Morse, Alexandru Elisei, Suzuki K Poulose, Will Deacon,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, linux-arm-kernel, kvmarm, linux-kernel, linux-mips,
	kvm, kvm-ppc, linuxppc-dev, intel-gvt-dev, intel-gfx, dri-devel,
	David Stevens

From: David Stevens <stevensd@chromium.org>

Avoid converting pfns returned by follow_fault_pfn to struct pages to
transiently take a reference. The reference was originally taken to
match the reference taken by gup. However, pfns returned by
follow_fault_pfn may not have a struct page set up for reference
counting.
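
As an illustration of the caller pattern this change moves to (a
sketch only, not part of the patch; slot, gfn, write_fault and
writable are placeholders), a fault handler stops releasing the pfn
unconditionally and instead drops the reference only when gup
actually produced a struct page:

	struct kvm_pfn_page pfnpg;

	pfnpg = __gfn_to_pfn_memslot(slot, gfn, false, NULL,
				     write_fault, &writable, NULL);
	if (is_error_noslot_pfn(pfnpg.pfn))
		return -EFAULT;

	/* ... install pfnpg.pfn in the secondary MMU ... */

	/* Only a gup'd pfn carries a reference that must be dropped. */
	if (pfnpg.page)
		put_page(pfnpg.page);

Without a backing struct page there is also no compound-page
information for kvm_mmu_hugepage_adjust() to consult, so such pfns
are now always mapped at PG_LEVEL_4K.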

Signed-off-by: David Stevens <stevensd@chromium.org>
---
 arch/x86/kvm/mmu/mmu.c          | 56 +++++++++++++++++++--------------
 arch/x86/kvm/mmu/mmu_audit.c    | 13 ++++----
 arch/x86/kvm/mmu/mmu_internal.h |  3 +-
 arch/x86/kvm/mmu/paging_tmpl.h  | 36 ++++++++++++---------
 arch/x86/kvm/mmu/tdp_mmu.c      |  7 +++--
 arch/x86/kvm/mmu/tdp_mmu.h      |  4 +--
 arch/x86/kvm/x86.c              |  9 +++---
 7 files changed, 73 insertions(+), 55 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 84913677c404..8fa4a4a411ba 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2610,16 +2610,16 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 	return ret;
 }
 
-static kvm_pfn_t pte_prefetch_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn,
-				     bool no_dirty_log)
+static struct kvm_pfn_page pte_prefetch_gfn_to_pfn(struct kvm_vcpu *vcpu,
+						   gfn_t gfn, bool no_dirty_log)
 {
 	struct kvm_memory_slot *slot;
 
 	slot = gfn_to_memslot_dirty_bitmap(vcpu, gfn, no_dirty_log);
 	if (!slot)
-		return KVM_PFN_ERR_FAULT;
+		return KVM_PFN_PAGE_ERR(KVM_PFN_ERR_FAULT);
 
-	return kvm_pfn_page_unwrap(gfn_to_pfn_memslot_atomic(slot, gfn));
+	return gfn_to_pfn_memslot_atomic(slot, gfn);
 }
 
 static int direct_pte_prefetch_many(struct kvm_vcpu *vcpu,
@@ -2748,7 +2748,8 @@ int kvm_mmu_max_mapping_level(struct kvm *kvm,
 
 int kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, gfn_t gfn,
 			    int max_level, kvm_pfn_t *pfnp,
-			    bool huge_page_disallowed, int *req_level)
+			    struct page *page, bool huge_page_disallowed,
+			    int *req_level)
 {
 	struct kvm_memory_slot *slot;
 	kvm_pfn_t pfn = *pfnp;
@@ -2760,6 +2761,9 @@ int kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, gfn_t gfn,
 	if (unlikely(max_level == PG_LEVEL_4K))
 		return PG_LEVEL_4K;
 
+	if (!page)
+		return PG_LEVEL_4K;
+
 	if (is_error_noslot_pfn(pfn) || kvm_is_reserved_pfn(pfn))
 		return PG_LEVEL_4K;
 
@@ -2814,7 +2818,8 @@ void disallowed_hugepage_adjust(u64 spte, gfn_t gfn, int cur_level,
 }
 
 static int __direct_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
-			int map_writable, int max_level, kvm_pfn_t pfn,
+			int map_writable, int max_level,
+			const struct kvm_pfn_page *pfnpg,
 			bool prefault, bool is_tdp)
 {
 	bool nx_huge_page_workaround_enabled = is_nx_huge_page_enabled();
@@ -2826,11 +2831,12 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
 	int level, req_level, ret;
 	gfn_t gfn = gpa >> PAGE_SHIFT;
 	gfn_t base_gfn = gfn;
+	kvm_pfn_t pfn = pfnpg->pfn;
 
 	if (WARN_ON(!VALID_PAGE(vcpu->arch.mmu->root_hpa)))
 		return RET_PF_RETRY;
 
-	level = kvm_mmu_hugepage_adjust(vcpu, gfn, max_level, &pfn,
+	level = kvm_mmu_hugepage_adjust(vcpu, gfn, max_level, &pfn, pfnpg->page,
 					huge_page_disallowed, &req_level);
 
 	trace_kvm_mmu_spte_requested(gpa, level, pfn);
@@ -3672,8 +3678,8 @@ static bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 }
 
 static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn,
-			 gpa_t cr2_or_gpa, kvm_pfn_t *pfn, hva_t *hva,
-			 bool write, bool *writable)
+			 gpa_t cr2_or_gpa, struct kvm_pfn_page *pfnpg,
+			 hva_t *hva, bool write, bool *writable)
 {
 	struct kvm_memory_slot *slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
 	bool async;
@@ -3688,17 +3694,16 @@ static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn,
 
 	/* Don't expose private memslots to L2. */
 	if (is_guest_mode(vcpu) && !kvm_is_visible_memslot(slot)) {
-		*pfn = KVM_PFN_NOSLOT;
+		*pfnpg = KVM_PFN_PAGE_ERR(KVM_PFN_NOSLOT);
 		*writable = false;
 		return false;
 	}
 
 	async = false;
-	*pfn = kvm_pfn_page_unwrap(__gfn_to_pfn_memslot(slot, gfn, false,
-							&async, write,
-							writable, hva));
+	*pfnpg = __gfn_to_pfn_memslot(slot, gfn, false, &async,
+				      write, writable, hva);
 	if (!async)
-		return false; /* *pfn has correct page already */
+		return false; /* *pfnpg has correct page already */
 
 	if (!prefault && kvm_can_do_async_pf(vcpu)) {
 		trace_kvm_try_async_get_page(cr2_or_gpa, gfn);
@@ -3710,8 +3715,8 @@ static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn,
 			return true;
 	}
 
-	*pfn = kvm_pfn_page_unwrap(__gfn_to_pfn_memslot(slot, gfn, false, NULL,
-							write, writable, hva));
+	*pfnpg = __gfn_to_pfn_memslot(slot, gfn, false, NULL,
+				      write, writable, hva);
 	return false;
 }
 
@@ -3723,7 +3728,7 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
 
 	gfn_t gfn = gpa >> PAGE_SHIFT;
 	unsigned long mmu_seq;
-	kvm_pfn_t pfn;
+	struct kvm_pfn_page pfnpg;
 	hva_t hva;
 	int r;
 
@@ -3743,11 +3748,12 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
 	mmu_seq = vcpu->kvm->mmu_notifier_seq;
 	smp_rmb();
 
-	if (try_async_pf(vcpu, prefault, gfn, gpa, &pfn, &hva,
+	if (try_async_pf(vcpu, prefault, gfn, gpa, &pfnpg, &hva,
 			 write, &map_writable))
 		return RET_PF_RETRY;
 
-	if (handle_abnormal_pfn(vcpu, is_tdp ? 0 : gpa, gfn, pfn, ACC_ALL, &r))
+	if (handle_abnormal_pfn(vcpu, is_tdp ? 0 : gpa,
+				gfn, pfnpg.pfn, ACC_ALL, &r))
 		return r;
 
 	r = RET_PF_RETRY;
@@ -3757,7 +3763,8 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
 	else
 		write_lock(&vcpu->kvm->mmu_lock);
 
-	if (!is_noslot_pfn(pfn) && mmu_notifier_retry_hva(vcpu->kvm, mmu_seq, hva))
+	if (!is_noslot_pfn(pfnpg.pfn) &&
+	    mmu_notifier_retry_hva(vcpu->kvm, mmu_seq, hva))
 		goto out_unlock;
 	r = make_mmu_pages_available(vcpu);
 	if (r)
@@ -3765,17 +3772,18 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
 
 	if (is_tdp_mmu_root(vcpu->kvm, vcpu->arch.mmu->root_hpa))
 		r = kvm_tdp_mmu_map(vcpu, gpa, error_code, map_writable, max_level,
-				    pfn, prefault);
+				    &pfnpg, prefault);
 	else
-		r = __direct_map(vcpu, gpa, error_code, map_writable, max_level, pfn,
-				 prefault, is_tdp);
+		r = __direct_map(vcpu, gpa, error_code, map_writable, max_level,
+				 &pfnpg, prefault, is_tdp);
 
 out_unlock:
 	if (is_tdp_mmu_root(vcpu->kvm, vcpu->arch.mmu->root_hpa))
 		read_unlock(&vcpu->kvm->mmu_lock);
 	else
 		write_unlock(&vcpu->kvm->mmu_lock);
-	kvm_release_pfn_clean(pfn);
+	if (pfnpg.page)
+		put_page(pfnpg.page);
 	return r;
 }
 
diff --git a/arch/x86/kvm/mmu/mmu_audit.c b/arch/x86/kvm/mmu/mmu_audit.c
index 3f983dc6e0f1..72b470b892da 100644
--- a/arch/x86/kvm/mmu/mmu_audit.c
+++ b/arch/x86/kvm/mmu/mmu_audit.c
@@ -94,7 +94,7 @@ static void audit_mappings(struct kvm_vcpu *vcpu, u64 *sptep, int level)
 {
 	struct kvm_mmu_page *sp;
 	gfn_t gfn;
-	kvm_pfn_t pfn;
+	struct kvm_pfn_page pfnpg;
 	hpa_t hpa;
 
 	sp = sptep_to_sp(sptep);
@@ -111,18 +111,19 @@ static void audit_mappings(struct kvm_vcpu *vcpu, u64 *sptep, int level)
 		return;
 
 	gfn = kvm_mmu_page_get_gfn(sp, sptep - sp->spt);
-	pfn = kvm_pfn_page_unwrap(kvm_vcpu_gfn_to_pfn_atomic(vcpu, gfn));
+	pfnpg = kvm_vcpu_gfn_to_pfn_atomic(vcpu, gfn);
 
-	if (is_error_pfn(pfn))
+	if (is_error_pfn(pfnpg.pfn))
 		return;
 
-	hpa =  pfn << PAGE_SHIFT;
+	hpa =  pfnpg.pfn << PAGE_SHIFT;
 	if ((*sptep & PT64_BASE_ADDR_MASK) != hpa)
 		audit_printk(vcpu->kvm, "levels %d pfn %llx hpa %llx "
-			     "ent %llxn", vcpu->arch.mmu->root_level, pfn,
+			     "ent %llxn", vcpu->arch.mmu->root_level, pfnpg.pfn,
 			     hpa, *sptep);
 
-	kvm_release_pfn_clean(pfn);
+	if (pfnpg.page)
+		put_page(pfnpg.page);
 }
 
 static void inspect_spte_has_rmap(struct kvm *kvm, u64 *sptep)
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index d64ccb417c60..db4d878fde4e 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -154,7 +154,8 @@ int kvm_mmu_max_mapping_level(struct kvm *kvm,
 			      kvm_pfn_t pfn, int max_level);
 int kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, gfn_t gfn,
 			    int max_level, kvm_pfn_t *pfnp,
-			    bool huge_page_disallowed, int *req_level);
+			    struct page *page, bool huge_page_disallowed,
+			    int *req_level);
 void disallowed_hugepage_adjust(u64 spte, gfn_t gfn, int cur_level,
 				kvm_pfn_t *pfnp, int *goal_levelp);
 
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 823a5919f9fa..db13efd4b62d 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -535,7 +535,7 @@ FNAME(prefetch_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 {
 	unsigned pte_access;
 	gfn_t gfn;
-	kvm_pfn_t pfn;
+	struct kvm_pfn_page pfnpg;
 
 	if (FNAME(prefetch_invalid_gpte)(vcpu, sp, spte, gpte))
 		return false;
@@ -545,19 +545,20 @@ FNAME(prefetch_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 	gfn = gpte_to_gfn(gpte);
 	pte_access = sp->role.access & FNAME(gpte_access)(gpte);
 	FNAME(protect_clean_gpte)(vcpu->arch.mmu, &pte_access, gpte);
-	pfn = pte_prefetch_gfn_to_pfn(vcpu, gfn,
+	pfnpg = pte_prefetch_gfn_to_pfn(vcpu, gfn,
 			no_dirty_log && (pte_access & ACC_WRITE_MASK));
-	if (is_error_pfn(pfn))
+	if (is_error_pfn(pfnpg.pfn))
 		return false;
 
 	/*
 	 * we call mmu_set_spte() with host_writable = true because
 	 * pte_prefetch_gfn_to_pfn always gets a writable pfn.
 	 */
-	mmu_set_spte(vcpu, spte, pte_access, false, PG_LEVEL_4K, gfn, pfn,
+	mmu_set_spte(vcpu, spte, pte_access, false, PG_LEVEL_4K, gfn, pfnpg.pfn,
 		     true, true);
 
-	kvm_release_pfn_clean(pfn);
+	if (pfnpg.page)
+		put_page(pfnpg.page);
 	return true;
 }
 
@@ -637,8 +638,8 @@ static void FNAME(pte_prefetch)(struct kvm_vcpu *vcpu, struct guest_walker *gw,
  */
 static int FNAME(fetch)(struct kvm_vcpu *vcpu, gpa_t addr,
 			 struct guest_walker *gw, u32 error_code,
-			 int max_level, kvm_pfn_t pfn, bool map_writable,
-			 bool prefault)
+			 int max_level, struct kvm_pfn_page *pfnpg,
+			 bool map_writable, bool prefault)
 {
 	bool nx_huge_page_workaround_enabled = is_nx_huge_page_enabled();
 	bool write_fault = error_code & PFERR_WRITE_MASK;
@@ -649,6 +650,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gpa_t addr,
 	unsigned int direct_access, access;
 	int top_level, level, req_level, ret;
 	gfn_t base_gfn = gw->gfn;
+	kvm_pfn_t pfn = pfnpg->pfn;
 
 	direct_access = gw->pte_access;
 
@@ -695,7 +697,8 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gpa_t addr,
 	}
 
 	level = kvm_mmu_hugepage_adjust(vcpu, gw->gfn, max_level, &pfn,
-					huge_page_disallowed, &req_level);
+					pfnpg->page, huge_page_disallowed,
+					&req_level);
 
 	trace_kvm_mmu_spte_requested(addr, gw->level, pfn);
 
@@ -801,7 +804,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gpa_t addr, u32 error_code,
 	bool user_fault = error_code & PFERR_USER_MASK;
 	struct guest_walker walker;
 	int r;
-	kvm_pfn_t pfn;
+	struct kvm_pfn_page pfnpg;
 	hva_t hva;
 	unsigned long mmu_seq;
 	bool map_writable, is_self_change_mapping;
@@ -853,11 +856,12 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gpa_t addr, u32 error_code,
 	mmu_seq = vcpu->kvm->mmu_notifier_seq;
 	smp_rmb();
 
-	if (try_async_pf(vcpu, prefault, walker.gfn, addr, &pfn, &hva,
+	if (try_async_pf(vcpu, prefault, walker.gfn, addr, &pfnpg, &hva,
 			 write_fault, &map_writable))
 		return RET_PF_RETRY;
 
-	if (handle_abnormal_pfn(vcpu, addr, walker.gfn, pfn, walker.pte_access, &r))
+	if (handle_abnormal_pfn(vcpu, addr, walker.gfn, pfnpg.pfn,
+				walker.pte_access, &r))
 		return r;
 
 	/*
@@ -866,7 +870,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gpa_t addr, u32 error_code,
 	 */
 	if (write_fault && !(walker.pte_access & ACC_WRITE_MASK) &&
 	     !is_write_protection(vcpu) && !user_fault &&
-	      !is_noslot_pfn(pfn)) {
+	      !is_noslot_pfn(pfnpg.pfn)) {
 		walker.pte_access |= ACC_WRITE_MASK;
 		walker.pte_access &= ~ACC_USER_MASK;
 
@@ -882,20 +886,22 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gpa_t addr, u32 error_code,
 
 	r = RET_PF_RETRY;
 	write_lock(&vcpu->kvm->mmu_lock);
-	if (!is_noslot_pfn(pfn) && mmu_notifier_retry_hva(vcpu->kvm, mmu_seq, hva))
+	if (!is_noslot_pfn(pfnpg.pfn) &&
+	    mmu_notifier_retry_hva(vcpu->kvm, mmu_seq, hva))
 		goto out_unlock;
 
 	kvm_mmu_audit(vcpu, AUDIT_PRE_PAGE_FAULT);
 	r = make_mmu_pages_available(vcpu);
 	if (r)
 		goto out_unlock;
-	r = FNAME(fetch)(vcpu, addr, &walker, error_code, max_level, pfn,
+	r = FNAME(fetch)(vcpu, addr, &walker, error_code, max_level, &pfnpg,
 			 map_writable, prefault);
 	kvm_mmu_audit(vcpu, AUDIT_POST_PAGE_FAULT);
 
 out_unlock:
 	write_unlock(&vcpu->kvm->mmu_lock);
-	kvm_release_pfn_clean(pfn);
+	if (pfnpg.page)
+		put_page(pfnpg.page);
 	return r;
 }
 
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 237317b1eddd..b0e6d63f0fe1 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -960,8 +960,8 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu, int write,
  * page tables and SPTEs to translate the faulting guest physical address.
  */
 int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
-		    int map_writable, int max_level, kvm_pfn_t pfn,
-		    bool prefault)
+		    int map_writable, int max_level,
+		    const struct kvm_pfn_page *pfnpg, bool prefault)
 {
 	bool nx_huge_page_workaround_enabled = is_nx_huge_page_enabled();
 	bool write = error_code & PFERR_WRITE_MASK;
@@ -976,13 +976,14 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
 	gfn_t gfn = gpa >> PAGE_SHIFT;
 	int level;
 	int req_level;
+	kvm_pfn_t pfn = pfnpg->pfn;
 
 	if (WARN_ON(!VALID_PAGE(vcpu->arch.mmu->root_hpa)))
 		return RET_PF_RETRY;
 	if (WARN_ON(!is_tdp_mmu_root(vcpu->kvm, vcpu->arch.mmu->root_hpa)))
 		return RET_PF_RETRY;
 
-	level = kvm_mmu_hugepage_adjust(vcpu, gfn, max_level, &pfn,
+	level = kvm_mmu_hugepage_adjust(vcpu, gfn, max_level, &pfn, pfnpg->page,
 					huge_page_disallowed, &req_level);
 
 	trace_kvm_mmu_spte_requested(gpa, level, pfn);
diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h
index 5fdf63090451..f78681b9dcb7 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.h
+++ b/arch/x86/kvm/mmu/tdp_mmu.h
@@ -52,8 +52,8 @@ void kvm_tdp_mmu_invalidate_all_roots(struct kvm *kvm);
 void kvm_tdp_mmu_zap_invalidated_roots(struct kvm *kvm);
 
 int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
-		    int map_writable, int max_level, kvm_pfn_t pfn,
-		    bool prefault);
+		    int map_writable, int max_level,
+		    const struct kvm_pfn_page *pfnpg, bool prefault);
 
 bool kvm_tdp_mmu_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range,
 				 bool flush);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d31797e0cb6e..86d66c765190 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7311,7 +7311,7 @@ static bool reexecute_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 				  int emulation_type)
 {
 	gpa_t gpa = cr2_or_gpa;
-	kvm_pfn_t pfn;
+	struct kvm_pfn_page pfnpg;
 
 	if (!(emulation_type & EMULTYPE_ALLOW_RETRY_PF))
 		return false;
@@ -7341,16 +7341,17 @@ static bool reexecute_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 	 * retry instruction -> write #PF -> emulation fail -> retry
 	 * instruction -> ...
 	 */
-	pfn = kvm_pfn_page_unwrap(gfn_to_pfn(vcpu->kvm, gpa_to_gfn(gpa)));
+	pfnpg = gfn_to_pfn(vcpu->kvm, gpa_to_gfn(gpa));
 
 	/*
 	 * If the instruction failed on the error pfn, it can not be fixed,
 	 * report the error to userspace.
 	 */
-	if (is_error_noslot_pfn(pfn))
+	if (is_error_noslot_pfn(pfnpg.pfn))
 		return false;
 
-	kvm_release_pfn_clean(pfn);
+	if (pfnpg.page)
+		put_page(pfnpg.page);
 
 	/* The instructions are well-emulated on direct mmu. */
 	if (vcpu->arch.mmu->direct_map) {
-- 
2.32.0.93.g670b81a890-goog


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [Intel-gfx] [PATCH 3/6] KVM: x86/mmu: avoid struct page in MMU
@ 2021-06-24  3:57   ` David Stevens
  0 siblings, 0 replies; 213+ messages in thread
From: David Stevens @ 2021-06-24  3:57 UTC (permalink / raw)
  To: Marc Zyngier, Huacai Chen, Aleksandar Markovic, Paul Mackerras,
	Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: intel-gvt-dev, Wanpeng Li, kvm, Will Deacon, Suzuki K Poulose,
	Sean Christopherson, Joerg Roedel, linuxppc-dev, linux-kernel,
	kvm-ppc, linux-mips, David Stevens, James Morse, dri-devel,
	intel-gfx, Vitaly Kuznetsov, Alexandru Elisei, kvmarm,
	linux-arm-kernel, Jim Mattson

From: David Stevens <stevensd@chromium.org>

Avoid converting pfns returned by follow_fault_pfn into struct pages
just to transiently take a reference. The reference was originally
taken to match the reference taken by gup. However, pfns returned by
follow_fault_pfn may not have an associated struct page set up for
reference counting.

Signed-off-by: David Stevens <stevensd@chromium.org>
---
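For reference, a minimal sketch of the release pattern the callers
below are converted to, assuming the kvm_pfn_page layout introduced
earlier in this series. The kvm_pfn_page_put() helper is hypothetical
and shown only for illustration; the patch open-codes the same check
at each call site:

	struct kvm_pfn_page {
		kvm_pfn_t pfn;
		struct page *page;	/* NULL unless the hva was resolved by gup() */
	};

	/*
	 * Hypothetical helper: drop the reference taken by gup(), if any.
	 * Pfns without an associated struct page are left untouched.
	 */
	static inline void kvm_pfn_page_put(const struct kvm_pfn_page *pfnpg)
	{
		if (pfnpg->page)
			put_page(pfnpg->page);
	}
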
 arch/x86/kvm/mmu/mmu.c          | 56 +++++++++++++++++++--------------
 arch/x86/kvm/mmu/mmu_audit.c    | 13 ++++----
 arch/x86/kvm/mmu/mmu_internal.h |  3 +-
 arch/x86/kvm/mmu/paging_tmpl.h  | 36 ++++++++++++---------
 arch/x86/kvm/mmu/tdp_mmu.c      |  7 +++--
 arch/x86/kvm/mmu/tdp_mmu.h      |  4 +--
 arch/x86/kvm/x86.c              |  9 +++---
 7 files changed, 73 insertions(+), 55 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 84913677c404..8fa4a4a411ba 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2610,16 +2610,16 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 	return ret;
 }
 
-static kvm_pfn_t pte_prefetch_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn,
-				     bool no_dirty_log)
+static struct kvm_pfn_page pte_prefetch_gfn_to_pfn(struct kvm_vcpu *vcpu,
+						   gfn_t gfn, bool no_dirty_log)
 {
 	struct kvm_memory_slot *slot;
 
 	slot = gfn_to_memslot_dirty_bitmap(vcpu, gfn, no_dirty_log);
 	if (!slot)
-		return KVM_PFN_ERR_FAULT;
+		return KVM_PFN_PAGE_ERR(KVM_PFN_ERR_FAULT);
 
-	return kvm_pfn_page_unwrap(gfn_to_pfn_memslot_atomic(slot, gfn));
+	return gfn_to_pfn_memslot_atomic(slot, gfn);
 }
 
 static int direct_pte_prefetch_many(struct kvm_vcpu *vcpu,
@@ -2748,7 +2748,8 @@ int kvm_mmu_max_mapping_level(struct kvm *kvm,
 
 int kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, gfn_t gfn,
 			    int max_level, kvm_pfn_t *pfnp,
-			    bool huge_page_disallowed, int *req_level)
+			    struct page *page, bool huge_page_disallowed,
+			    int *req_level)
 {
 	struct kvm_memory_slot *slot;
 	kvm_pfn_t pfn = *pfnp;
@@ -2760,6 +2761,9 @@ int kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, gfn_t gfn,
 	if (unlikely(max_level == PG_LEVEL_4K))
 		return PG_LEVEL_4K;
 
+	if (!page)
+		return PG_LEVEL_4K;
+
 	if (is_error_noslot_pfn(pfn) || kvm_is_reserved_pfn(pfn))
 		return PG_LEVEL_4K;
 
@@ -2814,7 +2818,8 @@ void disallowed_hugepage_adjust(u64 spte, gfn_t gfn, int cur_level,
 }
 
 static int __direct_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
-			int map_writable, int max_level, kvm_pfn_t pfn,
+			int map_writable, int max_level,
+			const struct kvm_pfn_page *pfnpg,
 			bool prefault, bool is_tdp)
 {
 	bool nx_huge_page_workaround_enabled = is_nx_huge_page_enabled();
@@ -2826,11 +2831,12 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
 	int level, req_level, ret;
 	gfn_t gfn = gpa >> PAGE_SHIFT;
 	gfn_t base_gfn = gfn;
+	kvm_pfn_t pfn = pfnpg->pfn;
 
 	if (WARN_ON(!VALID_PAGE(vcpu->arch.mmu->root_hpa)))
 		return RET_PF_RETRY;
 
-	level = kvm_mmu_hugepage_adjust(vcpu, gfn, max_level, &pfn,
+	level = kvm_mmu_hugepage_adjust(vcpu, gfn, max_level, &pfn, pfnpg->page,
 					huge_page_disallowed, &req_level);
 
 	trace_kvm_mmu_spte_requested(gpa, level, pfn);
@@ -3672,8 +3678,8 @@ static bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 }
 
 static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn,
-			 gpa_t cr2_or_gpa, kvm_pfn_t *pfn, hva_t *hva,
-			 bool write, bool *writable)
+			 gpa_t cr2_or_gpa, struct kvm_pfn_page *pfnpg,
+			 hva_t *hva, bool write, bool *writable)
 {
 	struct kvm_memory_slot *slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
 	bool async;
@@ -3688,17 +3694,16 @@ static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn,
 
 	/* Don't expose private memslots to L2. */
 	if (is_guest_mode(vcpu) && !kvm_is_visible_memslot(slot)) {
-		*pfn = KVM_PFN_NOSLOT;
+		*pfnpg = KVM_PFN_PAGE_ERR(KVM_PFN_NOSLOT);
 		*writable = false;
 		return false;
 	}
 
 	async = false;
-	*pfn = kvm_pfn_page_unwrap(__gfn_to_pfn_memslot(slot, gfn, false,
-							&async, write,
-							writable, hva));
+	*pfnpg = __gfn_to_pfn_memslot(slot, gfn, false, &async,
+				      write, writable, hva);
 	if (!async)
-		return false; /* *pfn has correct page already */
+		return false; /* *pfnpg has correct page already */
 
 	if (!prefault && kvm_can_do_async_pf(vcpu)) {
 		trace_kvm_try_async_get_page(cr2_or_gpa, gfn);
@@ -3710,8 +3715,8 @@ static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn,
 			return true;
 	}
 
-	*pfn = kvm_pfn_page_unwrap(__gfn_to_pfn_memslot(slot, gfn, false, NULL,
-							write, writable, hva));
+	*pfnpg = __gfn_to_pfn_memslot(slot, gfn, false, NULL,
+				      write, writable, hva);
 	return false;
 }
 
@@ -3723,7 +3728,7 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
 
 	gfn_t gfn = gpa >> PAGE_SHIFT;
 	unsigned long mmu_seq;
-	kvm_pfn_t pfn;
+	struct kvm_pfn_page pfnpg;
 	hva_t hva;
 	int r;
 
@@ -3743,11 +3748,12 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
 	mmu_seq = vcpu->kvm->mmu_notifier_seq;
 	smp_rmb();
 
-	if (try_async_pf(vcpu, prefault, gfn, gpa, &pfn, &hva,
+	if (try_async_pf(vcpu, prefault, gfn, gpa, &pfnpg, &hva,
 			 write, &map_writable))
 		return RET_PF_RETRY;
 
-	if (handle_abnormal_pfn(vcpu, is_tdp ? 0 : gpa, gfn, pfn, ACC_ALL, &r))
+	if (handle_abnormal_pfn(vcpu, is_tdp ? 0 : gpa,
+				gfn, pfnpg.pfn, ACC_ALL, &r))
 		return r;
 
 	r = RET_PF_RETRY;
@@ -3757,7 +3763,8 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
 	else
 		write_lock(&vcpu->kvm->mmu_lock);
 
-	if (!is_noslot_pfn(pfn) && mmu_notifier_retry_hva(vcpu->kvm, mmu_seq, hva))
+	if (!is_noslot_pfn(pfnpg.pfn) &&
+	    mmu_notifier_retry_hva(vcpu->kvm, mmu_seq, hva))
 		goto out_unlock;
 	r = make_mmu_pages_available(vcpu);
 	if (r)
@@ -3765,17 +3772,18 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
 
 	if (is_tdp_mmu_root(vcpu->kvm, vcpu->arch.mmu->root_hpa))
 		r = kvm_tdp_mmu_map(vcpu, gpa, error_code, map_writable, max_level,
-				    pfn, prefault);
+				    &pfnpg, prefault);
 	else
-		r = __direct_map(vcpu, gpa, error_code, map_writable, max_level, pfn,
-				 prefault, is_tdp);
+		r = __direct_map(vcpu, gpa, error_code, map_writable, max_level,
+				 &pfnpg, prefault, is_tdp);
 
 out_unlock:
 	if (is_tdp_mmu_root(vcpu->kvm, vcpu->arch.mmu->root_hpa))
 		read_unlock(&vcpu->kvm->mmu_lock);
 	else
 		write_unlock(&vcpu->kvm->mmu_lock);
-	kvm_release_pfn_clean(pfn);
+	if (pfnpg.page)
+		put_page(pfnpg.page);
 	return r;
 }
 
diff --git a/arch/x86/kvm/mmu/mmu_audit.c b/arch/x86/kvm/mmu/mmu_audit.c
index 3f983dc6e0f1..72b470b892da 100644
--- a/arch/x86/kvm/mmu/mmu_audit.c
+++ b/arch/x86/kvm/mmu/mmu_audit.c
@@ -94,7 +94,7 @@ static void audit_mappings(struct kvm_vcpu *vcpu, u64 *sptep, int level)
 {
 	struct kvm_mmu_page *sp;
 	gfn_t gfn;
-	kvm_pfn_t pfn;
+	struct kvm_pfn_page pfnpg;
 	hpa_t hpa;
 
 	sp = sptep_to_sp(sptep);
@@ -111,18 +111,19 @@ static void audit_mappings(struct kvm_vcpu *vcpu, u64 *sptep, int level)
 		return;
 
 	gfn = kvm_mmu_page_get_gfn(sp, sptep - sp->spt);
-	pfn = kvm_pfn_page_unwrap(kvm_vcpu_gfn_to_pfn_atomic(vcpu, gfn));
+	pfnpg = kvm_vcpu_gfn_to_pfn_atomic(vcpu, gfn);
 
-	if (is_error_pfn(pfn))
+	if (is_error_pfn(pfnpg.pfn))
 		return;
 
-	hpa =  pfn << PAGE_SHIFT;
+	hpa =  pfnpg.pfn << PAGE_SHIFT;
 	if ((*sptep & PT64_BASE_ADDR_MASK) != hpa)
 		audit_printk(vcpu->kvm, "levels %d pfn %llx hpa %llx "
-			     "ent %llxn", vcpu->arch.mmu->root_level, pfn,
+			     "ent %llxn", vcpu->arch.mmu->root_level, pfnpg.pfn,
 			     hpa, *sptep);
 
-	kvm_release_pfn_clean(pfn);
+	if (pfnpg.page)
+		put_page(pfnpg.page);
 }
 
 static void inspect_spte_has_rmap(struct kvm *kvm, u64 *sptep)
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index d64ccb417c60..db4d878fde4e 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -154,7 +154,8 @@ int kvm_mmu_max_mapping_level(struct kvm *kvm,
 			      kvm_pfn_t pfn, int max_level);
 int kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, gfn_t gfn,
 			    int max_level, kvm_pfn_t *pfnp,
-			    bool huge_page_disallowed, int *req_level);
+			    struct page *page, bool huge_page_disallowed,
+			    int *req_level);
 void disallowed_hugepage_adjust(u64 spte, gfn_t gfn, int cur_level,
 				kvm_pfn_t *pfnp, int *goal_levelp);
 
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 823a5919f9fa..db13efd4b62d 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -535,7 +535,7 @@ FNAME(prefetch_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 {
 	unsigned pte_access;
 	gfn_t gfn;
-	kvm_pfn_t pfn;
+	struct kvm_pfn_page pfnpg;
 
 	if (FNAME(prefetch_invalid_gpte)(vcpu, sp, spte, gpte))
 		return false;
@@ -545,19 +545,20 @@ FNAME(prefetch_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 	gfn = gpte_to_gfn(gpte);
 	pte_access = sp->role.access & FNAME(gpte_access)(gpte);
 	FNAME(protect_clean_gpte)(vcpu->arch.mmu, &pte_access, gpte);
-	pfn = pte_prefetch_gfn_to_pfn(vcpu, gfn,
+	pfnpg = pte_prefetch_gfn_to_pfn(vcpu, gfn,
 			no_dirty_log && (pte_access & ACC_WRITE_MASK));
-	if (is_error_pfn(pfn))
+	if (is_error_pfn(pfnpg.pfn))
 		return false;
 
 	/*
 	 * we call mmu_set_spte() with host_writable = true because
 	 * pte_prefetch_gfn_to_pfn always gets a writable pfn.
 	 */
-	mmu_set_spte(vcpu, spte, pte_access, false, PG_LEVEL_4K, gfn, pfn,
+	mmu_set_spte(vcpu, spte, pte_access, false, PG_LEVEL_4K, gfn, pfnpg.pfn,
 		     true, true);
 
-	kvm_release_pfn_clean(pfn);
+	if (pfnpg.page)
+		put_page(pfnpg.page);
 	return true;
 }
 
@@ -637,8 +638,8 @@ static void FNAME(pte_prefetch)(struct kvm_vcpu *vcpu, struct guest_walker *gw,
  */
 static int FNAME(fetch)(struct kvm_vcpu *vcpu, gpa_t addr,
 			 struct guest_walker *gw, u32 error_code,
-			 int max_level, kvm_pfn_t pfn, bool map_writable,
-			 bool prefault)
+			 int max_level, struct kvm_pfn_page *pfnpg,
+			 bool map_writable, bool prefault)
 {
 	bool nx_huge_page_workaround_enabled = is_nx_huge_page_enabled();
 	bool write_fault = error_code & PFERR_WRITE_MASK;
@@ -649,6 +650,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gpa_t addr,
 	unsigned int direct_access, access;
 	int top_level, level, req_level, ret;
 	gfn_t base_gfn = gw->gfn;
+	kvm_pfn_t pfn = pfnpg->pfn;
 
 	direct_access = gw->pte_access;
 
@@ -695,7 +697,8 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gpa_t addr,
 	}
 
 	level = kvm_mmu_hugepage_adjust(vcpu, gw->gfn, max_level, &pfn,
-					huge_page_disallowed, &req_level);
+					pfnpg->page, huge_page_disallowed,
+					&req_level);
 
 	trace_kvm_mmu_spte_requested(addr, gw->level, pfn);
 
@@ -801,7 +804,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gpa_t addr, u32 error_code,
 	bool user_fault = error_code & PFERR_USER_MASK;
 	struct guest_walker walker;
 	int r;
-	kvm_pfn_t pfn;
+	struct kvm_pfn_page pfnpg;
 	hva_t hva;
 	unsigned long mmu_seq;
 	bool map_writable, is_self_change_mapping;
@@ -853,11 +856,12 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gpa_t addr, u32 error_code,
 	mmu_seq = vcpu->kvm->mmu_notifier_seq;
 	smp_rmb();
 
-	if (try_async_pf(vcpu, prefault, walker.gfn, addr, &pfn, &hva,
+	if (try_async_pf(vcpu, prefault, walker.gfn, addr, &pfnpg, &hva,
 			 write_fault, &map_writable))
 		return RET_PF_RETRY;
 
-	if (handle_abnormal_pfn(vcpu, addr, walker.gfn, pfn, walker.pte_access, &r))
+	if (handle_abnormal_pfn(vcpu, addr, walker.gfn, pfnpg.pfn,
+				walker.pte_access, &r))
 		return r;
 
 	/*
@@ -866,7 +870,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gpa_t addr, u32 error_code,
 	 */
 	if (write_fault && !(walker.pte_access & ACC_WRITE_MASK) &&
 	     !is_write_protection(vcpu) && !user_fault &&
-	      !is_noslot_pfn(pfn)) {
+	      !is_noslot_pfn(pfnpg.pfn)) {
 		walker.pte_access |= ACC_WRITE_MASK;
 		walker.pte_access &= ~ACC_USER_MASK;
 
@@ -882,20 +886,22 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gpa_t addr, u32 error_code,
 
 	r = RET_PF_RETRY;
 	write_lock(&vcpu->kvm->mmu_lock);
-	if (!is_noslot_pfn(pfn) && mmu_notifier_retry_hva(vcpu->kvm, mmu_seq, hva))
+	if (!is_noslot_pfn(pfnpg.pfn) &&
+	    mmu_notifier_retry_hva(vcpu->kvm, mmu_seq, hva))
 		goto out_unlock;
 
 	kvm_mmu_audit(vcpu, AUDIT_PRE_PAGE_FAULT);
 	r = make_mmu_pages_available(vcpu);
 	if (r)
 		goto out_unlock;
-	r = FNAME(fetch)(vcpu, addr, &walker, error_code, max_level, pfn,
+	r = FNAME(fetch)(vcpu, addr, &walker, error_code, max_level, &pfnpg,
 			 map_writable, prefault);
 	kvm_mmu_audit(vcpu, AUDIT_POST_PAGE_FAULT);
 
 out_unlock:
 	write_unlock(&vcpu->kvm->mmu_lock);
-	kvm_release_pfn_clean(pfn);
+	if (pfnpg.page)
+		put_page(pfnpg.page);
 	return r;
 }
 
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 237317b1eddd..b0e6d63f0fe1 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -960,8 +960,8 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu, int write,
  * page tables and SPTEs to translate the faulting guest physical address.
  */
 int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
-		    int map_writable, int max_level, kvm_pfn_t pfn,
-		    bool prefault)
+		    int map_writable, int max_level,
+		    const struct kvm_pfn_page *pfnpg, bool prefault)
 {
 	bool nx_huge_page_workaround_enabled = is_nx_huge_page_enabled();
 	bool write = error_code & PFERR_WRITE_MASK;
@@ -976,13 +976,14 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
 	gfn_t gfn = gpa >> PAGE_SHIFT;
 	int level;
 	int req_level;
+	kvm_pfn_t pfn = pfnpg->pfn;
 
 	if (WARN_ON(!VALID_PAGE(vcpu->arch.mmu->root_hpa)))
 		return RET_PF_RETRY;
 	if (WARN_ON(!is_tdp_mmu_root(vcpu->kvm, vcpu->arch.mmu->root_hpa)))
 		return RET_PF_RETRY;
 
-	level = kvm_mmu_hugepage_adjust(vcpu, gfn, max_level, &pfn,
+	level = kvm_mmu_hugepage_adjust(vcpu, gfn, max_level, &pfn, pfnpg->page,
 					huge_page_disallowed, &req_level);
 
 	trace_kvm_mmu_spte_requested(gpa, level, pfn);
diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h
index 5fdf63090451..f78681b9dcb7 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.h
+++ b/arch/x86/kvm/mmu/tdp_mmu.h
@@ -52,8 +52,8 @@ void kvm_tdp_mmu_invalidate_all_roots(struct kvm *kvm);
 void kvm_tdp_mmu_zap_invalidated_roots(struct kvm *kvm);
 
 int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
-		    int map_writable, int max_level, kvm_pfn_t pfn,
-		    bool prefault);
+		    int map_writable, int max_level,
+		    const struct kvm_pfn_page *pfnpg, bool prefault);
 
 bool kvm_tdp_mmu_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range,
 				 bool flush);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d31797e0cb6e..86d66c765190 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7311,7 +7311,7 @@ static bool reexecute_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 				  int emulation_type)
 {
 	gpa_t gpa = cr2_or_gpa;
-	kvm_pfn_t pfn;
+	struct kvm_pfn_page pfnpg;
 
 	if (!(emulation_type & EMULTYPE_ALLOW_RETRY_PF))
 		return false;
@@ -7341,16 +7341,17 @@ static bool reexecute_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 	 * retry instruction -> write #PF -> emulation fail -> retry
 	 * instruction -> ...
 	 */
-	pfn = kvm_pfn_page_unwrap(gfn_to_pfn(vcpu->kvm, gpa_to_gfn(gpa)));
+	pfnpg = gfn_to_pfn(vcpu->kvm, gpa_to_gfn(gpa));
 
 	/*
 	 * If the instruction failed on the error pfn, it can not be fixed,
 	 * report the error to userspace.
 	 */
-	if (is_error_noslot_pfn(pfn))
+	if (is_error_noslot_pfn(pfnpg.pfn))
 		return false;
 
-	kvm_release_pfn_clean(pfn);
+	if (pfnpg.page)
+		put_page(pfnpg.page);
 
 	/* The instructions are well-emulated on direct mmu. */
 	if (vcpu->arch.mmu->direct_map) {
-- 
2.32.0.93.g670b81a890-goog

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 4/6] KVM: arm64/mmu: avoid struct page in MMU
@ 2021-06-24  3:57   ` David Stevens
  0 siblings, 0 replies; 213+ messages in thread
From: David Stevens @ 2021-06-24  3:57 UTC (permalink / raw)
  To: Marc Zyngier, Huacai Chen, Aleksandar Markovic, Paul Mackerras,
	Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: James Morse, Alexandru Elisei, Suzuki K Poulose, Will Deacon,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, linux-arm-kernel, kvmarm, linux-kernel, linux-mips,
	kvm, kvm-ppc, linuxppc-dev, intel-gvt-dev, intel-gfx, dri-devel,
	David Stevens

From: David Stevens <stevensd@chromium.org>

Avoid converting pfns returned by follow_fault_pfn to struct pages to
transiently take a reference. The reference was originally taken to
match the reference taken by gup. However, pfns returned by
follow_fault_pfn may not have a struct page set up for reference
counting.

Signed-off-by: David Stevens <stevensd@chromium.org>
---
 arch/arm64/kvm/mmu.c | 43 +++++++++++++++++++++++--------------------
 1 file changed, 23 insertions(+), 20 deletions(-)
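
Assembled from the hunks below (not a separate addition to this series), a
rough sketch of how the stage-2 abort path now consumes the wrapped result,
releasing the reference only for gup-backed pfns; most error handling is
elided:

	struct kvm_pfn_page pfnpg;

	pfnpg = __gfn_to_pfn_memslot(memslot, gfn, false, NULL,
				     write_fault, &writable, NULL);
	if (is_error_noslot_pfn(pfnpg.pfn))
		return -EFAULT;

	/* ... install the stage-2 mapping using pfnpg.pfn ... */

	if (pfnpg.page) {
		/*
		 * Only gup-backed pfns hold a reference and have a struct
		 * page whose accessed state is worth updating.
		 */
		kvm_set_pfn_accessed(pfnpg.pfn);
		put_page(pfnpg.page);
	}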

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 896b3644b36f..a741972cb75f 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -779,17 +779,17 @@ static bool fault_supports_stage2_huge_mapping(struct kvm_memory_slot *memslot,
  */
 static unsigned long
 transparent_hugepage_adjust(struct kvm_memory_slot *memslot,
-			    unsigned long hva, kvm_pfn_t *pfnp,
+			    unsigned long hva, struct kvm_pfn_page *pfnpgp,
 			    phys_addr_t *ipap)
 {
-	kvm_pfn_t pfn = *pfnp;
+	kvm_pfn_t pfn = pfnpgp->pfn;
 
 	/*
 	 * Make sure the adjustment is done only for THP pages. Also make
 	 * sure that the HVA and IPA are sufficiently aligned and that the
 	 * block map is contained within the memslot.
 	 */
-	if (kvm_is_transparent_hugepage(pfn) &&
+	if (pfnpgp->page && kvm_is_transparent_hugepage(pfn) &&
 	    fault_supports_stage2_huge_mapping(memslot, hva, PMD_SIZE)) {
 		/*
 		 * The address we faulted on is backed by a transparent huge
@@ -810,10 +810,11 @@ transparent_hugepage_adjust(struct kvm_memory_slot *memslot,
 		 * page accordingly.
 		 */
 		*ipap &= PMD_MASK;
-		kvm_release_pfn_clean(pfn);
+		put_page(pfnpgp->page);
 		pfn &= ~(PTRS_PER_PMD - 1);
-		kvm_get_pfn(pfn);
-		*pfnp = pfn;
+		pfnpgp->pfn = pfn;
+		pfnpgp->page = pfn_to_page(pfnpgp->pfn);
+		get_page(pfnpgp->page);
 
 		return PMD_SIZE;
 	}
@@ -836,7 +837,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	struct vm_area_struct *vma;
 	short vma_shift;
 	gfn_t gfn;
-	kvm_pfn_t pfn;
+	struct kvm_pfn_page pfnpg;
 	bool logging_active = memslot_is_logging(memslot);
 	unsigned long fault_level = kvm_vcpu_trap_get_fault_level(vcpu);
 	unsigned long vma_pagesize, fault_granule;
@@ -933,17 +934,16 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 */
 	smp_rmb();
 
-	pfn = kvm_pfn_page_unwrap(__gfn_to_pfn_memslot(memslot, gfn, false,
-						       NULL, write_fault,
-						       &writable, NULL));
-	if (pfn == KVM_PFN_ERR_HWPOISON) {
+	pfnpg = __gfn_to_pfn_memslot(memslot, gfn, false, NULL,
+				     write_fault, &writable, NULL);
+	if (pfnpg.pfn == KVM_PFN_ERR_HWPOISON) {
 		kvm_send_hwpoison_signal(hva, vma_shift);
 		return 0;
 	}
-	if (is_error_noslot_pfn(pfn))
+	if (is_error_noslot_pfn(pfnpg.pfn))
 		return -EFAULT;
 
-	if (kvm_is_device_pfn(pfn)) {
+	if (kvm_is_device_pfn(pfnpg.pfn)) {
 		device = true;
 		force_pte = true;
 	} else if (logging_active && !write_fault) {
@@ -968,16 +968,16 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 */
 	if (vma_pagesize == PAGE_SIZE && !force_pte)
 		vma_pagesize = transparent_hugepage_adjust(memslot, hva,
-							   &pfn, &fault_ipa);
+							   &pfnpg, &fault_ipa);
 	if (writable)
 		prot |= KVM_PGTABLE_PROT_W;
 
 	if (fault_status != FSC_PERM && !device)
-		clean_dcache_guest_page(pfn, vma_pagesize);
+		clean_dcache_guest_page(pfnpg.pfn, vma_pagesize);
 
 	if (exec_fault) {
 		prot |= KVM_PGTABLE_PROT_X;
-		invalidate_icache_guest_page(pfn, vma_pagesize);
+		invalidate_icache_guest_page(pfnpg.pfn, vma_pagesize);
 	}
 
 	if (device)
@@ -994,20 +994,23 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		ret = kvm_pgtable_stage2_relax_perms(pgt, fault_ipa, prot);
 	} else {
 		ret = kvm_pgtable_stage2_map(pgt, fault_ipa, vma_pagesize,
-					     __pfn_to_phys(pfn), prot,
+					     __pfn_to_phys(pfnpg.pfn), prot,
 					     memcache);
 	}
 
 	/* Mark the page dirty only if the fault is handled successfully */
 	if (writable && !ret) {
-		kvm_set_pfn_dirty(pfn);
+		if (pfnpg.page)
+			kvm_set_pfn_dirty(pfnpg.pfn);
 		mark_page_dirty_in_slot(kvm, memslot, gfn);
 	}
 
 out_unlock:
 	spin_unlock(&kvm->mmu_lock);
-	kvm_set_pfn_accessed(pfn);
-	kvm_release_pfn_clean(pfn);
+	if (pfnpg.page) {
+		kvm_set_pfn_accessed(pfnpg.pfn);
+		put_page(pfnpg.page);
+	}
 	return ret != -EAGAIN ? ret : 0;
 }
 
-- 
2.32.0.93.g670b81a890-goog


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 4/6] KVM: arm64/mmu: avoid struct page in MMU
@ 2021-06-24  3:57   ` David Stevens
  0 siblings, 0 replies; 213+ messages in thread
From: David Stevens @ 2021-06-24  3:57 UTC (permalink / raw)
  To: Marc Zyngier, Huacai Chen, Aleksandar Markovic, Paul Mackerras,
	Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: intel-gvt-dev, Wanpeng Li, kvm, Will Deacon, Suzuki K Poulose,
	Sean Christopherson, Joerg Roedel, linuxppc-dev, linux-kernel,
	kvm-ppc, linux-mips, David Stevens, James Morse, dri-devel,
	intel-gfx, Vitaly Kuznetsov, Alexandru Elisei, kvmarm,
	linux-arm-kernel, Jim Mattson

From: David Stevens <stevensd@chromium.org>

Avoid converting pfns returned by follow_fault_pfn to struct pages to
transiently take a reference. The reference was originally taken to
match the reference taken by gup. However, pfns returned by
follow_fault_pfn may not have a struct page set up for reference
counting.

Signed-off-by: David Stevens <stevensd@chromium.org>
---
 arch/arm64/kvm/mmu.c | 43 +++++++++++++++++++++++--------------------
 1 file changed, 23 insertions(+), 20 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 896b3644b36f..a741972cb75f 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -779,17 +779,17 @@ static bool fault_supports_stage2_huge_mapping(struct kvm_memory_slot *memslot,
  */
 static unsigned long
 transparent_hugepage_adjust(struct kvm_memory_slot *memslot,
-			    unsigned long hva, kvm_pfn_t *pfnp,
+			    unsigned long hva, struct kvm_pfn_page *pfnpgp,
 			    phys_addr_t *ipap)
 {
-	kvm_pfn_t pfn = *pfnp;
+	kvm_pfn_t pfn = pfnpgp->pfn;
 
 	/*
 	 * Make sure the adjustment is done only for THP pages. Also make
 	 * sure that the HVA and IPA are sufficiently aligned and that the
 	 * block map is contained within the memslot.
 	 */
-	if (kvm_is_transparent_hugepage(pfn) &&
+	if (pfnpgp->page && kvm_is_transparent_hugepage(pfn) &&
 	    fault_supports_stage2_huge_mapping(memslot, hva, PMD_SIZE)) {
 		/*
 		 * The address we faulted on is backed by a transparent huge
@@ -810,10 +810,11 @@ transparent_hugepage_adjust(struct kvm_memory_slot *memslot,
 		 * page accordingly.
 		 */
 		*ipap &= PMD_MASK;
-		kvm_release_pfn_clean(pfn);
+		put_page(pfnpgp->page);
 		pfn &= ~(PTRS_PER_PMD - 1);
-		kvm_get_pfn(pfn);
-		*pfnp = pfn;
+		pfnpgp->pfn = pfn;
+		pfnpgp->page = pfn_to_page(pfnpgp->pfn);
+		get_page(pfnpgp->page);
 
 		return PMD_SIZE;
 	}
@@ -836,7 +837,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	struct vm_area_struct *vma;
 	short vma_shift;
 	gfn_t gfn;
-	kvm_pfn_t pfn;
+	struct kvm_pfn_page pfnpg;
 	bool logging_active = memslot_is_logging(memslot);
 	unsigned long fault_level = kvm_vcpu_trap_get_fault_level(vcpu);
 	unsigned long vma_pagesize, fault_granule;
@@ -933,17 +934,16 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 */
 	smp_rmb();
 
-	pfn = kvm_pfn_page_unwrap(__gfn_to_pfn_memslot(memslot, gfn, false,
-						       NULL, write_fault,
-						       &writable, NULL));
-	if (pfn == KVM_PFN_ERR_HWPOISON) {
+	pfnpg = __gfn_to_pfn_memslot(memslot, gfn, false, NULL,
+				     write_fault, &writable, NULL);
+	if (pfnpg.pfn == KVM_PFN_ERR_HWPOISON) {
 		kvm_send_hwpoison_signal(hva, vma_shift);
 		return 0;
 	}
-	if (is_error_noslot_pfn(pfn))
+	if (is_error_noslot_pfn(pfnpg.pfn))
 		return -EFAULT;
 
-	if (kvm_is_device_pfn(pfn)) {
+	if (kvm_is_device_pfn(pfnpg.pfn)) {
 		device = true;
 		force_pte = true;
 	} else if (logging_active && !write_fault) {
@@ -968,16 +968,16 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 */
 	if (vma_pagesize == PAGE_SIZE && !force_pte)
 		vma_pagesize = transparent_hugepage_adjust(memslot, hva,
-							   &pfn, &fault_ipa);
+							   &pfnpg, &fault_ipa);
 	if (writable)
 		prot |= KVM_PGTABLE_PROT_W;
 
 	if (fault_status != FSC_PERM && !device)
-		clean_dcache_guest_page(pfn, vma_pagesize);
+		clean_dcache_guest_page(pfnpg.pfn, vma_pagesize);
 
 	if (exec_fault) {
 		prot |= KVM_PGTABLE_PROT_X;
-		invalidate_icache_guest_page(pfn, vma_pagesize);
+		invalidate_icache_guest_page(pfnpg.pfn, vma_pagesize);
 	}
 
 	if (device)
@@ -994,20 +994,23 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		ret = kvm_pgtable_stage2_relax_perms(pgt, fault_ipa, prot);
 	} else {
 		ret = kvm_pgtable_stage2_map(pgt, fault_ipa, vma_pagesize,
-					     __pfn_to_phys(pfn), prot,
+					     __pfn_to_phys(pfnpg.pfn), prot,
 					     memcache);
 	}
 
 	/* Mark the page dirty only if the fault is handled successfully */
 	if (writable && !ret) {
-		kvm_set_pfn_dirty(pfn);
+		if (pfnpg.page)
+			kvm_set_pfn_dirty(pfnpg.pfn);
 		mark_page_dirty_in_slot(kvm, memslot, gfn);
 	}
 
 out_unlock:
 	spin_unlock(&kvm->mmu_lock);
-	kvm_set_pfn_accessed(pfn);
-	kvm_release_pfn_clean(pfn);
+	if (pfnpg.page) {
+		kvm_set_pfn_accessed(pfnpg.pfn);
+		put_page(pfnpg.page);
+	}
 	return ret != -EAGAIN ? ret : 0;
 }
 
-- 
2.32.0.93.g670b81a890-goog


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 4/6] KVM: arm64/mmu: avoid struct page in MMU
@ 2021-06-24  3:57   ` David Stevens
  0 siblings, 0 replies; 213+ messages in thread
From: David Stevens @ 2021-06-24  3:57 UTC (permalink / raw)
  To: Marc Zyngier, Huacai Chen, Aleksandar Markovic, Paul Mackerras,
	Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: intel-gvt-dev, Wanpeng Li, kvm, Will Deacon, Sean Christopherson,
	Joerg Roedel, linuxppc-dev, linux-kernel, kvm-ppc, linux-mips,
	David Stevens, dri-devel, intel-gfx, Vitaly Kuznetsov, kvmarm,
	linux-arm-kernel, Jim Mattson

From: David Stevens <stevensd@chromium.org>

Avoid converting pfns returned by follow_fault_pfn to struct pages to
transiently take a reference. The reference was originally taken to
match the reference taken by gup. However, pfns returned by
follow_fault_pfn may not have a struct page set up for reference
counting.

Signed-off-by: David Stevens <stevensd@chromium.org>
---
 arch/arm64/kvm/mmu.c | 43 +++++++++++++++++++++++--------------------
 1 file changed, 23 insertions(+), 20 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 896b3644b36f..a741972cb75f 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -779,17 +779,17 @@ static bool fault_supports_stage2_huge_mapping(struct kvm_memory_slot *memslot,
  */
 static unsigned long
 transparent_hugepage_adjust(struct kvm_memory_slot *memslot,
-			    unsigned long hva, kvm_pfn_t *pfnp,
+			    unsigned long hva, struct kvm_pfn_page *pfnpgp,
 			    phys_addr_t *ipap)
 {
-	kvm_pfn_t pfn = *pfnp;
+	kvm_pfn_t pfn = pfnpgp->pfn;
 
 	/*
 	 * Make sure the adjustment is done only for THP pages. Also make
 	 * sure that the HVA and IPA are sufficiently aligned and that the
 	 * block map is contained within the memslot.
 	 */
-	if (kvm_is_transparent_hugepage(pfn) &&
+	if (pfnpgp->page && kvm_is_transparent_hugepage(pfn) &&
 	    fault_supports_stage2_huge_mapping(memslot, hva, PMD_SIZE)) {
 		/*
 		 * The address we faulted on is backed by a transparent huge
@@ -810,10 +810,11 @@ transparent_hugepage_adjust(struct kvm_memory_slot *memslot,
 		 * page accordingly.
 		 */
 		*ipap &= PMD_MASK;
-		kvm_release_pfn_clean(pfn);
+		put_page(pfnpgp->page);
 		pfn &= ~(PTRS_PER_PMD - 1);
-		kvm_get_pfn(pfn);
-		*pfnp = pfn;
+		pfnpgp->pfn = pfn;
+		pfnpgp->page = pfn_to_page(pfnpgp->pfn);
+		get_page(pfnpgp->page);
 
 		return PMD_SIZE;
 	}
@@ -836,7 +837,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	struct vm_area_struct *vma;
 	short vma_shift;
 	gfn_t gfn;
-	kvm_pfn_t pfn;
+	struct kvm_pfn_page pfnpg;
 	bool logging_active = memslot_is_logging(memslot);
 	unsigned long fault_level = kvm_vcpu_trap_get_fault_level(vcpu);
 	unsigned long vma_pagesize, fault_granule;
@@ -933,17 +934,16 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 */
 	smp_rmb();
 
-	pfn = kvm_pfn_page_unwrap(__gfn_to_pfn_memslot(memslot, gfn, false,
-						       NULL, write_fault,
-						       &writable, NULL));
-	if (pfn == KVM_PFN_ERR_HWPOISON) {
+	pfnpg = __gfn_to_pfn_memslot(memslot, gfn, false, NULL,
+				     write_fault, &writable, NULL);
+	if (pfnpg.pfn == KVM_PFN_ERR_HWPOISON) {
 		kvm_send_hwpoison_signal(hva, vma_shift);
 		return 0;
 	}
-	if (is_error_noslot_pfn(pfn))
+	if (is_error_noslot_pfn(pfnpg.pfn))
 		return -EFAULT;
 
-	if (kvm_is_device_pfn(pfn)) {
+	if (kvm_is_device_pfn(pfnpg.pfn)) {
 		device = true;
 		force_pte = true;
 	} else if (logging_active && !write_fault) {
@@ -968,16 +968,16 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 */
 	if (vma_pagesize == PAGE_SIZE && !force_pte)
 		vma_pagesize = transparent_hugepage_adjust(memslot, hva,
-							   &pfn, &fault_ipa);
+							   &pfnpg, &fault_ipa);
 	if (writable)
 		prot |= KVM_PGTABLE_PROT_W;
 
 	if (fault_status != FSC_PERM && !device)
-		clean_dcache_guest_page(pfn, vma_pagesize);
+		clean_dcache_guest_page(pfnpg.pfn, vma_pagesize);
 
 	if (exec_fault) {
 		prot |= KVM_PGTABLE_PROT_X;
-		invalidate_icache_guest_page(pfn, vma_pagesize);
+		invalidate_icache_guest_page(pfnpg.pfn, vma_pagesize);
 	}
 
 	if (device)
@@ -994,20 +994,23 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		ret = kvm_pgtable_stage2_relax_perms(pgt, fault_ipa, prot);
 	} else {
 		ret = kvm_pgtable_stage2_map(pgt, fault_ipa, vma_pagesize,
-					     __pfn_to_phys(pfn), prot,
+					     __pfn_to_phys(pfnpg.pfn), prot,
 					     memcache);
 	}
 
 	/* Mark the page dirty only if the fault is handled successfully */
 	if (writable && !ret) {
-		kvm_set_pfn_dirty(pfn);
+		if (pfnpg.page)
+			kvm_set_pfn_dirty(pfnpg.pfn);
 		mark_page_dirty_in_slot(kvm, memslot, gfn);
 	}
 
 out_unlock:
 	spin_unlock(&kvm->mmu_lock);
-	kvm_set_pfn_accessed(pfn);
-	kvm_release_pfn_clean(pfn);
+	if (pfnpg.page) {
+		kvm_set_pfn_accessed(pfnpg.pfn);
+		put_page(pfnpg.page);
+	}
 	return ret != -EAGAIN ? ret : 0;
 }
 
-- 
2.32.0.93.g670b81a890-goog

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 4/6] KVM: arm64/mmu: avoid struct page in MMU
@ 2021-06-24  3:57   ` David Stevens
  0 siblings, 0 replies; 213+ messages in thread
From: David Stevens @ 2021-06-24  3:57 UTC (permalink / raw)
  To: Marc Zyngier, Huacai Chen, Aleksandar Markovic, Paul Mackerras,
	Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: James Morse, Alexandru Elisei, Suzuki K Poulose, Will Deacon,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, linux-arm-kernel, kvmarm, linux-kernel, linux-mips,
	kvm, kvm-ppc, linuxppc-dev, intel-gvt-dev, intel-gfx, dri-devel,
	David Stevens

From: David Stevens <stevensd@chromium.org>

Avoid converting pfns returned by follow_fault_pfn to struct pages to
transiently take a reference. The reference was originally taken to
match the reference taken by gup. However, pfns returned by
follow_fault_pfn may not have a struct page set up for reference
counting.

Signed-off-by: David Stevens <stevensd@chromium.org>
---
 arch/arm64/kvm/mmu.c | 43 +++++++++++++++++++++++--------------------
 1 file changed, 23 insertions(+), 20 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 896b3644b36f..a741972cb75f 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -779,17 +779,17 @@ static bool fault_supports_stage2_huge_mapping(struct kvm_memory_slot *memslot,
  */
 static unsigned long
 transparent_hugepage_adjust(struct kvm_memory_slot *memslot,
-			    unsigned long hva, kvm_pfn_t *pfnp,
+			    unsigned long hva, struct kvm_pfn_page *pfnpgp,
 			    phys_addr_t *ipap)
 {
-	kvm_pfn_t pfn = *pfnp;
+	kvm_pfn_t pfn = pfnpgp->pfn;
 
 	/*
 	 * Make sure the adjustment is done only for THP pages. Also make
 	 * sure that the HVA and IPA are sufficiently aligned and that the
 	 * block map is contained within the memslot.
 	 */
-	if (kvm_is_transparent_hugepage(pfn) &&
+	if (pfnpgp->page && kvm_is_transparent_hugepage(pfn) &&
 	    fault_supports_stage2_huge_mapping(memslot, hva, PMD_SIZE)) {
 		/*
 		 * The address we faulted on is backed by a transparent huge
@@ -810,10 +810,11 @@ transparent_hugepage_adjust(struct kvm_memory_slot *memslot,
 		 * page accordingly.
 		 */
 		*ipap &= PMD_MASK;
-		kvm_release_pfn_clean(pfn);
+		put_page(pfnpgp->page);
 		pfn &= ~(PTRS_PER_PMD - 1);
-		kvm_get_pfn(pfn);
-		*pfnp = pfn;
+		pfnpgp->pfn = pfn;
+		pfnpgp->page = pfn_to_page(pfnpgp->pfn);
+		get_page(pfnpgp->page);
 
 		return PMD_SIZE;
 	}
@@ -836,7 +837,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	struct vm_area_struct *vma;
 	short vma_shift;
 	gfn_t gfn;
-	kvm_pfn_t pfn;
+	struct kvm_pfn_page pfnpg;
 	bool logging_active = memslot_is_logging(memslot);
 	unsigned long fault_level = kvm_vcpu_trap_get_fault_level(vcpu);
 	unsigned long vma_pagesize, fault_granule;
@@ -933,17 +934,16 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 */
 	smp_rmb();
 
-	pfn = kvm_pfn_page_unwrap(__gfn_to_pfn_memslot(memslot, gfn, false,
-						       NULL, write_fault,
-						       &writable, NULL));
-	if (pfn == KVM_PFN_ERR_HWPOISON) {
+	pfnpg = __gfn_to_pfn_memslot(memslot, gfn, false, NULL,
+				     write_fault, &writable, NULL);
+	if (pfnpg.pfn == KVM_PFN_ERR_HWPOISON) {
 		kvm_send_hwpoison_signal(hva, vma_shift);
 		return 0;
 	}
-	if (is_error_noslot_pfn(pfn))
+	if (is_error_noslot_pfn(pfnpg.pfn))
 		return -EFAULT;
 
-	if (kvm_is_device_pfn(pfn)) {
+	if (kvm_is_device_pfn(pfnpg.pfn)) {
 		device = true;
 		force_pte = true;
 	} else if (logging_active && !write_fault) {
@@ -968,16 +968,16 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 */
 	if (vma_pagesize == PAGE_SIZE && !force_pte)
 		vma_pagesize = transparent_hugepage_adjust(memslot, hva,
-							   &pfn, &fault_ipa);
+							   &pfnpg, &fault_ipa);
 	if (writable)
 		prot |= KVM_PGTABLE_PROT_W;
 
 	if (fault_status != FSC_PERM && !device)
-		clean_dcache_guest_page(pfn, vma_pagesize);
+		clean_dcache_guest_page(pfnpg.pfn, vma_pagesize);
 
 	if (exec_fault) {
 		prot |= KVM_PGTABLE_PROT_X;
-		invalidate_icache_guest_page(pfn, vma_pagesize);
+		invalidate_icache_guest_page(pfnpg.pfn, vma_pagesize);
 	}
 
 	if (device)
@@ -994,20 +994,23 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		ret = kvm_pgtable_stage2_relax_perms(pgt, fault_ipa, prot);
 	} else {
 		ret = kvm_pgtable_stage2_map(pgt, fault_ipa, vma_pagesize,
-					     __pfn_to_phys(pfn), prot,
+					     __pfn_to_phys(pfnpg.pfn), prot,
 					     memcache);
 	}
 
 	/* Mark the page dirty only if the fault is handled successfully */
 	if (writable && !ret) {
-		kvm_set_pfn_dirty(pfn);
+		if (pfnpg.page)
+			kvm_set_pfn_dirty(pfnpg.pfn);
 		mark_page_dirty_in_slot(kvm, memslot, gfn);
 	}
 
 out_unlock:
 	spin_unlock(&kvm->mmu_lock);
-	kvm_set_pfn_accessed(pfn);
-	kvm_release_pfn_clean(pfn);
+	if (pfnpg.page) {
+		kvm_set_pfn_accessed(pfnpg.pfn);
+		put_page(pfnpg.page);
+	}
 	return ret != -EAGAIN ? ret : 0;
 }
 
-- 
2.32.0.93.g670b81a890-goog


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [Intel-gfx] [PATCH 4/6] KVM: arm64/mmu: avoid struct page in MMU
@ 2021-06-24  3:57   ` David Stevens
  0 siblings, 0 replies; 213+ messages in thread
From: David Stevens @ 2021-06-24  3:57 UTC (permalink / raw)
  To: Marc Zyngier, Huacai Chen, Aleksandar Markovic, Paul Mackerras,
	Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: intel-gvt-dev, Wanpeng Li, kvm, Will Deacon, Suzuki K Poulose,
	Sean Christopherson, Joerg Roedel, linuxppc-dev, linux-kernel,
	kvm-ppc, linux-mips, David Stevens, James Morse, dri-devel,
	intel-gfx, Vitaly Kuznetsov, Alexandru Elisei, kvmarm,
	linux-arm-kernel, Jim Mattson

From: David Stevens <stevensd@chromium.org>

Avoid converting pfns returned by follow_fault_pfn to struct pages to
transiently take a reference. The reference was originally taken to
match the reference taken by gup. However, pfns returned by
follow_fault_pfn may not have a struct page set up for reference
counting.

Signed-off-by: David Stevens <stevensd@chromium.org>
---
 arch/arm64/kvm/mmu.c | 43 +++++++++++++++++++++++--------------------
 1 file changed, 23 insertions(+), 20 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 896b3644b36f..a741972cb75f 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -779,17 +779,17 @@ static bool fault_supports_stage2_huge_mapping(struct kvm_memory_slot *memslot,
  */
 static unsigned long
 transparent_hugepage_adjust(struct kvm_memory_slot *memslot,
-			    unsigned long hva, kvm_pfn_t *pfnp,
+			    unsigned long hva, struct kvm_pfn_page *pfnpgp,
 			    phys_addr_t *ipap)
 {
-	kvm_pfn_t pfn = *pfnp;
+	kvm_pfn_t pfn = pfnpgp->pfn;
 
 	/*
 	 * Make sure the adjustment is done only for THP pages. Also make
 	 * sure that the HVA and IPA are sufficiently aligned and that the
 	 * block map is contained within the memslot.
 	 */
-	if (kvm_is_transparent_hugepage(pfn) &&
+	if (pfnpgp->page && kvm_is_transparent_hugepage(pfn) &&
 	    fault_supports_stage2_huge_mapping(memslot, hva, PMD_SIZE)) {
 		/*
 		 * The address we faulted on is backed by a transparent huge
@@ -810,10 +810,11 @@ transparent_hugepage_adjust(struct kvm_memory_slot *memslot,
 		 * page accordingly.
 		 */
 		*ipap &= PMD_MASK;
-		kvm_release_pfn_clean(pfn);
+		put_page(pfnpgp->page);
 		pfn &= ~(PTRS_PER_PMD - 1);
-		kvm_get_pfn(pfn);
-		*pfnp = pfn;
+		pfnpgp->pfn = pfn;
+		pfnpgp->page = pfn_to_page(pfnpgp->pfn);
+		get_page(pfnpgp->page);
 
 		return PMD_SIZE;
 	}
@@ -836,7 +837,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	struct vm_area_struct *vma;
 	short vma_shift;
 	gfn_t gfn;
-	kvm_pfn_t pfn;
+	struct kvm_pfn_page pfnpg;
 	bool logging_active = memslot_is_logging(memslot);
 	unsigned long fault_level = kvm_vcpu_trap_get_fault_level(vcpu);
 	unsigned long vma_pagesize, fault_granule;
@@ -933,17 +934,16 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 */
 	smp_rmb();
 
-	pfn = kvm_pfn_page_unwrap(__gfn_to_pfn_memslot(memslot, gfn, false,
-						       NULL, write_fault,
-						       &writable, NULL));
-	if (pfn == KVM_PFN_ERR_HWPOISON) {
+	pfnpg = __gfn_to_pfn_memslot(memslot, gfn, false, NULL,
+				     write_fault, &writable, NULL);
+	if (pfnpg.pfn == KVM_PFN_ERR_HWPOISON) {
 		kvm_send_hwpoison_signal(hva, vma_shift);
 		return 0;
 	}
-	if (is_error_noslot_pfn(pfn))
+	if (is_error_noslot_pfn(pfnpg.pfn))
 		return -EFAULT;
 
-	if (kvm_is_device_pfn(pfn)) {
+	if (kvm_is_device_pfn(pfnpg.pfn)) {
 		device = true;
 		force_pte = true;
 	} else if (logging_active && !write_fault) {
@@ -968,16 +968,16 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 */
 	if (vma_pagesize == PAGE_SIZE && !force_pte)
 		vma_pagesize = transparent_hugepage_adjust(memslot, hva,
-							   &pfn, &fault_ipa);
+							   &pfnpg, &fault_ipa);
 	if (writable)
 		prot |= KVM_PGTABLE_PROT_W;
 
 	if (fault_status != FSC_PERM && !device)
-		clean_dcache_guest_page(pfn, vma_pagesize);
+		clean_dcache_guest_page(pfnpg.pfn, vma_pagesize);
 
 	if (exec_fault) {
 		prot |= KVM_PGTABLE_PROT_X;
-		invalidate_icache_guest_page(pfn, vma_pagesize);
+		invalidate_icache_guest_page(pfnpg.pfn, vma_pagesize);
 	}
 
 	if (device)
@@ -994,20 +994,23 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		ret = kvm_pgtable_stage2_relax_perms(pgt, fault_ipa, prot);
 	} else {
 		ret = kvm_pgtable_stage2_map(pgt, fault_ipa, vma_pagesize,
-					     __pfn_to_phys(pfn), prot,
+					     __pfn_to_phys(pfnpg.pfn), prot,
 					     memcache);
 	}
 
 	/* Mark the page dirty only if the fault is handled successfully */
 	if (writable && !ret) {
-		kvm_set_pfn_dirty(pfn);
+		if (pfnpg.page)
+			kvm_set_pfn_dirty(pfnpg.pfn);
 		mark_page_dirty_in_slot(kvm, memslot, gfn);
 	}
 
 out_unlock:
 	spin_unlock(&kvm->mmu_lock);
-	kvm_set_pfn_accessed(pfn);
-	kvm_release_pfn_clean(pfn);
+	if (pfnpg.page) {
+		kvm_set_pfn_accessed(pfnpg.pfn);
+		put_page(pfnpg.page);
+	}
 	return ret != -EAGAIN ? ret : 0;
 }
 
-- 
2.32.0.93.g670b81a890-goog


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 4/6] KVM: arm64/mmu: avoid struct page in MMU
@ 2021-06-24  3:57   ` David Stevens
  0 siblings, 0 replies; 213+ messages in thread
From: David Stevens @ 2021-06-24  3:57 UTC (permalink / raw)
  To: Marc Zyngier, Huacai Chen, Aleksandar Markovic, Paul Mackerras,
	Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: James Morse, Alexandru Elisei, Suzuki K Poulose, Will Deacon,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, linux-arm-kernel, kvmarm, linux-kernel, linux-mips,
	kvm, kvm-ppc, linuxppc-dev, intel-gvt-dev, intel-gfx, dri-devel,
	David Stevens

From: David Stevens <stevensd@chromium.org>

Avoid converting pfns returned by follow_fault_pfn into struct pages just to
transiently take a reference on them. The reference was originally taken to
mirror the reference taken by gup, but pfns returned by follow_fault_pfn may
not have a struct page set up for reference counting.
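
For context when reading the diff below: this series changes the gfn_to_pfn
family of helpers to return a pfn/page pair rather than a bare pfn. A minimal
sketch of the assumed shape, with field names taken from the diff below (the
real definition is added by an earlier patch in this series, so treat this as
illustrative rather than authoritative):

struct kvm_pfn_page {
	kvm_pfn_t pfn;
	struct page *page;	/* non-NULL only when the hva was resolved by gup */
};

/*
 * Sketch of the caller-side rule used throughout this patch: pfn-based
 * accessors remain usable, but a reference is dropped only when a gup
 * reference actually exists.
 */
static void sketch_put_pfn_page(const struct kvm_pfn_page *pfnpg)
{
	if (pfnpg->page)
		put_page(pfnpg->page);
}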

Signed-off-by: David Stevens <stevensd@chromium.org>
---
 arch/arm64/kvm/mmu.c | 43 +++++++++++++++++++++++--------------------
 1 file changed, 23 insertions(+), 20 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 896b3644b36f..a741972cb75f 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -779,17 +779,17 @@ static bool fault_supports_stage2_huge_mapping(struct kvm_memory_slot *memslot,
  */
 static unsigned long
 transparent_hugepage_adjust(struct kvm_memory_slot *memslot,
-			    unsigned long hva, kvm_pfn_t *pfnp,
+			    unsigned long hva, struct kvm_pfn_page *pfnpgp,
 			    phys_addr_t *ipap)
 {
-	kvm_pfn_t pfn = *pfnp;
+	kvm_pfn_t pfn = pfnpgp->pfn;
 
 	/*
 	 * Make sure the adjustment is done only for THP pages. Also make
 	 * sure that the HVA and IPA are sufficiently aligned and that the
 	 * block map is contained within the memslot.
 	 */
-	if (kvm_is_transparent_hugepage(pfn) &&
+	if (pfnpgp->page && kvm_is_transparent_hugepage(pfn) &&
 	    fault_supports_stage2_huge_mapping(memslot, hva, PMD_SIZE)) {
 		/*
 		 * The address we faulted on is backed by a transparent huge
@@ -810,10 +810,11 @@ transparent_hugepage_adjust(struct kvm_memory_slot *memslot,
 		 * page accordingly.
 		 */
 		*ipap &= PMD_MASK;
-		kvm_release_pfn_clean(pfn);
+		put_page(pfnpgp->page);
 		pfn &= ~(PTRS_PER_PMD - 1);
-		kvm_get_pfn(pfn);
-		*pfnp = pfn;
+		pfnpgp->pfn = pfn;
+		pfnpgp->page = pfn_to_page(pfnpgp->pfn);
+		get_page(pfnpgp->page);
 
 		return PMD_SIZE;
 	}
@@ -836,7 +837,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	struct vm_area_struct *vma;
 	short vma_shift;
 	gfn_t gfn;
-	kvm_pfn_t pfn;
+	struct kvm_pfn_page pfnpg;
 	bool logging_active = memslot_is_logging(memslot);
 	unsigned long fault_level = kvm_vcpu_trap_get_fault_level(vcpu);
 	unsigned long vma_pagesize, fault_granule;
@@ -933,17 +934,16 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 */
 	smp_rmb();
 
-	pfn = kvm_pfn_page_unwrap(__gfn_to_pfn_memslot(memslot, gfn, false,
-						       NULL, write_fault,
-						       &writable, NULL));
-	if (pfn == KVM_PFN_ERR_HWPOISON) {
+	pfnpg = __gfn_to_pfn_memslot(memslot, gfn, false, NULL,
+				     write_fault, &writable, NULL);
+	if (pfnpg.pfn == KVM_PFN_ERR_HWPOISON) {
 		kvm_send_hwpoison_signal(hva, vma_shift);
 		return 0;
 	}
-	if (is_error_noslot_pfn(pfn))
+	if (is_error_noslot_pfn(pfnpg.pfn))
 		return -EFAULT;
 
-	if (kvm_is_device_pfn(pfn)) {
+	if (kvm_is_device_pfn(pfnpg.pfn)) {
 		device = true;
 		force_pte = true;
 	} else if (logging_active && !write_fault) {
@@ -968,16 +968,16 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 */
 	if (vma_pagesize == PAGE_SIZE && !force_pte)
 		vma_pagesize = transparent_hugepage_adjust(memslot, hva,
-							   &pfn, &fault_ipa);
+							   &pfnpg, &fault_ipa);
 	if (writable)
 		prot |= KVM_PGTABLE_PROT_W;
 
 	if (fault_status != FSC_PERM && !device)
-		clean_dcache_guest_page(pfn, vma_pagesize);
+		clean_dcache_guest_page(pfnpg.pfn, vma_pagesize);
 
 	if (exec_fault) {
 		prot |= KVM_PGTABLE_PROT_X;
-		invalidate_icache_guest_page(pfn, vma_pagesize);
+		invalidate_icache_guest_page(pfnpg.pfn, vma_pagesize);
 	}
 
 	if (device)
@@ -994,20 +994,23 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		ret = kvm_pgtable_stage2_relax_perms(pgt, fault_ipa, prot);
 	} else {
 		ret = kvm_pgtable_stage2_map(pgt, fault_ipa, vma_pagesize,
-					     __pfn_to_phys(pfn), prot,
+					     __pfn_to_phys(pfnpg.pfn), prot,
 					     memcache);
 	}
 
 	/* Mark the page dirty only if the fault is handled successfully */
 	if (writable && !ret) {
-		kvm_set_pfn_dirty(pfn);
+		if (pfnpg.page)
+			kvm_set_pfn_dirty(pfnpg.pfn);
 		mark_page_dirty_in_slot(kvm, memslot, gfn);
 	}
 
 out_unlock:
 	spin_unlock(&kvm->mmu_lock);
-	kvm_set_pfn_accessed(pfn);
-	kvm_release_pfn_clean(pfn);
+	if (pfnpg.page) {
+		kvm_set_pfn_accessed(pfnpg.pfn);
+		put_page(pfnpg.page);
+	}
 	return ret != -EAGAIN ? ret : 0;
 }
 
-- 
2.32.0.93.g670b81a890-goog

^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 5/6] KVM: mmu: remove over-aggressive warnings
  2021-06-24  3:57 ` David Stevens
                     ` (3 preceding siblings ...)
@ 2021-06-24  3:57   ` David Stevens
  -1 siblings, 0 replies; 213+ messages in thread
From: David Stevens @ 2021-06-24  3:57 UTC (permalink / raw)
  To: Marc Zyngier, Huacai Chen, Aleksandar Markovic, Paul Mackerras,
	Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: James Morse, Alexandru Elisei, Suzuki K Poulose, Will Deacon,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, linux-arm-kernel, kvmarm, linux-kernel, linux-mips,
	kvm, kvm-ppc, linuxppc-dev, intel-gvt-dev, intel-gfx, dri-devel,
	David Stevens

From: David Stevens <stevensd@chromium.org>

Remove two warnings that require pages to have a non-zero ref count, since
pfns mapped via follow_pfn may not have an initialized ref count.
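
As an illustration of the rationale (names here are illustrative, not from
this patch): after the WARNs are removed, a zero page_count on a pfn_valid
pfn is treated as "no usable refcounted page" rather than as a bug, since
pfns mapped via follow_pfn may legitimately have an uninitialized ref count.

/* Sketch: decide whether a pfn is backed by a refcounted struct page. */
static bool sketch_is_refcounted_page(kvm_pfn_t pfn)
{
	return pfn_valid(pfn) && page_count(pfn_to_page(pfn)) > 0;
}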

Signed-off-by: David Stevens <stevensd@chromium.org>
---
 arch/x86/kvm/mmu/mmu.c | 7 -------
 virt/kvm/kvm_main.c    | 2 +-
 2 files changed, 1 insertion(+), 8 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 8fa4a4a411ba..19249ad4b5b8 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -546,13 +546,6 @@ static int mmu_spte_clear_track_bits(u64 *sptep)
 
 	pfn = spte_to_pfn(old_spte);
 
-	/*
-	 * KVM does not hold the refcount of the page used by
-	 * kvm mmu, before reclaiming the page, we should
-	 * unmap it from mmu first.
-	 */
-	WARN_ON(!kvm_is_reserved_pfn(pfn) && !page_count(pfn_to_page(pfn)));
-
 	if (is_accessed_spte(old_spte))
 		kvm_set_pfn_accessed(pfn);
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 898e90be4d0e..671361f30476 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -168,7 +168,7 @@ bool kvm_is_zone_device_pfn(kvm_pfn_t pfn)
 	 * the device has been pinned, e.g. by get_user_pages().  WARN if the
 	 * page_count() is zero to help detect bad usage of this helper.
 	 */
-	if (!pfn_valid(pfn) || WARN_ON_ONCE(!page_count(pfn_to_page(pfn))))
+	if (!pfn_valid(pfn) || !page_count(pfn_to_page(pfn)))
 		return false;
 
 	return is_zone_device_page(pfn_to_page(pfn));
-- 
2.32.0.93.g670b81a890-goog


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [PATCH 6/6] drm/i915/gvt: use gfn_to_pfn's page instead of pfn
  2021-06-24  3:57 ` David Stevens
                     ` (3 preceding siblings ...)
@ 2021-06-24  3:57   ` David Stevens
  -1 siblings, 0 replies; 213+ messages in thread
From: David Stevens @ 2021-06-24  3:57 UTC (permalink / raw)
  To: Marc Zyngier, Huacai Chen, Aleksandar Markovic, Paul Mackerras,
	Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: James Morse, Alexandru Elisei, Suzuki K Poulose, Will Deacon,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, linux-arm-kernel, kvmarm, linux-kernel, linux-mips,
	kvm, kvm-ppc, linuxppc-dev, intel-gvt-dev, intel-gfx, dri-devel,
	David Stevens

Return a struct page instead of a pfn from gfn_to_mfn. This function is only
used to determine whether the page is a transparent hugepage, in order to
enable 2MB huge GTT shadowing. Returning the page directly avoids the risk
of calling pfn_to_page on a VM_IO|VM_PFNMAP pfn.

This change also properly releases the reference on the page returned by
gfn_to_pfn.
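
As a usage note, here is a sketch of the contract new gfn_to_mfn_page callers
are expected to follow: NULL on failure, otherwise a referenced page that must
be released with put_page(). This is not part of the posted series and assumes
the wrapper's return type is also switched to struct page * (see the build
report further down the thread).

static int sketch_query_huge(struct intel_vgpu *vgpu, unsigned long gfn)
{
	struct page *page = intel_gvt_hypervisor_gfn_to_mfn_page(vgpu, gfn);
	int huge;

	if (!page)
		return -EINVAL;
	huge = PageTransHuge(page);	/* mirrors is_2MB_gtt_possible() below */
	put_page(page);
	return huge;
}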

Signed-off-by: David Stevens <stevensd@google.com>
---
 drivers/gpu/drm/i915/gvt/gtt.c       | 12 ++++++++----
 drivers/gpu/drm/i915/gvt/hypercall.h |  3 ++-
 drivers/gpu/drm/i915/gvt/kvmgt.c     | 12 ++++--------
 drivers/gpu/drm/i915/gvt/mpt.h       |  8 ++++----
 4 files changed, 18 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c
index 9478c132d7b6..b2951c560582 100644
--- a/drivers/gpu/drm/i915/gvt/gtt.c
+++ b/drivers/gpu/drm/i915/gvt/gtt.c
@@ -1160,16 +1160,20 @@ static int is_2MB_gtt_possible(struct intel_vgpu *vgpu,
 	struct intel_gvt_gtt_entry *entry)
 {
 	struct intel_gvt_gtt_pte_ops *ops = vgpu->gvt->gtt.pte_ops;
-	unsigned long pfn;
+	struct page *page;
+	bool is_trans_huge;
 
 	if (!HAS_PAGE_SIZES(vgpu->gvt->gt->i915, I915_GTT_PAGE_SIZE_2M))
 		return 0;
 
-	pfn = intel_gvt_hypervisor_gfn_to_mfn(vgpu, ops->get_pfn(entry));
-	if (pfn == INTEL_GVT_INVALID_ADDR)
+	page = intel_gvt_hypervisor_gfn_to_mfn_page(vgpu, ops->get_pfn(entry));
+	if (!page)
 		return -EINVAL;
 
-	return PageTransHuge(pfn_to_page(pfn));
+	is_trans_huge = PageTransHuge(page);
+	put_page(page);
+
+	return is_trans_huge;
 }
 
 static int split_2MB_gtt_entry(struct intel_vgpu *vgpu,
diff --git a/drivers/gpu/drm/i915/gvt/hypercall.h b/drivers/gpu/drm/i915/gvt/hypercall.h
index b79da5124f83..017190ff52d5 100644
--- a/drivers/gpu/drm/i915/gvt/hypercall.h
+++ b/drivers/gpu/drm/i915/gvt/hypercall.h
@@ -60,7 +60,8 @@ struct intel_gvt_mpt {
 			unsigned long len);
 	int (*write_gpa)(unsigned long handle, unsigned long gpa, void *buf,
 			 unsigned long len);
-	unsigned long (*gfn_to_mfn)(unsigned long handle, unsigned long gfn);
+	struct page *(*gfn_to_mfn_page)(unsigned long handle,
+					unsigned long gfn);
 
 	int (*dma_map_guest_page)(unsigned long handle, unsigned long gfn,
 				  unsigned long size, dma_addr_t *dma_addr);
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index b829ff67e3d9..1e97ae813ed0 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -1928,21 +1928,17 @@ static int kvmgt_inject_msi(unsigned long handle, u32 addr, u16 data)
 	return -EFAULT;
 }
 
-static unsigned long kvmgt_gfn_to_pfn(unsigned long handle, unsigned long gfn)
+static struct page *kvmgt_gfn_to_page(unsigned long handle, unsigned long gfn)
 {
 	struct kvmgt_guest_info *info;
 	kvm_pfn_t pfn;
 
 	if (!handle_valid(handle))
-		return INTEL_GVT_INVALID_ADDR;
+		return NULL;
 
 	info = (struct kvmgt_guest_info *)handle;
 
-	pfn = kvm_pfn_page_unwrap(gfn_to_pfn(info->kvm, gfn));
-	if (is_error_noslot_pfn(pfn))
-		return INTEL_GVT_INVALID_ADDR;
-
-	return pfn;
+	return gfn_to_pfn(info->kvm, gfn).page;
 }
 
 static int kvmgt_dma_map_guest_page(unsigned long handle, unsigned long gfn,
@@ -2112,7 +2108,7 @@ static const struct intel_gvt_mpt kvmgt_mpt = {
 	.disable_page_track = kvmgt_page_track_remove,
 	.read_gpa = kvmgt_read_gpa,
 	.write_gpa = kvmgt_write_gpa,
-	.gfn_to_mfn = kvmgt_gfn_to_pfn,
+	.gfn_to_mfn_page = kvmgt_gfn_to_page,
 	.dma_map_guest_page = kvmgt_dma_map_guest_page,
 	.dma_unmap_guest_page = kvmgt_dma_unmap_guest_page,
 	.dma_pin_guest_page = kvmgt_dma_pin_guest_page,
diff --git a/drivers/gpu/drm/i915/gvt/mpt.h b/drivers/gpu/drm/i915/gvt/mpt.h
index 550a456e936f..9169b83cf0f6 100644
--- a/drivers/gpu/drm/i915/gvt/mpt.h
+++ b/drivers/gpu/drm/i915/gvt/mpt.h
@@ -214,17 +214,17 @@ static inline int intel_gvt_hypervisor_write_gpa(struct intel_vgpu *vgpu,
 }
 
 /**
- * intel_gvt_hypervisor_gfn_to_mfn - translate a GFN to MFN
+ * intel_gvt_hypervisor_gfn_to_mfn_page - translate a GFN to MFN page
  * @vgpu: a vGPU
  * @gpfn: guest pfn
  *
  * Returns:
- * MFN on success, INTEL_GVT_INVALID_ADDR if failed.
+ * struct page* on success, NULL if failed.
  */
-static inline unsigned long intel_gvt_hypervisor_gfn_to_mfn(
+static inline unsigned long intel_gvt_hypervisor_gfn_to_mfn_page(
 		struct intel_vgpu *vgpu, unsigned long gfn)
 {
-	return intel_gvt_host.mpt->gfn_to_mfn(vgpu->handle, gfn);
+	return intel_gvt_host.mpt->gfn_to_mfn_page(vgpu->handle, gfn);
 }
 
 /**
-- 
2.32.0.93.g670b81a890-goog


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* [Intel-gfx] ✗ Fi.CI.BUILD: failure for KVM: Remove uses of struct page from x86 and arm64 MMU
  2021-06-24  3:57 ` David Stevens
                   ` (10 preceding siblings ...)
@ 2021-06-24  4:19 ` Patchwork
  -1 siblings, 0 replies; 213+ messages in thread
From: Patchwork @ 2021-06-24  4:19 UTC (permalink / raw)
  To: David Stevens; +Cc: intel-gfx

== Series Details ==

Series: KVM: Remove uses of struct page from x86 and arm64 MMU
URL   : https://patchwork.freedesktop.org/series/91836/
State : failure

== Summary ==

CALL    scripts/checksyscalls.sh
  CALL    scripts/atomic/check-atomics.sh
  DESCEND objtool
  CHK     include/generated/compile.h
  LD [M]  drivers/gpu/drm/i915/i915.o
  HDRTEST drivers/gpu/drm/i915/gvt/gvt.h
In file included from ./drivers/gpu/drm/i915/gvt/gvt.h:734,
                 from <command-line>:
./drivers/gpu/drm/i915/gvt/mpt.h: In function ‘intel_gvt_hypervisor_gfn_to_mfn_page’:
./drivers/gpu/drm/i915/gvt/mpt.h:227:9: error: returning ‘struct page *’ from a function with return type ‘long unsigned int’ makes integer from pointer without a cast [-Werror=int-conversion]
  return intel_gvt_host.mpt->gfn_to_mfn_page(vgpu->handle, gfn);
         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
drivers/gpu/drm/i915/Makefile:319: recipe for target 'drivers/gpu/drm/i915/gvt/gvt.hdrtest' failed
make[4]: *** [drivers/gpu/drm/i915/gvt/gvt.hdrtest] Error 1
scripts/Makefile.build:515: recipe for target 'drivers/gpu/drm/i915' failed
make[3]: *** [drivers/gpu/drm/i915] Error 2
scripts/Makefile.build:515: recipe for target 'drivers/gpu/drm' failed
make[2]: *** [drivers/gpu/drm] Error 2
scripts/Makefile.build:515: recipe for target 'drivers/gpu' failed
make[1]: *** [drivers/gpu] Error 2
Makefile:1847: recipe for target 'drivers' failed
make: *** [drivers] Error 2
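
The error points at the inline wrapper in mpt.h: patch 6/6 changes the hook to
return a struct page pointer but leaves the wrapper declared as returning
unsigned long. A sketch of the likely follow-up fix, inferred from the error
rather than taken from a posted revision:

-static inline unsigned long intel_gvt_hypervisor_gfn_to_mfn_page(
+static inline struct page *intel_gvt_hypervisor_gfn_to_mfn_page(
 		struct intel_vgpu *vgpu, unsigned long gfn)
 {
 	return intel_gvt_host.mpt->gfn_to_mfn_page(vgpu->handle, gfn);
 }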



^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [PATCH 6/6] drm/i915/gvt: use gfn_to_pfn's page instead of pfn
  2021-06-24  3:57   ` David Stevens
                       ` (3 preceding siblings ...)
@ 2021-06-24  4:28     ` David Stevens
  -1 siblings, 0 replies; 213+ messages in thread
From: David Stevens @ 2021-06-24  4:28 UTC (permalink / raw)
  To: Marc Zyngier, Huacai Chen, Aleksandar Markovic, Paul Mackerras,
	Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: James Morse, Alexandru Elisei, Suzuki K Poulose, Will Deacon,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, linux-arm-kernel, kvmarm, open list, linux-mips,
	kvm, kvm-ppc, linuxppc-dev, intel-gvt-dev, intel-gfx,
	ML dri-devel, David Stevens

Please ignore this last patch. It was put together as an afterthought
and wasn't properly tested.

-David

On Thu, Jun 24, 2021 at 12:59 PM David Stevens <stevensd@chromium.org> wrote:
>
> Return struct page instead of pfn from gfn_to_mfn. This function is only
> used to determine if the page is a transparent hugepage, to enable 2MB
> huge gtt shadowing. Returning the page directly avoids the risk of
> calling pfn_to_page on a VM_IO|VM_PFNMAP pfn.
>
> This change also properly releases the reference on the page returned by
> gfn_to_pfn.
>
> Signed-off-by: David Stevens <stevensd@google.com>
> ---
>  drivers/gpu/drm/i915/gvt/gtt.c       | 12 ++++++++----
>  drivers/gpu/drm/i915/gvt/hypercall.h |  3 ++-
>  drivers/gpu/drm/i915/gvt/kvmgt.c     | 12 ++++--------
>  drivers/gpu/drm/i915/gvt/mpt.h       |  8 ++++----
>  4 files changed, 18 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c
> index 9478c132d7b6..b2951c560582 100644
> --- a/drivers/gpu/drm/i915/gvt/gtt.c
> +++ b/drivers/gpu/drm/i915/gvt/gtt.c
> @@ -1160,16 +1160,20 @@ static int is_2MB_gtt_possible(struct intel_vgpu *vgpu,
>         struct intel_gvt_gtt_entry *entry)
>  {
>         struct intel_gvt_gtt_pte_ops *ops = vgpu->gvt->gtt.pte_ops;
> -       unsigned long pfn;
> +       struct page *page;
> +       bool is_trans_huge;
>
>         if (!HAS_PAGE_SIZES(vgpu->gvt->gt->i915, I915_GTT_PAGE_SIZE_2M))
>                 return 0;
>
> -       pfn = intel_gvt_hypervisor_gfn_to_mfn(vgpu, ops->get_pfn(entry));
> -       if (pfn == INTEL_GVT_INVALID_ADDR)
> +       page = intel_gvt_hypervisor_gfn_to_mfn_page(vgpu, ops->get_pfn(entry));
> +       if (!page)
>                 return -EINVAL;
>
> -       return PageTransHuge(pfn_to_page(pfn));
> +       is_trans_huge = PageTransHuge(page);
> +       put_page(page);
> +
> +       return is_trans_huge;
>  }
>
>  static int split_2MB_gtt_entry(struct intel_vgpu *vgpu,
> diff --git a/drivers/gpu/drm/i915/gvt/hypercall.h b/drivers/gpu/drm/i915/gvt/hypercall.h
> index b79da5124f83..017190ff52d5 100644
> --- a/drivers/gpu/drm/i915/gvt/hypercall.h
> +++ b/drivers/gpu/drm/i915/gvt/hypercall.h
> @@ -60,7 +60,8 @@ struct intel_gvt_mpt {
>                         unsigned long len);
>         int (*write_gpa)(unsigned long handle, unsigned long gpa, void *buf,
>                          unsigned long len);
> -       unsigned long (*gfn_to_mfn)(unsigned long handle, unsigned long gfn);
> +       struct page *(*gfn_to_mfn_page)(unsigned long handle,
> +                                       unsigned long gfn);
>
>         int (*dma_map_guest_page)(unsigned long handle, unsigned long gfn,
>                                   unsigned long size, dma_addr_t *dma_addr);
> diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
> index b829ff67e3d9..1e97ae813ed0 100644
> --- a/drivers/gpu/drm/i915/gvt/kvmgt.c
> +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
> @@ -1928,21 +1928,17 @@ static int kvmgt_inject_msi(unsigned long handle, u32 addr, u16 data)
>         return -EFAULT;
>  }
>
> -static unsigned long kvmgt_gfn_to_pfn(unsigned long handle, unsigned long gfn)
> +static struct page *kvmgt_gfn_to_page(unsigned long handle, unsigned long gfn)
>  {
>         struct kvmgt_guest_info *info;
>         kvm_pfn_t pfn;
>
>         if (!handle_valid(handle))
> -               return INTEL_GVT_INVALID_ADDR;
> +               return NULL;
>
>         info = (struct kvmgt_guest_info *)handle;
>
> -       pfn = kvm_pfn_page_unwrap(gfn_to_pfn(info->kvm, gfn));
> -       if (is_error_noslot_pfn(pfn))
> -               return INTEL_GVT_INVALID_ADDR;
> -
> -       return pfn;
> +       return gfn_to_pfn(info->kvm, gfn).page;
>  }
>
>  static int kvmgt_dma_map_guest_page(unsigned long handle, unsigned long gfn,
> @@ -2112,7 +2108,7 @@ static const struct intel_gvt_mpt kvmgt_mpt = {
>         .disable_page_track = kvmgt_page_track_remove,
>         .read_gpa = kvmgt_read_gpa,
>         .write_gpa = kvmgt_write_gpa,
> -       .gfn_to_mfn = kvmgt_gfn_to_pfn,
> +       .gfn_to_mfn_page = kvmgt_gfn_to_page,
>         .dma_map_guest_page = kvmgt_dma_map_guest_page,
>         .dma_unmap_guest_page = kvmgt_dma_unmap_guest_page,
>         .dma_pin_guest_page = kvmgt_dma_pin_guest_page,
> diff --git a/drivers/gpu/drm/i915/gvt/mpt.h b/drivers/gpu/drm/i915/gvt/mpt.h
> index 550a456e936f..9169b83cf0f6 100644
> --- a/drivers/gpu/drm/i915/gvt/mpt.h
> +++ b/drivers/gpu/drm/i915/gvt/mpt.h
> @@ -214,17 +214,17 @@ static inline int intel_gvt_hypervisor_write_gpa(struct intel_vgpu *vgpu,
>  }
>
>  /**
> - * intel_gvt_hypervisor_gfn_to_mfn - translate a GFN to MFN
> + * intel_gvt_hypervisor_gfn_to_mfn_page - translate a GFN to MFN page
>   * @vgpu: a vGPU
>   * @gpfn: guest pfn
>   *
>   * Returns:
> - * MFN on success, INTEL_GVT_INVALID_ADDR if failed.
> + * struct page* on success, NULL if failed.
>   */
> -static inline unsigned long intel_gvt_hypervisor_gfn_to_mfn(
> +static inline unsigned long intel_gvt_hypervisor_gfn_to_mfn_page(
>                 struct intel_vgpu *vgpu, unsigned long gfn)
>  {
> -       return intel_gvt_host.mpt->gfn_to_mfn(vgpu->handle, gfn);
> +       return intel_gvt_host.mpt->gfn_to_mfn_page(vgpu->handle, gfn);
>  }
>
>  /**
> --
> 2.32.0.93.g670b81a890-goog
>
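A note for readers joining the thread at this patch: the hazard the commit
message refers to is that a pfn resolved through follow_pte() on a
VM_IO | VM_PFNMAP mapping may have no struct page behind it, while
pfn_to_page() will still hand back a pointer that PageTransHuge() then
dereferences. The sketch below is illustrative only (the variable names are
invented and it is not code from any posted patch); the series avoids the
problem by propagating the struct page that gup resolved, as the
is_2MB_gtt_possible() hunk above shows.

	/* Before: nothing guarantees a struct page exists for this pfn. */
	page = pfn_to_page(pfn);
	huge = PageTransHuge(page);	/* possible bogus dereference */

	/* After this series: only trust a page that gup actually returned. */
	pfnpg = gfn_to_pfn(kvm, gfn);
	if (pfnpg.page) {
		huge = PageTransHuge(pfnpg.page);
		put_page(pfnpg.page);	/* drop the gup reference */
	}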

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [PATCH 0/6] KVM: Remove uses of struct page from x86 and arm64 MMU
  2021-06-24  3:57 ` David Stevens
@ 2021-06-24  6:57   ` Paolo Bonzini
  -1 siblings, 0 replies; 213+ messages in thread
From: Paolo Bonzini @ 2021-06-24  6:57 UTC (permalink / raw)
  To: David Stevens, Marc Zyngier, Huacai Chen, Aleksandar Markovic,
	Paul Mackerras, Zhenyu Wang, Zhi Wang
  Cc: James Morse, Alexandru Elisei, Suzuki K Poulose, Will Deacon,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, linux-arm-kernel, kvmarm, linux-kernel, linux-mips,
	kvm, kvm-ppc, linuxppc-dev, intel-gvt-dev, intel-gfx, dri-devel,
	David Stevens

On 24/06/21 05:57, David Stevens wrote:
> KVM supports mapping VM_IO and VM_PFNMAP memory into the guest by using
> follow_pte in gfn_to_pfn. However, the resolved pfns may not have
> associated struct pages, so they should not be passed to pfn_to_page.
> This series removes such calls from the x86 and arm64 secondary MMU. To
> do this, this series modifies gfn_to_pfn to return a struct page in
> addition to a pfn, if the hva was resolved by gup. This allows the
> caller to call put_page only when necessitated by gup.
> 
> This series provides a helper function that unwraps the new return type
> of gfn_to_pfn to provide behavior identical to the old behavior. As I
> have no hardware to test powerpc/mips changes, the function is used
> there for minimally invasive changes. Additionally, as gfn_to_page and
> gfn_to_pfn_cache are not integrated with mmu notifier, they cannot be
> easily changed over to only use pfns.
> 
> This addresses CVE-2021-22543 on x86 and arm64.

Thank you very much for this.  I agree that it makes sense to have a 
minimal change; I had similar changes almost ready, but was stuck with 
deadlocks in the gfn_to_pfn_cache case.  In retrospect I should have 
posted something similar to your patches.

I have started reviewing the patches, and they look good.  I will try to 
include them in 5.13.

Paolo
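
For readers who want the shape of the interface change without opening the
patches, the description quoted above corresponds roughly to the sketch
below. The names kvm_pfn_page, .page and kvm_pfn_page_unwrap are the ones
used elsewhere in this thread; everything else here (the field layout, the
example function, its name) is illustrative rather than lifted from a posted
patch.

	/* Approximate shape of the new gfn_to_pfn() return type (see patch
	 * 2/6 for the real definition).  The page pointer is set only when
	 * the hva was resolved by gup, and then the caller owns a reference.
	 */
	struct kvm_pfn_page {
		struct page *page;
		kvm_pfn_t pfn;
	};

	/* Sketch of a converted caller. */
	static bool gfn_backed_by_thp(struct kvm *kvm, gfn_t gfn)
	{
		struct kvm_pfn_page pfnpg = gfn_to_pfn(kvm, gfn);
		bool thp = false;

		if (pfnpg.page) {
			thp = PageTransHuge(pfnpg.page);
			put_page(pfnpg.page);
		}
		return thp;
	}

	/* Unconverted callers (e.g. the powerpc and mips paths mentioned in
	 * the cover letter) keep the old behaviour through the helper:
	 *
	 *	kvm_pfn_t pfn = kvm_pfn_page_unwrap(gfn_to_pfn(kvm, gfn));
	 */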


^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [PATCH 3/6] KVM: x86/mmu: avoid struct page in MMU
  2021-06-24  3:57   ` David Stevens
@ 2021-06-24  7:31     ` Paolo Bonzini
  -1 siblings, 0 replies; 213+ messages in thread
From: Paolo Bonzini @ 2021-06-24  7:31 UTC (permalink / raw)
  To: David Stevens, Marc Zyngier, Huacai Chen, Aleksandar Markovic,
	Paul Mackerras, Zhenyu Wang, Zhi Wang
  Cc: James Morse, Alexandru Elisei, Suzuki K Poulose, Will Deacon,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, linux-arm-kernel, kvmarm, linux-kernel, linux-mips,
	kvm, kvm-ppc, linuxppc-dev, intel-gvt-dev, intel-gfx, dri-devel

On 24/06/21 05:57, David Stevens wrote:
>   static int __direct_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
> -			int map_writable, int max_level, kvm_pfn_t pfn,
> +			int map_writable, int max_level,
> +			const struct kvm_pfn_page *pfnpg,
>   			bool prefault, bool is_tdp)
>   {
>   	bool nx_huge_page_workaround_enabled = is_nx_huge_pa

So the main difference with my tentative patches is that here I was 
passing the struct by value.  I'll try to clean this up for 5.15, but 
for now it's all good like this.  Thanks again!

Paolo
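
To make the difference being discussed explicit: it is only the calling
convention for the new pfn/page pair. The first prototype below is the form
posted in this series (taken from the hunk quoted above); the second is a
sketch of the by-value alternative Paolo mentions, written here as an
assumption rather than copied from any patch. The two are alternatives, not
meant to coexist.

	/* As posted: pass the pair by pointer. */
	static int __direct_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
				int map_writable, int max_level,
				const struct kvm_pfn_page *pfnpg,
				bool prefault, bool is_tdp);

	/* By-value alternative: the struct is only a pfn plus a pointer, so
	 * copying it is cheap and callees cannot be handed a dangling pointer.
	 */
	static int __direct_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
				int map_writable, int max_level,
				struct kvm_pfn_page pfnpg,
				bool prefault, bool is_tdp);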


^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [PATCH 1/6] KVM: x86/mmu: release audited pfns
  2021-06-24  3:57   ` David Stevens
@ 2021-06-24  8:43     ` Nicholas Piggin
  -1 siblings, 0 replies; 213+ messages in thread
From: Nicholas Piggin @ 2021-06-24  8:43 UTC (permalink / raw)
  To: Aleksandar Markovic, Huacai Chen, Marc Zyngier, Paul Mackerras,
	Paolo Bonzini, David Stevens, Zhenyu Wang, Zhi Wang
  Cc: Alexandru Elisei, dri-devel, intel-gfx, intel-gvt-dev,
	James Morse, Jim Mattson, Joerg Roedel, kvmarm, kvm-ppc, kvm,
	linux-arm-kernel, linux-kernel, linux-mips, linuxppc-dev,
	Sean Christopherson, Suzuki K Poulose, Vitaly Kuznetsov,
	Wanpeng Li, Will Deacon

Excerpts from David Stevens's message of June 24, 2021 1:57 pm:
> From: David Stevens <stevensd@chromium.org>

Changelog? This looks like a bug; should it have a Fixes: tag?

Thanks,
Nick

> 
> Signed-off-by: David Stevens <stevensd@chromium.org>
> ---
>  arch/x86/kvm/mmu/mmu_audit.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arch/x86/kvm/mmu/mmu_audit.c b/arch/x86/kvm/mmu/mmu_audit.c
> index cedc17b2f60e..97ff184084b4 100644
> --- a/arch/x86/kvm/mmu/mmu_audit.c
> +++ b/arch/x86/kvm/mmu/mmu_audit.c
> @@ -121,6 +121,8 @@ static void audit_mappings(struct kvm_vcpu *vcpu, u64 *sptep, int level)
>  		audit_printk(vcpu->kvm, "levels %d pfn %llx hpa %llx "
>  			     "ent %llxn", vcpu->arch.mmu->root_level, pfn,
>  			     hpa, *sptep);
> +
> +	kvm_release_pfn_clean(pfn);
>  }
>  
>  static void inspect_spte_has_rmap(struct kvm *kvm, u64 *sptep)
> -- 
> 2.32.0.93.g670b81a890-goog
> 
> 
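For context on the one-line fix quoted above: KVM's gfn_to_pfn helpers take
a reference on the backing page when one exists, so every path that obtains
a pfn this way also has to drop that reference, which the audit path was not
doing. The sketch below shows only that pairing rule; it is not the full
audit_mappings() function, and the way the pfn is obtained there may differ
from this simplified form.

	pfn = kvm_vcpu_gfn_to_pfn_atomic(vcpu, gfn);	/* takes a reference */
	if (is_error_pfn(pfn))
		return;

	/* ... audit checks on the spte/pfn pair ... */

	kvm_release_pfn_clean(pfn);	/* the line this patch adds */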

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [PATCH 1/6] KVM: x86/mmu: release audited pfns
@ 2021-06-24  8:43     ` Nicholas Piggin
  0 siblings, 0 replies; 213+ messages in thread
From: Nicholas Piggin @ 2021-06-24  8:43 UTC (permalink / raw)
  To: Aleksandar Markovic, Huacai Chen, Marc Zyngier, Paul Mackerras,
	Paolo Bonzini, David Stevens, Zhenyu Wang, Zhi Wang
  Cc: Wanpeng Li, kvm, Suzuki K Poulose, Alexandru Elisei, intel-gfx,
	linuxppc-dev, linux-kernel, dri-devel, kvmarm, Will Deacon,
	James Morse, kvm-ppc, Sean Christopherson, Vitaly Kuznetsov,
	linux-mips, intel-gvt-dev, Joerg Roedel, linux-arm-kernel,
	Jim Mattson

Excerpts from David Stevens's message of June 24, 2021 1:57 pm:
> From: David Stevens <stevensd@chromium.org>

Changelog? This looks like a bug, should it have a Fixes: tag?

Thanks,
Nick

> 
> Signed-off-by: David Stevens <stevensd@chromium.org>
> ---
>  arch/x86/kvm/mmu/mmu_audit.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arch/x86/kvm/mmu/mmu_audit.c b/arch/x86/kvm/mmu/mmu_audit.c
> index cedc17b2f60e..97ff184084b4 100644
> --- a/arch/x86/kvm/mmu/mmu_audit.c
> +++ b/arch/x86/kvm/mmu/mmu_audit.c
> @@ -121,6 +121,8 @@ static void audit_mappings(struct kvm_vcpu *vcpu, u64 *sptep, int level)
>  		audit_printk(vcpu->kvm, "levels %d pfn %llx hpa %llx "
>  			     "ent %llxn", vcpu->arch.mmu->root_level, pfn,
>  			     hpa, *sptep);
> +
> +	kvm_release_pfn_clean(pfn);
>  }
>  
>  static void inspect_spte_has_rmap(struct kvm *kvm, u64 *sptep)
> -- 
> 2.32.0.93.g670b81a890-goog
> 
> 

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [PATCH 2/6] KVM: mmu: also return page from gfn_to_pfn
  2021-06-24  3:57   ` David Stevens
                       ` (3 preceding siblings ...)
  (?)
@ 2021-06-24  8:52     ` Nicholas Piggin
  -1 siblings, 0 replies; 213+ messages in thread
From: Nicholas Piggin @ 2021-06-24  8:52 UTC (permalink / raw)
  To: Aleksandar Markovic, Huacai Chen, Marc Zyngier, Paul Mackerras,
	Paolo Bonzini, David Stevens, Zhenyu Wang, Zhi Wang
  Cc: Alexandru Elisei, dri-devel, intel-gfx, intel-gvt-dev,
	James Morse, Jim Mattson, Joerg Roedel, kvmarm, kvm-ppc, kvm,
	linux-arm-kernel, linux-kernel, linux-mips, linuxppc-dev,
	Sean Christopherson, Suzuki K Poulose, Vitaly Kuznetsov,
	Wanpeng Li, Will Deacon

Excerpts from David Stevens's message of June 24, 2021 1:57 pm:
> From: David Stevens <stevensd@chromium.org>
> 
> Return a struct kvm_pfn_page containing both a pfn and an optional
> struct page from the gfn_to_pfn family of functions. This differentiates
> the gup and follow_fault_pfn cases, which allows callers that only need
> a pfn to avoid touching the page struct in the latter case. For callers
> that need a struct page, introduce a helper function that unwraps a
> struct kvm_pfn_page into a struct page. This helper makes the call to
> kvm_get_pfn which had previously been in hva_to_pfn_remapped.
> 
> For now, wrap all calls to gfn_to_pfn functions in the new helper
> function. Callers which don't need the page struct will be updated in
> follow-up patches.

Hmm. You mean callers that do need the page will be updated? Normally 
if there will be leftover users that don't need the struct page then
you would go the other way and keep the old call the same, and add a new
one (gfn_to_pfn_page) just for those that need it.

Most kernel code I look at passes back multiple values by updating
pointers to structs or variables rather than returning a struct; I
suppose that's not really a big deal, just a matter of taste.

Thanks,
Nick


^ permalink raw reply	[flat|nested] 213+ messages in thread
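
For readers comparing the two shapes, they look roughly like this; the
kvm_pfn_page layout here is inferred from the commit message rather than
quoted from the patch:

  /* (a) struct return, as posted in this series (field layout assumed): */
  struct kvm_pfn_page {
  	kvm_pfn_t pfn;
  	struct page *page;	/* NULL when the hva was not resolved by gup */
  };
  struct kvm_pfn_page gfn_to_pfn(struct kvm *kvm, gfn_t gfn);

  /* (b) the alternative Nick describes: leave gfn_to_pfn() returning a
   * bare kvm_pfn_t and add a new call for callers that want the page.
   */
  kvm_pfn_t gfn_to_pfn_page(struct kvm *kvm, gfn_t gfn, struct page **page);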

* Re: [PATCH 3/6] KVM: x86/mmu: avoid struct page in MMU
  2021-06-24  3:57   ` David Stevens
                       ` (3 preceding siblings ...)
  (?)
@ 2021-06-24  8:58     ` Nicholas Piggin
  -1 siblings, 0 replies; 213+ messages in thread
From: Nicholas Piggin @ 2021-06-24  8:58 UTC (permalink / raw)
  To: Aleksandar Markovic, Huacai Chen, Marc Zyngier, Paul Mackerras,
	Paolo Bonzini, David Stevens, Zhenyu Wang, Zhi Wang
  Cc: Alexandru Elisei, dri-devel, intel-gfx, intel-gvt-dev,
	James Morse, Jim Mattson, Joerg Roedel, kvmarm, kvm-ppc, kvm,
	linux-arm-kernel, linux-kernel, linux-mips, linuxppc-dev,
	Sean Christopherson, Suzuki K Poulose, Vitaly Kuznetsov,
	Wanpeng Li, Will Deacon

Excerpts from David Stevens's message of June 24, 2021 1:57 pm:
> From: David Stevens <stevensd@chromium.org>
>  out_unlock:
>  	if (is_tdp_mmu_root(vcpu->kvm, vcpu->arch.mmu->root_hpa))
>  		read_unlock(&vcpu->kvm->mmu_lock);
>  	else
>  		write_unlock(&vcpu->kvm->mmu_lock);
> -	kvm_release_pfn_clean(pfn);
> +	if (pfnpg.page)
> +		put_page(pfnpg.page);
>  	return r;
>  }

How about

  kvm_release_pfn_page_clean(pfnpg);

Thanks,
Nick

^ permalink raw reply	[flat|nested] 213+ messages in thread
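
A helper along the lines Nick suggests could be a one-liner over the
open-coded pattern in the quoted hunk; this is only a sketch, not code
from the posted series:

  /* Hypothetical wrapper; releases the gup reference when one was taken. */
  static inline void kvm_release_pfn_page_clean(struct kvm_pfn_page pfnpg)
  {
  	if (pfnpg.page)
  		put_page(pfnpg.page);
  }

With that, the out_unlock path above would collapse back to a single
kvm_release_pfn_page_clean(pfnpg); call.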

* Re: [PATCH 2/6] KVM: mmu: also return page from gfn_to_pfn
  2021-06-24  3:57   ` David Stevens
                       ` (4 preceding siblings ...)
  (?)
@ 2021-06-24  9:40     ` Marc Zyngier
  -1 siblings, 0 replies; 213+ messages in thread
From: Marc Zyngier @ 2021-06-24  9:40 UTC (permalink / raw)
  To: David Stevens
  Cc: Huacai Chen, Aleksandar Markovic, Paul Mackerras, Paolo Bonzini,
	Zhenyu Wang, Zhi Wang, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Will Deacon, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	linux-arm-kernel, kvmarm, linux-kernel, linux-mips, kvm, kvm-ppc,
	linuxppc-dev, intel-gvt-dev, intel-gfx, dri-devel

Hi David,

On Thu, 24 Jun 2021 04:57:45 +0100,
David Stevens <stevensd@chromium.org> wrote:
> 
> From: David Stevens <stevensd@chromium.org>
> 
> Return a struct kvm_pfn_page containing both a pfn and an optional
> struct page from the gfn_to_pfn family of functions. This differentiates
> the gup and follow_fault_pfn cases, which allows callers that only need
> a pfn to avoid touching the page struct in the latter case. For callers
> that need a struct page, introduce a helper function that unwraps a
> struct kvm_pfn_page into a struct page. This helper makes the call to
> kvm_get_pfn which had previously been in hva_to_pfn_remapped.
> 
> For now, wrap all calls to gfn_to_pfn functions in the new helper
> function. Callers which don't need the page struct will be updated in
> follow-up patches.
> 
> Signed-off-by: David Stevens <stevensd@chromium.org>
> ---
>  arch/arm64/kvm/mmu.c                   |   5 +-
>  arch/mips/kvm/mmu.c                    |   3 +-
>  arch/powerpc/kvm/book3s.c              |   3 +-
>  arch/powerpc/kvm/book3s_64_mmu_hv.c    |   5 +-
>  arch/powerpc/kvm/book3s_64_mmu_radix.c |   5 +-
>  arch/powerpc/kvm/book3s_hv_uvmem.c     |   4 +-
>  arch/powerpc/kvm/e500_mmu_host.c       |   2 +-
>  arch/x86/kvm/mmu/mmu.c                 |  11 ++-
>  arch/x86/kvm/mmu/mmu_audit.c           |   2 +-
>  arch/x86/kvm/x86.c                     |   2 +-
>  drivers/gpu/drm/i915/gvt/kvmgt.c       |   2 +-
>  include/linux/kvm_host.h               |  27 ++++--
>  include/linux/kvm_types.h              |   5 +
>  virt/kvm/kvm_main.c                    | 121 +++++++++++++------------
>  14 files changed, 109 insertions(+), 88 deletions(-)
> 

[...]

> +kvm_pfn_t kvm_pfn_page_unwrap(struct kvm_pfn_page pfnpg)
> +{
> +	if (pfnpg.page)
> +		return pfnpg.pfn;
> +
> +	kvm_get_pfn(pfnpg.pfn);
> +	return pfnpg.pfn;
> +}
> +EXPORT_SYMBOL_GPL(kvm_pfn_page_unwrap);

I'd really like to see a tiny bit of documentation explaining that
calls to kvm_pfn_page_unwrap() are not idempotent.

Otherwise, looks good to me.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 213+ messages in thread
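
The non-idempotence Marc points at comes from the quoted helper itself:
when the pfn did not come from gup (pfnpg.page == NULL), every call takes
another reference via kvm_get_pfn(), while callers normally release only
once. Kernel-doc along these lines (illustrative wording, not proposed
text) would capture it:

  /**
   * kvm_pfn_page_unwrap - reduce a kvm_pfn_page to a bare pfn
   * @pfnpg: result of a gfn_to_pfn-family lookup
   *
   * If @pfnpg.page is NULL, a pfn reference is taken here, so calling
   * this twice on the same kvm_pfn_page takes two references.  Unwrap
   * each lookup at most once and pair it with a single release.
   */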

* Re: [PATCH 2/6] KVM: mmu: also return page from gfn_to_pfn
  2021-06-24  8:52     ` Nicholas Piggin
                         ` (3 preceding siblings ...)
  (?)
@ 2021-06-24  9:42       ` Paolo Bonzini
  -1 siblings, 0 replies; 213+ messages in thread
From: Paolo Bonzini @ 2021-06-24  9:42 UTC (permalink / raw)
  To: Nicholas Piggin, Aleksandar Markovic, Huacai Chen, Marc Zyngier,
	Paul Mackerras, David Stevens, Zhenyu Wang, Zhi Wang
  Cc: Alexandru Elisei, dri-devel, intel-gfx, intel-gvt-dev,
	James Morse, Jim Mattson, Joerg Roedel, kvmarm, kvm-ppc, kvm,
	linux-arm-kernel, linux-kernel, linux-mips, linuxppc-dev,
	Sean Christopherson, Suzuki K Poulose, Vitaly Kuznetsov,
	Wanpeng Li, Will Deacon

On 24/06/21 10:52, Nicholas Piggin wrote:
>> For now, wrap all calls to gfn_to_pfn functions in the new helper
>> function. Callers which don't need the page struct will be updated in
>> follow-up patches.
> Hmm. You mean callers that do need the page will be updated? Normally
> if there will be leftover users that don't need the struct page then
> you would go the other way and keep the old call the same, and add a new
> one (gfn_to_pfn_page) just for those that need it.

Needing kvm_pfn_page_unwrap is a sign that something might be buggy, so
it's a good idea to give the short name to the common case and keep the
ugly kvm_pfn_page_unwrap(gfn_to_pfn(...)) spelling for the weird one.  In
fact I'm not sure there should be any kvm_pfn_page_unwrap in the end.

Paolo


^ permalink raw reply	[flat|nested] 213+ messages in thread
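
As a concrete picture of the end state Paolo describes (a sketch assuming
the signatures from patch 2/6, not code from the series), a converted
caller keeps the struct and never needs the unwrap helper:

  static int example_map_gfn(struct kvm *kvm, gfn_t gfn)
  {
  	struct kvm_pfn_page pfnpg = gfn_to_pfn(kvm, gfn);

  	if (is_error_noslot_pfn(pfnpg.pfn))
  		return -EFAULT;

  	/* ... install pfnpg.pfn into the secondary MMU ... */

  	if (pfnpg.page)
  		put_page(pfnpg.page);
  	return 0;
  }

The transitional kvm_pfn_page_unwrap(gfn_to_pfn(...)) spelling then only
survives at call sites that have not been converted yet, which is exactly
what makes them easy to spot in review.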

* Re: [PATCH 1/6] KVM: x86/mmu: release audited pfns
  2021-06-24  8:43     ` Nicholas Piggin
                         ` (3 preceding siblings ...)
  (?)
@ 2021-06-24  9:43       ` Paolo Bonzini
  -1 siblings, 0 replies; 213+ messages in thread
From: Paolo Bonzini @ 2021-06-24  9:43 UTC (permalink / raw)
  To: Nicholas Piggin, Aleksandar Markovic, Huacai Chen, Marc Zyngier,
	Paul Mackerras, David Stevens, Zhenyu Wang, Zhi Wang
  Cc: Alexandru Elisei, dri-devel, intel-gfx, intel-gvt-dev,
	James Morse, Jim Mattson, Joerg Roedel, kvmarm, kvm-ppc, kvm,
	linux-arm-kernel, linux-kernel, linux-mips, linuxppc-dev,
	Sean Christopherson, Suzuki K Poulose, Vitaly Kuznetsov,
	Wanpeng Li, Will Deacon

On 24/06/21 10:43, Nicholas Piggin wrote:
> Excerpts from David Stevens's message of June 24, 2021 1:57 pm:
>> From: David Stevens <stevensd@chromium.org>
> 
> Changelog? This looks like a bug, should it have a Fixes: tag?

Probably has been there forever... The best way to fix the bug would be 
to nuke mmu_audit.c, which I've threatened to do many times but never 
followed up on.

Paolo

> Thanks,
> Nick
> 
>>
>> Signed-off-by: David Stevens <stevensd@chromium.org>
>> ---
>>   arch/x86/kvm/mmu/mmu_audit.c | 2 ++
>>   1 file changed, 2 insertions(+)
>>
>> diff --git a/arch/x86/kvm/mmu/mmu_audit.c b/arch/x86/kvm/mmu/mmu_audit.c
>> index cedc17b2f60e..97ff184084b4 100644
>> --- a/arch/x86/kvm/mmu/mmu_audit.c
>> +++ b/arch/x86/kvm/mmu/mmu_audit.c
>> @@ -121,6 +121,8 @@ static void audit_mappings(struct kvm_vcpu *vcpu, u64 *sptep, int level)
>>   		audit_printk(vcpu->kvm, "levels %d pfn %llx hpa %llx "
>>   			     "ent %llxn", vcpu->arch.mmu->root_level, pfn,
>>   			     hpa, *sptep);
>> +
>> +	kvm_release_pfn_clean(pfn);
>>   }
>>   
>>   static void inspect_spte_has_rmap(struct kvm *kvm, u64 *sptep)
>> -- 
>> 2.32.0.93.g670b81a890-goog
>>
>>
> 


^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [PATCH 1/6] KVM: x86/mmu: release audited pfns
@ 2021-06-24  9:43       ` Paolo Bonzini
  0 siblings, 0 replies; 213+ messages in thread
From: Paolo Bonzini @ 2021-06-24  9:43 UTC (permalink / raw)
  To: Nicholas Piggin, Aleksandar Markovic, Huacai Chen, Marc Zyngier,
	Paul Mackerras, David Stevens, Zhenyu Wang, Zhi Wang
  Cc: Alexandru Elisei, dri-devel, intel-gfx, intel-gvt-dev,
	James Morse, Jim Mattson, Joerg Roedel, kvmarm, kvm-ppc, kvm,
	linux-arm-kernel, linux-kernel, linux-mips, linuxppc-dev,
	Sean Christopherson, Suzuki K Poulose, Vitaly Kuznetsov,
	Wanpeng Li, Will Deacon

On 24/06/21 10:43, Nicholas Piggin wrote:
> Excerpts from David Stevens's message of June 24, 2021 1:57 pm:
>> From: David Stevens <stevensd@chromium.org>
> 
> Changelog? This looks like a bug, should it have a Fixes: tag?

Probably has been there forever... The best way to fix the bug would be 
to nuke mmu_audit.c, which I've threatened to do many times but never 
followed up on.

Paolo

> Thanks,
> Nick
> 
>>
>> Signed-off-by: David Stevens <stevensd@chromium.org>
>> ---
>>   arch/x86/kvm/mmu/mmu_audit.c | 2 ++
>>   1 file changed, 2 insertions(+)
>>
>> diff --git a/arch/x86/kvm/mmu/mmu_audit.c b/arch/x86/kvm/mmu/mmu_audit.c
>> index cedc17b2f60e..97ff184084b4 100644
>> --- a/arch/x86/kvm/mmu/mmu_audit.c
>> +++ b/arch/x86/kvm/mmu/mmu_audit.c
>> @@ -121,6 +121,8 @@ static void audit_mappings(struct kvm_vcpu *vcpu, u64 *sptep, int level)
>>   		audit_printk(vcpu->kvm, "levels %d pfn %llx hpa %llx "
>>   			     "ent %llxn", vcpu->arch.mmu->root_level, pfn,
>>   			     hpa, *sptep);
>> +
>> +	kvm_release_pfn_clean(pfn);
>>   }
>>   
>>   static void inspect_spte_has_rmap(struct kvm *kvm, u64 *sptep)
>> -- 
>> 2.32.0.93.g670b81a890-goog
>>
>>
> 


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [Intel-gfx] [PATCH 1/6] KVM: x86/mmu: release audited pfns
@ 2021-06-24  9:43       ` Paolo Bonzini
  0 siblings, 0 replies; 213+ messages in thread
From: Paolo Bonzini @ 2021-06-24  9:43 UTC (permalink / raw)
  To: Nicholas Piggin, Aleksandar Markovic, Huacai Chen, Marc Zyngier,
	Paul Mackerras, David Stevens, Zhenyu Wang, Zhi Wang
  Cc: Wanpeng Li, kvm, Suzuki K Poulose, Alexandru Elisei, intel-gfx,
	linuxppc-dev, linux-kernel, dri-devel, kvmarm, Will Deacon,
	James Morse, kvm-ppc, Sean Christopherson, Vitaly Kuznetsov,
	linux-mips, intel-gvt-dev, Joerg Roedel, linux-arm-kernel,
	Jim Mattson

On 24/06/21 10:43, Nicholas Piggin wrote:
> Excerpts from David Stevens's message of June 24, 2021 1:57 pm:
>> From: David Stevens <stevensd@chromium.org>
> 
> Changelog? This looks like a bug, should it have a Fixes: tag?

Probably has been there forever... The best way to fix the bug would be 
to nuke mmu_audit.c, which I've threatened to do many times but never 
followed up on.

Paolo

> Thanks,
> Nick
> 
>>
>> Signed-off-by: David Stevens <stevensd@chromium.org>
>> ---
>>   arch/x86/kvm/mmu/mmu_audit.c | 2 ++
>>   1 file changed, 2 insertions(+)
>>
>> diff --git a/arch/x86/kvm/mmu/mmu_audit.c b/arch/x86/kvm/mmu/mmu_audit.c
>> index cedc17b2f60e..97ff184084b4 100644
>> --- a/arch/x86/kvm/mmu/mmu_audit.c
>> +++ b/arch/x86/kvm/mmu/mmu_audit.c
>> @@ -121,6 +121,8 @@ static void audit_mappings(struct kvm_vcpu *vcpu, u64 *sptep, int level)
>>   		audit_printk(vcpu->kvm, "levels %d pfn %llx hpa %llx "
>>   			     "ent %llxn", vcpu->arch.mmu->root_level, pfn,
>>   			     hpa, *sptep);
>> +
>> +	kvm_release_pfn_clean(pfn);
>>   }
>>   
>>   static void inspect_spte_has_rmap(struct kvm *kvm, u64 *sptep)
>> -- 
>> 2.32.0.93.g670b81a890-goog
>>
>>
> 

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [PATCH 1/6] KVM: x86/mmu: release audited pfns
@ 2021-06-24  9:43       ` Paolo Bonzini
  0 siblings, 0 replies; 213+ messages in thread
From: Paolo Bonzini @ 2021-06-24  9:43 UTC (permalink / raw)
  To: Nicholas Piggin, Aleksandar Markovic, Huacai Chen, Marc Zyngier,
	Paul Mackerras, David Stevens, Zhenyu Wang, Zhi Wang
  Cc: Alexandru Elisei, dri-devel, intel-gfx, intel-gvt-dev,
	James Morse, Jim Mattson, Joerg Roedel, kvmarm, kvm-ppc, kvm,
	linux-arm-kernel, linux-kernel, linux-mips, linuxppc-dev,
	Sean Christopherson, Suzuki K Poulose, Vitaly Kuznetsov,
	Wanpeng Li, Will Deacon

On 24/06/21 10:43, Nicholas Piggin wrote:
> Excerpts from David Stevens's message of June 24, 2021 1:57 pm:
>> From: David Stevens <stevensd@chromium.org>
> 
> Changelog? This looks like a bug, should it have a Fixes: tag?

Probably has been there forever... The best way to fix the bug would be 
to nuke mmu_audit.c, which I've threatened to do many times but never 
followed up on.

Paolo

> Thanks,
> Nick
> 
>>
>> Signed-off-by: David Stevens <stevensd@chromium.org>
>> ---
>>   arch/x86/kvm/mmu/mmu_audit.c | 2 ++
>>   1 file changed, 2 insertions(+)
>>
>> diff --git a/arch/x86/kvm/mmu/mmu_audit.c b/arch/x86/kvm/mmu/mmu_audit.c
>> index cedc17b2f60e..97ff184084b4 100644
>> --- a/arch/x86/kvm/mmu/mmu_audit.c
>> +++ b/arch/x86/kvm/mmu/mmu_audit.c
>> @@ -121,6 +121,8 @@ static void audit_mappings(struct kvm_vcpu *vcpu, u64 *sptep, int level)
>>   		audit_printk(vcpu->kvm, "levels %d pfn %llx hpa %llx "
>>   			     "ent %llxn", vcpu->arch.mmu->root_level, pfn,
>>   			     hpa, *sptep);
>> +
>> +	kvm_release_pfn_clean(pfn);
>>   }
>>   
>>   static void inspect_spte_has_rmap(struct kvm *kvm, u64 *sptep)
>> -- 
>> 2.32.0.93.g670b81a890-goog
>>
>>
> 

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [PATCH 2/6] KVM: mmu: also return page from gfn_to_pfn
  2021-06-24  9:42       ` Paolo Bonzini
                           ` (3 preceding siblings ...)
  (?)
@ 2021-06-24  9:57         ` Nicholas Piggin
  -1 siblings, 0 replies; 213+ messages in thread
From: Nicholas Piggin @ 2021-06-24  9:57 UTC (permalink / raw)
  To: Aleksandar Markovic, Huacai Chen, Marc Zyngier, Paul Mackerras,
	Paolo Bonzini, David Stevens, Zhenyu Wang, Zhi Wang
  Cc: Alexandru Elisei, dri-devel, intel-gfx, intel-gvt-dev,
	James Morse, Jim Mattson, Joerg Roedel, kvmarm, kvm-ppc, kvm,
	linux-arm-kernel, linux-kernel, linux-mips, linuxppc-dev,
	Sean Christopherson, Suzuki K Poulose, Vitaly Kuznetsov,
	Wanpeng Li, Will Deacon

Excerpts from Paolo Bonzini's message of June 24, 2021 7:42 pm:
> On 24/06/21 10:52, Nicholas Piggin wrote:
>>> For now, wrap all calls to gfn_to_pfn functions in the new helper
>>> function. Callers which don't need the page struct will be updated in
>>> follow-up patches.
>> Hmm. You mean callers that do need the page will be updated? Normally
>> if there will be leftover users that don't need the struct page then
>> you would go the other way and keep the old call the same, and add a new
>> one (gfn_to_pfn_page) just for those that need it.
> 
> Needing kvm_pfn_page_unwrap is a sign that something might be buggy, so 
> it's a good idea to move the short name to the common case and the ugly 
> kvm_pfn_page_unwrap(gfn_to_pfn(...)) for the weird one.  In fact I'm not 
> sure there should be any kvm_pfn_page_unwrap in the end.

If all callers were updated that is one thing, but from the changelog
it sounds like that would not happen and there would be some gfn_to_pfn
users left over.

But yes in the end you would either need to make gfn_to_pfn never return
a page found via follow_pte, or change all callers to the new way. If 
the plan is for the latter then I guess that's fine.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 213+ messages in thread
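
For reference, the two API shapes being contrasted here, as declarations
only (gfn_to_pfn_page is the hypothetical variant Nick describes; its
exact signature is an assumption):

    /* Existing helper, unchanged for callers that never need the page: */
    kvm_pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn);

    /* Hypothetical addition for callers that do need the struct page: */
    kvm_pfn_t gfn_to_pfn_page(struct kvm *kvm, gfn_t gfn, struct page **page);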

* Re: [PATCH 3/6] KVM: x86/mmu: avoid struct page in MMU
  2021-06-24  8:58     ` Nicholas Piggin
                         ` (4 preceding siblings ...)
  (?)
@ 2021-06-24 10:06       ` Marc Zyngier
  -1 siblings, 0 replies; 213+ messages in thread
From: Marc Zyngier @ 2021-06-24 10:06 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: Aleksandar Markovic, Huacai Chen, Paul Mackerras, Paolo Bonzini,
	David Stevens, Zhenyu Wang, Zhi Wang, Alexandru Elisei,
	dri-devel, intel-gfx, intel-gvt-dev, James Morse, Jim Mattson,
	Joerg Roedel, kvmarm, kvm-ppc, kvm, linux-arm-kernel,
	linux-kernel, linux-mips, linuxppc-dev, Sean Christopherson,
	Suzuki K Poulose, Vitaly Kuznetsov, Wanpeng Li, Will Deacon

On Thu, 24 Jun 2021 09:58:00 +0100,
Nicholas Piggin <npiggin@gmail.com> wrote:
> 
> Excerpts from David Stevens's message of June 24, 2021 1:57 pm:
> > From: David Stevens <stevensd@chromium.org>
> >  out_unlock:
> >  	if (is_tdp_mmu_root(vcpu->kvm, vcpu->arch.mmu->root_hpa))
> >  		read_unlock(&vcpu->kvm->mmu_lock);
> >  	else
> >  		write_unlock(&vcpu->kvm->mmu_lock);
> > -	kvm_release_pfn_clean(pfn);
> > +	if (pfnpg.page)
> > +		put_page(pfnpg.page);
> >  	return r;
> >  }
> 
> How about
> 
>   kvm_release_pfn_page_clean(pfnpg);

I'm not sure. I always found kvm_release_pfn_clean() ugly, because it
doesn't mark the page 'clean'. I find put_page() more correct.

Something like 'kvm_put_pfn_page()' would make more sense, but I'm so
bad at naming things that I could just as well call it 'bob()'.

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 213+ messages in thread
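
One possible shape for the helper Marc floats, simply folding the NULL
check from the quoted hunk into a wrapper, assuming the kvm_pfn_page
type sketched earlier in the thread (the name comes from this mail; the
body is an assumption):

    static inline void kvm_put_pfn_page(struct kvm_pfn_page pfnpg)
    {
            /* Only gup-resolved pfns carry a struct page to release. */
            if (pfnpg.page)
                    put_page(pfnpg.page);
    }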

* Re: [PATCH 2/6] KVM: mmu: also return page from gfn_to_pfn
  2021-06-24  9:57         ` Nicholas Piggin
                             ` (3 preceding siblings ...)
  (?)
@ 2021-06-24 10:13           ` Paolo Bonzini
  -1 siblings, 0 replies; 213+ messages in thread
From: Paolo Bonzini @ 2021-06-24 10:13 UTC (permalink / raw)
  To: Nicholas Piggin, Aleksandar Markovic, Huacai Chen, Marc Zyngier,
	Paul Mackerras, David Stevens, Zhenyu Wang, Zhi Wang
  Cc: Alexandru Elisei, dri-devel, intel-gfx, intel-gvt-dev,
	James Morse, Jim Mattson, Joerg Roedel, kvmarm, kvm-ppc, kvm,
	linux-arm-kernel, linux-kernel, linux-mips, linuxppc-dev,
	Sean Christopherson, Suzuki K Poulose, Vitaly Kuznetsov,
	Wanpeng Li, Will Deacon

On 24/06/21 11:57, Nicholas Piggin wrote:
>> Needing kvm_pfn_page_unwrap is a sign that something might be buggy, so
>> it's a good idea to move the short name to the common case and the ugly
>> kvm_pfn_page_unwrap(gfn_to_pfn(...)) for the weird one.  In fact I'm not
>> sure there should be any kvm_pfn_page_unwrap in the end.
>
> If all callers were updated that is one thing, but from the changelog
> it sounds like that would not happen and there would be some gfn_to_pfn
> users left over.

In these patches there are, so yeah the plan is to always change the 
callers to the new way.

> But yes in the end you would either need to make gfn_to_pfn never return
> a page found via follow_pte, or change all callers to the new way. If
> the plan is for the latter then I guess that's fine.


^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [PATCH 3/6] KVM: x86/mmu: avoid struct page in MMU
  2021-06-24 10:06       ` Marc Zyngier
                           ` (4 preceding siblings ...)
  (?)
@ 2021-06-24 10:17         ` Paolo Bonzini
  -1 siblings, 0 replies; 213+ messages in thread
From: Paolo Bonzini @ 2021-06-24 10:17 UTC (permalink / raw)
  To: Marc Zyngier, Nicholas Piggin
  Cc: Aleksandar Markovic, Huacai Chen, Paul Mackerras, David Stevens,
	Zhenyu Wang, Zhi Wang, Alexandru Elisei, dri-devel, intel-gfx,
	intel-gvt-dev, James Morse, Jim Mattson, Joerg Roedel, kvmarm,
	kvm-ppc, kvm, linux-arm-kernel, linux-kernel, linux-mips,
	linuxppc-dev, Sean Christopherson, Suzuki K Poulose,
	Vitaly Kuznetsov, Wanpeng Li, Will Deacon

On 24/06/21 12:06, Marc Zyngier wrote:
> On Thu, 24 Jun 2021 09:58:00 +0100,
> Nicholas Piggin <npiggin@gmail.com> wrote:
>>
>> Excerpts from David Stevens's message of June 24, 2021 1:57 pm:
>>> From: David Stevens <stevensd@chromium.org>
>>>   out_unlock:
>>>   	if (is_tdp_mmu_root(vcpu->kvm, vcpu->arch.mmu->root_hpa))
>>>   		read_unlock(&vcpu->kvm->mmu_lock);
>>>   	else
>>>   		write_unlock(&vcpu->kvm->mmu_lock);
>>> -	kvm_release_pfn_clean(pfn);
>>> +	if (pfnpg.page)
>>> +		put_page(pfnpg.page);
>>>   	return r;
>>>   }
>>
>> How about
>>
>>    kvm_release_pfn_page_clean(pfnpg);
> 
> I'm not sure. I always found kvm_release_pfn_clean() ugly, because it
> doesn't mark the page 'clean'. I find put_page() more correct.
> 
> Something like 'kvm_put_pfn_page()' would make more sense, but I'm so
> bad at naming things that I could just as well call it 'bob()'.

The best way to go would be to get rid of kvm_release_pfn_clean() and 
always go through a pfn_page.  Then we could or could not introduce 
wrappers kvm_put_pfn_page{,_dirty}.

I think for now it's best to limit the churn since these patches will go 
in the stable releases too, and clean up the resulting API once we have 
a clear idea of how all architectures are using kvm_pfn_page.

Paolo


^ permalink raw reply	[flat|nested] 213+ messages in thread
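
A sketch of the dirty half of the kvm_put_pfn_page{,_dirty} pair
mentioned above, mirroring what kvm_release_pfn_dirty() does for
page-backed pfns and assuming the kvm_pfn_page type sketched earlier
(the body is an assumption, not code from the series):

    static inline void kvm_put_pfn_page_dirty(struct kvm_pfn_page pfnpg)
    {
            if (pfnpg.page) {
                    SetPageDirty(pfnpg.page); /* mark dirty before dropping the ref */
                    put_page(pfnpg.page);
            }
    }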

* Re: [PATCH 2/6] KVM: mmu: also return page from gfn_to_pfn
  2021-06-24  9:57         ` Nicholas Piggin
                             ` (3 preceding siblings ...)
  (?)
@ 2021-06-24 10:17           ` Nicholas Piggin
  -1 siblings, 0 replies; 213+ messages in thread
From: Nicholas Piggin @ 2021-06-24 10:17 UTC (permalink / raw)
  To: Aleksandar Markovic, Huacai Chen, Marc Zyngier, Paul Mackerras,
	Paolo Bonzini, David Stevens, Zhenyu Wang, Zhi Wang
  Cc: Alexandru Elisei, dri-devel, intel-gfx, intel-gvt-dev,
	James Morse, Jim Mattson, Joerg Roedel, kvmarm, kvm-ppc, kvm,
	linux-arm-kernel, linux-kernel, linux-mips, linuxppc-dev,
	Sean Christopherson, Suzuki K Poulose, Vitaly Kuznetsov,
	Wanpeng Li, Will Deacon

Excerpts from Nicholas Piggin's message of June 24, 2021 7:57 pm:
> Excerpts from Paolo Bonzini's message of June 24, 2021 7:42 pm:
>> On 24/06/21 10:52, Nicholas Piggin wrote:
>>>> For now, wrap all calls to gfn_to_pfn functions in the new helper
>>>> function. Callers which don't need the page struct will be updated in
>>>> follow-up patches.
>>> Hmm. You mean callers that do need the page will be updated? Normally
>>> if there will be leftover users that don't need the struct page then
>>> you would go the other way and keep the old call the same, and add a new
>>> one (gfn_to_pfn_page) just for those that need it.
>> 
>> Needing kvm_pfn_page_unwrap is a sign that something might be buggy, so 
>> it's a good idea to move the short name to the common case and the ugly 
>> kvm_pfn_page_unwrap(gfn_to_pfn(...)) for the weird one.  In fact I'm not 
>> sure there should be any kvm_pfn_page_unwrap in the end.
> 
> If all callers were updated that is one thing, but from the changelog
> it sounds like that would not happen and there would be some gfn_to_pfn
> users left over.
> 
> But yes in the end you would either need to make gfn_to_pfn never return
> a page found via follow_pte, or change all callers to the new way. If 
> the plan is for the latter then I guess that's fine.

Actually in that case anyway I don't see the need -- the existence of
gfn_to_pfn is enough to know it might be buggy. It can just as easily
be grepped for as kvm_pfn_page_unwrap. And are gfn_to_page cases also
vulnerable to the same issue?

So I think it could be marked deprecated or something if not everything 
will be converted in the one series, and then there's no need to touch 
all that arch code with this patch.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [PATCH 2/6] KVM: mmu: also return page from gfn_to_pfn
  2021-06-24 10:17           ` Nicholas Piggin
                               ` (3 preceding siblings ...)
  (?)
@ 2021-06-24 10:21             ` Paolo Bonzini
  -1 siblings, 0 replies; 213+ messages in thread
From: Paolo Bonzini @ 2021-06-24 10:21 UTC (permalink / raw)
  To: Nicholas Piggin, Aleksandar Markovic, Huacai Chen, Marc Zyngier,
	Paul Mackerras, David Stevens, Zhenyu Wang, Zhi Wang
  Cc: Alexandru Elisei, dri-devel, intel-gfx, intel-gvt-dev,
	James Morse, Jim Mattson, Joerg Roedel, kvmarm, kvm-ppc, kvm,
	linux-arm-kernel, linux-kernel, linux-mips, linuxppc-dev,
	Sean Christopherson, Suzuki K Poulose, Vitaly Kuznetsov,
	Wanpeng Li, Will Deacon

On 24/06/21 12:17, Nicholas Piggin wrote:
>> If all callers were updated that is one thing, but from the changelog
>> it sounds like that would not happen and there would be some gfn_to_pfn
>> users left over.
>>
>> But yes in the end you would either need to make gfn_to_pfn never return
>> a page found via follow_pte, or change all callers to the new way. If
>> the plan is for the latter then I guess that's fine.
>
> Actually in that case anyway I don't see the need -- the existence of
> gfn_to_pfn is enough to know it might be buggy. It can just as easily
> be grepped for as kvm_pfn_page_unwrap.

Sure, but that would leave us with longer function names 
(gfn_to_pfn_page* instead of gfn_to_pfn*).  So the "safe" use is the one 
that looks worse and the unsafe use is the one that looks safe.
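
For reference, a rough sketch of the unwrap helper being discussed,
assuming struct kvm_pfn_page carries the pfn plus the page when the hva
was resolved by gup (NULL otherwise), so the old contract of always
holding a reference can be recovered:

/* Sketch only: restore the old gfn_to_pfn() contract. */
static inline kvm_pfn_t kvm_pfn_page_unwrap(struct kvm_pfn_page pfnpg)
{
	if (!pfnpg.page)
		kvm_get_pfn(pfnpg.pfn);	/* old behaviour for follow_pte pfns */

	return pfnpg.pfn;		/* otherwise gup already holds a reference */
}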

> And are gfn_to_page cases also
> vulnerable to the same issue?

No, they're just broken for the VM_IO|VM_PFNMAP case.

Paolo

> So I think it could be marked deprecated or something if not everything
> will be converted in the one series, and don't need to touch all that
> arch code with this patch.


^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [PATCH 6/6] drm/i915/gvt: use gfn_to_pfn's page instead of pfn
  2021-06-24  3:57   ` David Stevens
                     ` (5 preceding siblings ...)
  (?)
@ 2021-06-24 10:25   ` kernel test robot
  -1 siblings, 0 replies; 213+ messages in thread
From: kernel test robot @ 2021-06-24 10:25 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 2514 bytes --]

Hi David,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on v5.13-rc7]
[cannot apply to kvm/queue drm-intel/for-linux-next kvm-ppc/kvm-ppc-next next-20210623]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/David-Stevens/KVM-Remove-uses-of-struct-page-from-x86-and-arm64-MMU/20210624-120152
base:    13311e74253fe64329390df80bed3f07314ddd61
config: i386-allyesconfig (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce (this is a W=1 build):
        # https://github.com/0day-ci/linux/commit/834ba0a17bd373a384ac2361165f6a27499a61df
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review David-Stevens/KVM-Remove-uses-of-struct-page-from-x86-and-arm64-MMU/20210624-120152
        git checkout 834ba0a17bd373a384ac2361165f6a27499a61df
        # save the attached .config to linux build tree
        make W=1 ARCH=i386 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   In file included from drivers/gpu/drm/i915/gvt/gvt.h:734,
                    from <command-line>:
   drivers/gpu/drm/i915/gvt/mpt.h: In function 'intel_gvt_hypervisor_gfn_to_mfn_page':
>> drivers/gpu/drm/i915/gvt/mpt.h:227:9: error: returning 'struct page *' from a function with return type 'long unsigned int' makes integer from pointer without a cast [-Werror=int-conversion]
     227 |  return intel_gvt_host.mpt->gfn_to_mfn_page(vgpu->handle, gfn);
         |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   cc1: all warnings being treated as errors


vim +227 drivers/gpu/drm/i915/gvt/mpt.h

   215	
   216	/**
   217	 * intel_gvt_hypervisor_gfn_to_mfn_page - translate a GFN to MFN page
   218	 * @vgpu: a vGPU
   219	 * @gpfn: guest pfn
   220	 *
   221	 * Returns:
   222	 * struct page* on success, NULL if failed.
   223	 */
   224	static inline unsigned long intel_gvt_hypervisor_gfn_to_mfn_page(
   225			struct intel_vgpu *vgpu, unsigned long gfn)
   226	{
 > 227		return intel_gvt_host.mpt->gfn_to_mfn_page(vgpu->handle, gfn);
   228	}
   229	
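
The error is a plain return-type mismatch: the helper is documented (and
used) as yielding a struct page *, but is declared to return unsigned
long. A minimal fix sketch, assuming the underlying gfn_to_mfn_page
callback is itself declared to return struct page * as the kernel-doc
above suggests:

/* Sketch only: return type changed to match what the callback returns. */
static inline struct page *intel_gvt_hypervisor_gfn_to_mfn_page(
		struct intel_vgpu *vgpu, unsigned long gfn)
{
	return intel_gvt_host.mpt->gfn_to_mfn_page(vgpu->handle, gfn);
}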

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 64724 bytes --]

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [PATCH 0/6] KVM: Remove uses of struct page from x86 and arm64 MMU
  2021-06-24  3:57 ` David Stevens
                     ` (3 preceding siblings ...)
  (?)
@ 2021-06-24 10:34   ` Nicholas Piggin
  -1 siblings, 0 replies; 213+ messages in thread
From: Nicholas Piggin @ 2021-06-24 10:34 UTC (permalink / raw)
  To: Aleksandar Markovic, Huacai Chen, Marc Zyngier, Paul Mackerras,
	Paolo Bonzini, David Stevens, Zhenyu Wang, Zhi Wang
  Cc: Alexandru Elisei, dri-devel, intel-gfx, intel-gvt-dev,
	James Morse, Jim Mattson, Joerg Roedel, kvmarm, kvm-ppc, kvm,
	linux-arm-kernel, linux-kernel, linux-mips, linuxppc-dev,
	Sean Christopherson, David Stevens, Suzuki K Poulose,
	Vitaly Kuznetsov, Wanpeng Li, Will Deacon

Excerpts from David Stevens's message of June 24, 2021 1:57 pm:
> KVM supports mapping VM_IO and VM_PFNMAP memory into the guest by using
> follow_pte in gfn_to_pfn. However, the resolved pfns may not have
> associated struct pages, so they should not be passed to pfn_to_page.
> This series removes such calls from the x86 and arm64 secondary MMU. To
> do this, this series modifies gfn_to_pfn to return a struct page in
> addition to a pfn, if the hva was resolved by gup. This allows the
> caller to call put_page only when necessitated by gup.
> 
> This series provides a helper function that unwraps the new return type
> of gfn_to_pfn to provide behavior identical to the old behavior. As I
> have no hardware to test powerpc/mips changes, the function is used
> there for minimally invasive changes. Additionally, as gfn_to_page and
> gfn_to_pfn_cache are not integrated with mmu notifier, they cannot be
> easily changed over to only use pfns.
> 
> This addresses CVE-2021-22543 on x86 and arm64.

Does this fix the problem? (Untested, as I don't have a PoC setup at hand,
but at least in concept.)

I have no problem with improving the API, and the direction of your 
series is probably good. But there seems to be a lot of unfixed arch code 
and broken APIs left to deal with after your series too. This might be 
the most suitable fix to backport, and a base for your series, which can 
take more time to convert to the new APIs.

Thanks,
Nick

---

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 6a6bc7af0e28..e208c279d903 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2104,13 +2104,21 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
 	 * Whoever called remap_pfn_range is also going to call e.g.
 	 * unmap_mapping_range before the underlying pages are freed,
 	 * causing a call to our MMU notifier.
+	 *
+	 * Certain IO or PFNMAP mappings can be backed with valid
+	 * struct pages, but be allocated without refcounting e.g.,
+	 * tail pages of non-compound higher order allocations, which
+	 * would then underflow the refcount when the caller does the
+	 * required put_page. Don't allow those pages here.
 	 */ 
-	kvm_get_pfn(pfn);
+	if (!kvm_try_get_pfn(pfn))
+		r = -EFAULT;
 
 out:
 	pte_unmap_unlock(ptep, ptl);
 	*p_pfn = pfn;
-	return 0;
+
+	return r;
 }
 
 /*
@@ -2487,6 +2495,13 @@ void kvm_set_pfn_accessed(kvm_pfn_t pfn)
 }
 EXPORT_SYMBOL_GPL(kvm_set_pfn_accessed);
 
+static int kvm_try_get_pfn(kvm_pfn_t pfn)
+{
+	if (kvm_is_reserved_pfn(pfn))
+		return 1;
+	return get_page_unless_zero(pfn_to_page(pfn));
+}
+
 void kvm_get_pfn(kvm_pfn_t pfn)
 {
 	if (!kvm_is_reserved_pfn(pfn))
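
Note that as posted the sketch adds kvm_try_get_pfn() down near
kvm_get_pfn() (the second hunk, around line 2495), i.e. after its caller
hva_to_pfn_remapped() (around line 2104), so it would presumably also
want a forward declaration above the caller, or to be defined there:

static int kvm_try_get_pfn(kvm_pfn_t pfn);	/* forward declaration, placement assumed */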

^ permalink raw reply related	[flat|nested] 213+ messages in thread

* Re: [PATCH 6/6] drm/i915/gvt: use gfn_to_pfn's page instead of pfn
  2021-06-24  3:57   ` David Stevens
                     ` (6 preceding siblings ...)
  (?)
@ 2021-06-24 10:38   ` kernel test robot
  -1 siblings, 0 replies; 213+ messages in thread
From: kernel test robot @ 2021-06-24 10:38 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 3410 bytes --]

Hi David,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on v5.13-rc7]
[cannot apply to kvm/queue drm-intel/for-linux-next kvm-ppc/kvm-ppc-next next-20210623]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/David-Stevens/KVM-Remove-uses-of-struct-page-from-x86-and-arm64-MMU/20210624-120152
base:    13311e74253fe64329390df80bed3f07314ddd61
config: x86_64-randconfig-r025-20210622 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce (this is a W=1 build):
        # https://github.com/0day-ci/linux/commit/834ba0a17bd373a384ac2361165f6a27499a61df
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review David-Stevens/KVM-Remove-uses-of-struct-page-from-x86-and-arm64-MMU/20210624-120152
        git checkout 834ba0a17bd373a384ac2361165f6a27499a61df
        # save the attached .config to linux build tree
        make W=1 ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   In file included from drivers/gpu/drm/i915/gvt/gvt.h:734,
                    from drivers/gpu/drm/i915/intel_gvt.c:27:
   drivers/gpu/drm/i915/gvt/mpt.h: In function 'intel_gvt_hypervisor_gfn_to_mfn_page':
>> drivers/gpu/drm/i915/gvt/mpt.h:227:9: warning: returning 'struct page *' from a function with return type 'long unsigned int' makes integer from pointer without a cast [-Wint-conversion]
     227 |  return intel_gvt_host.mpt->gfn_to_mfn_page(vgpu->handle, gfn);
         |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
--
   In file included from drivers/gpu/drm/i915/gvt/gvt.h:734,
                    from drivers/gpu/drm/i915/gvt/gtt.c:37:
   drivers/gpu/drm/i915/gvt/mpt.h: In function 'intel_gvt_hypervisor_gfn_to_mfn_page':
>> drivers/gpu/drm/i915/gvt/mpt.h:227:9: warning: returning 'struct page *' from a function with return type 'long unsigned int' makes integer from pointer without a cast [-Wint-conversion]
     227 |  return intel_gvt_host.mpt->gfn_to_mfn_page(vgpu->handle, gfn);
         |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   drivers/gpu/drm/i915/gvt/gtt.c: In function 'is_2MB_gtt_possible':
>> drivers/gpu/drm/i915/gvt/gtt.c:1169:7: warning: assignment to 'struct page *' from 'long unsigned int' makes pointer from integer without a cast [-Wint-conversion]
    1169 |  page = intel_gvt_hypervisor_gfn_to_mfn_page(vgpu, ops->get_pfn(entry));
         |       ^


vim +227 drivers/gpu/drm/i915/gvt/mpt.h

   215	
   216	/**
   217	 * intel_gvt_hypervisor_gfn_to_mfn_page - translate a GFN to MFN page
   218	 * @vgpu: a vGPU
   219	 * @gpfn: guest pfn
   220	 *
   221	 * Returns:
   222	 * struct page* on success, NULL if failed.
   223	 */
   224	static inline unsigned long intel_gvt_hypervisor_gfn_to_mfn_page(
   225			struct intel_vgpu *vgpu, unsigned long gfn)
   226	{
 > 227		return intel_gvt_host.mpt->gfn_to_mfn_page(vgpu->handle, gfn);
   228	}
   229	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 49484 bytes --]

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [PATCH 2/6] KVM: mmu: also return page from gfn_to_pfn
  2021-06-24 10:21             ` Paolo Bonzini
                                 ` (3 preceding siblings ...)
  (?)
@ 2021-06-24 10:42               ` Nicholas Piggin
  -1 siblings, 0 replies; 213+ messages in thread
From: Nicholas Piggin @ 2021-06-24 10:42 UTC (permalink / raw)
  To: Aleksandar Markovic, Huacai Chen, Marc Zyngier, Paul Mackerras,
	Paolo Bonzini, David Stevens, Zhenyu Wang, Zhi Wang
  Cc: Alexandru Elisei, dri-devel, intel-gfx, intel-gvt-dev,
	James Morse, Jim Mattson, Joerg Roedel, kvmarm, kvm-ppc, kvm,
	linux-arm-kernel, linux-kernel, linux-mips, linuxppc-dev,
	Sean Christopherson, Suzuki K Poulose, Vitaly Kuznetsov,
	Wanpeng Li, Will Deacon

Excerpts from Paolo Bonzini's message of June 24, 2021 8:21 pm:
> On 24/06/21 12:17, Nicholas Piggin wrote:
>>> If all callers were updated that is one thing, but from the changelog
>>> it sounds like that would not happen and there would be some gfn_to_pfn
>>> users left over.
>>>
>>> But yes in the end you would either need to make gfn_to_pfn never return
>>> a page found via follow_pte, or change all callers to the new way. If
>>> the plan is for the latter then I guess that's fine.
>>
>> Actually in that case anyway I don't see the need -- the existence of
>> gfn_to_pfn is enough to know it might be buggy. It can just as easily
>> be grepped for as kvm_pfn_page_unwrap.
> 
> Sure, but that would leave us with longer function names 
> (gfn_to_pfn_page* instead of gfn_to_pfn*).  So the "safe" use is the one 
> that looks worse and the unsafe use is the one that looks safe.

The churn isn't justified because of function name length. Choose g2pp() 
if you want a non-descriptive but short name.

The existing name isn't good anyway because it not only looks up a pfn 
but also a page, and more importantly it gets a ref on the page. The
name should be changed if you introduce a new API.
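
For example (purely hypothetical naming, nothing from the series), a
signature along the lines of

	struct kvm_pfn_page gfn_to_pfn_page_get(struct kvm *kvm, gfn_t gfn);

would at least advertise that a reference is taken as part of the lookup.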

>> And are gfn_to_page cases also
> vulnerable to the same issue?
> 
> No, they're just broken for the VM_IO|VM_PFNMAP case.

Do you mean they aren't vulnerable, or that they are vulnerable but also 
broken in other cases?

Thanks,
Nick

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [PATCH 2/6] KVM: mmu: also return page from gfn_to_pfn
@ 2021-06-24 10:42               ` Nicholas Piggin
  0 siblings, 0 replies; 213+ messages in thread
From: Nicholas Piggin @ 2021-06-24 10:42 UTC (permalink / raw)
  To: Aleksandar Markovic, Huacai Chen, Marc Zyngier, Paul Mackerras,
	Paolo Bonzini, David Stevens, Zhenyu Wang, Zhi Wang
  Cc: Wanpeng Li, kvm, Suzuki K Poulose, Alexandru Elisei, intel-gfx,
	linuxppc-dev, linux-kernel, dri-devel, kvmarm, Will Deacon,
	James Morse, kvm-ppc, Sean Christopherson, Vitaly Kuznetsov,
	linux-mips, intel-gvt-dev, Joerg Roedel, linux-arm-kernel,
	Jim Mattson

Excerpts from Paolo Bonzini's message of June 24, 2021 8:21 pm:
> On 24/06/21 12:17, Nicholas Piggin wrote:
>>> If all callers were updated that is one thing, but from the changelog
>>> it sounds like that would not happen and there would be some gfn_to_pfn
>>> users left over.
>>>
>>> But yes in the end you would either need to make gfn_to_pfn never return
>>> a page found via follow_pte, or change all callers to the new way. If
>>> the plan is for the latter then I guess that's fine.
>>
>> Actually in that case anyway I don't see the need -- the existence of
>> gfn_to_pfn is enough to know it might be buggy. It can just as easily
>> be grepped for as kvm_pfn_page_unwrap.
> 
> Sure, but that would leave us with longer function names 
> (gfn_to_pfn_page* instead of gfn_to_pfn*).  So the "safe" use is the one 
> that looks worse and the unsafe use is the one that looks safe.

The churn isn't justified because of function name length. Choose g2pp() 
if you want a non-descriptive but short name.

The existing name isn't good anyway because it not only looks up a pfn 
but also a page, and more importantly it gets a ref on the page. The
name should be changed if you introduce a new API.

>> And are gfn_to_page cases also
>> vulernable to the same issue?
> 
> No, they're just broken for the VM_IO|VM_PFNMAP case.

No they aren't vulnerable, or they are vunlerable but also broken in 
other cases?

Thanks,
Nick

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [PATCH 2/6] KVM: mmu: also return page from gfn_to_pfn
@ 2021-06-24 10:42               ` Nicholas Piggin
  0 siblings, 0 replies; 213+ messages in thread
From: Nicholas Piggin @ 2021-06-24 10:42 UTC (permalink / raw)
  To: Aleksandar Markovic, Huacai Chen, Marc Zyngier, Paul Mackerras,
	Paolo Bonzini, David Stevens, Zhenyu Wang, Zhi Wang
  Cc: Wanpeng Li, kvm, intel-gfx, linuxppc-dev, linux-kernel,
	dri-devel, kvmarm, Will Deacon, kvm-ppc, Sean Christopherson,
	Vitaly Kuznetsov, linux-mips, intel-gvt-dev, Joerg Roedel,
	linux-arm-kernel, Jim Mattson

Excerpts from Paolo Bonzini's message of June 24, 2021 8:21 pm:
> On 24/06/21 12:17, Nicholas Piggin wrote:
>>> If all callers were updated that is one thing, but from the changelog
>>> it sounds like that would not happen and there would be some gfn_to_pfn
>>> users left over.
>>>
>>> But yes in the end you would either need to make gfn_to_pfn never return
>>> a page found via follow_pte, or change all callers to the new way. If
>>> the plan is for the latter then I guess that's fine.
>>
>> Actually in that case anyway I don't see the need -- the existence of
>> gfn_to_pfn is enough to know it might be buggy. It can just as easily
>> be grepped for as kvm_pfn_page_unwrap.
> 
> Sure, but that would leave us with longer function names 
> (gfn_to_pfn_page* instead of gfn_to_pfn*).  So the "safe" use is the one 
> that looks worse and the unsafe use is the one that looks safe.

The churn isn't justified because of function name length. Choose g2pp() 
if you want a non-descriptive but short name.

The existing name isn't good anyway because it not only looks up a pfn 
but also a page, and more importantly it gets a ref on the page. The
name should be changed if you introduce a new API.

>> And are gfn_to_page cases also
>> vulnerable to the same issue?
> 
> No, they're just broken for the VM_IO|VM_PFNMAP case.

Do you mean they aren't vulnerable, or that they are vulnerable but also 
broken in other cases?

Thanks,
Nick

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [PATCH 2/6] KVM: mmu: also return page from gfn_to_pfn
@ 2021-06-24 10:42               ` Nicholas Piggin
  0 siblings, 0 replies; 213+ messages in thread
From: Nicholas Piggin @ 2021-06-24 10:42 UTC (permalink / raw)
  To: Aleksandar Markovic, Huacai Chen, Marc Zyngier, Paul Mackerras,
	Paolo Bonzini, David Stevens, Zhenyu Wang, Zhi Wang
  Cc: Alexandru Elisei, dri-devel, intel-gfx, intel-gvt-dev,
	James Morse, Jim Mattson, Joerg Roedel, kvmarm, kvm-ppc, kvm,
	linux-arm-kernel, linux-kernel, linux-mips, linuxppc-dev,
	Sean Christopherson, Suzuki K Poulose, Vitaly Kuznetsov,
	Wanpeng Li, Will Deacon

Excerpts from Paolo Bonzini's message of June 24, 2021 8:21 pm:
> On 24/06/21 12:17, Nicholas Piggin wrote:
>>> If all callers were updated that is one thing, but from the changelog
>>> it sounds like that would not happen and there would be some gfn_to_pfn
>>> users left over.
>>>
>>> But yes in the end you would either need to make gfn_to_pfn never return
>>> a page found via follow_pte, or change all callers to the new way. If
>>> the plan is for the latter then I guess that's fine.
>>
>> Actually in that case anyway I don't see the need -- the existence of
>> gfn_to_pfn is enough to know it might be buggy. It can just as easily
>> be grepped for as kvm_pfn_page_unwrap.
> 
> Sure, but that would leave us with longer function names 
> (gfn_to_pfn_page* instead of gfn_to_pfn*).  So the "safe" use is the one 
> that looks worse and the unsafe use is the one that looks safe.

The churn isn't justified because of function name length. Choose g2pp() 
if you want a non-descriptive but short name.

The existing name isn't good anyway because it not only looks up a pfn 
but also a page, and more importantly it gets a ref on the page. The
name should be changed if you introduce a new API.

>> And are gfn_to_page cases also
>> vulnerable to the same issue?
> 
> No, they're just broken for the VM_IO|VM_PFNMAP case.

Do you mean they aren't vulnerable, or that they are vulnerable but also 
broken in other cases?

Thanks,
Nick


^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [Intel-gfx] [PATCH 2/6] KVM: mmu: also return page from gfn_to_pfn
@ 2021-06-24 10:42               ` Nicholas Piggin
  0 siblings, 0 replies; 213+ messages in thread
From: Nicholas Piggin @ 2021-06-24 10:42 UTC (permalink / raw)
  To: Aleksandar Markovic, Huacai Chen, Marc Zyngier, Paul Mackerras,
	Paolo Bonzini, David Stevens, Zhenyu Wang, Zhi Wang
  Cc: Wanpeng Li, kvm, Suzuki K Poulose, Alexandru Elisei, intel-gfx,
	linuxppc-dev, linux-kernel, dri-devel, kvmarm, Will Deacon,
	James Morse, kvm-ppc, Sean Christopherson, Vitaly Kuznetsov,
	linux-mips, intel-gvt-dev, Joerg Roedel, linux-arm-kernel,
	Jim Mattson

Excerpts from Paolo Bonzini's message of June 24, 2021 8:21 pm:
> On 24/06/21 12:17, Nicholas Piggin wrote:
>>> If all callers were updated that is one thing, but from the changelog
>>> it sounds like that would not happen and there would be some gfn_to_pfn
>>> users left over.
>>>
>>> But yes in the end you would either need to make gfn_to_pfn never return
>>> a page found via follow_pte, or change all callers to the new way. If
>>> the plan is for the latter then I guess that's fine.
>>
>> Actually in that case anyway I don't see the need -- the existence of
>> gfn_to_pfn is enough to know it might be buggy. It can just as easily
>> be grepped for as kvm_pfn_page_unwrap.
> 
> Sure, but that would leave us with longer function names 
> (gfn_to_pfn_page* instead of gfn_to_pfn*).  So the "safe" use is the one 
> that looks worse and the unsafe use is the one that looks safe.

The churn isn't justified because of function name length. Choose g2pp() 
if you want a non-descriptive but short name.

The existing name isn't good anyway because it not only looks up a pfn 
but also a page, and more importantly it gets a ref on the page. The
name should be changed if you introduce a new API.

>> And are gfn_to_page cases also
>> vulnerable to the same issue?
> 
> No, they're just broken for the VM_IO|VM_PFNMAP case.

Do you mean they aren't vulnerable, or that they are vulnerable but also 
broken in other cases?

Thanks,
Nick

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [PATCH 2/6] KVM: mmu: also return page from gfn_to_pfn
@ 2021-06-24 10:42               ` Nicholas Piggin
  0 siblings, 0 replies; 213+ messages in thread
From: Nicholas Piggin @ 2021-06-24 10:42 UTC (permalink / raw)
  To: Aleksandar Markovic, Huacai Chen, Marc Zyngier, Paul Mackerras,
	Paolo Bonzini, David Stevens, Zhenyu Wang, Zhi Wang
  Cc: Alexandru Elisei, dri-devel, intel-gfx, intel-gvt-dev,
	James Morse, Jim Mattson, Joerg Roedel, kvmarm, kvm-ppc, kvm,
	linux-arm-kernel, linux-kernel, linux-mips, linuxppc-dev,
	Sean Christopherson, Suzuki K Poulose, Vitaly Kuznetsov,
	Wanpeng Li, Will Deacon

Excerpts from Paolo Bonzini's message of June 24, 2021 8:21 pm:
> On 24/06/21 12:17, Nicholas Piggin wrote:
>>> If all callers were updated that is one thing, but from the changelog
>>> it sounds like that would not happen and there would be some gfn_to_pfn
>>> users left over.
>>>
>>> But yes in the end you would either need to make gfn_to_pfn never return
>>> a page found via follow_pte, or change all callers to the new way. If
>>> the plan is for the latter then I guess that's fine.
>>
>> Actually in that case anyway I don't see the need -- the existence of
>> gfn_to_pfn is enough to know it might be buggy. It can just as easily
>> be grepped for as kvm_pfn_page_unwrap.
> 
> Sure, but that would leave us with longer function names 
> (gfn_to_pfn_page* instead of gfn_to_pfn*).  So the "safe" use is the one 
> that looks worse and the unsafe use is the one that looks safe.

The churn isn't justified because of function name length. Choose g2pp() 
if you want a non-descriptive but short name.

The existing name isn't good anyway because it not only looks up a pfn 
but also a page, and more importantly it gets a ref on the page. The
name should be changed if you introduce a new API.

>> And are gfn_to_page cases also
>> vulnerable to the same issue?
> 
> No, they're just broken for the VM_IO|VM_PFNMAP case.

Do you mean they aren't vulnerable, or that they are vulnerable but also 
broken in other cases?

Thanks,
Nick

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [PATCH 4/6] KVM: arm64/mmu: avoid struct page in MMU
  2021-06-24  3:57   ` David Stevens
                       ` (4 preceding siblings ...)
  (?)
@ 2021-06-24 10:43     ` Marc Zyngier
  -1 siblings, 0 replies; 213+ messages in thread
From: Marc Zyngier @ 2021-06-24 10:43 UTC (permalink / raw)
  To: David Stevens
  Cc: Huacai Chen, Aleksandar Markovic, Paul Mackerras, Paolo Bonzini,
	Zhenyu Wang, Zhi Wang, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Will Deacon, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	linux-arm-kernel, kvmarm, linux-kernel, linux-mips, kvm, kvm-ppc,
	linuxppc-dev, intel-gvt-dev, intel-gfx, dri-devel

On Thu, 24 Jun 2021 04:57:47 +0100,
David Stevens <stevensd@chromium.org> wrote:
> 
> From: David Stevens <stevensd@chromium.org>
> 
> Avoid converting pfns returned by follow_fault_pfn to struct pages to
> transiently take a reference. The reference was originally taken to
> match the reference taken by gup. However, pfns returned by
> follow_fault_pfn may not have a struct page set up for reference
> counting.
> 
> Signed-off-by: David Stevens <stevensd@chromium.org>
> ---
>  arch/arm64/kvm/mmu.c | 43 +++++++++++++++++++++++--------------------
>  1 file changed, 23 insertions(+), 20 deletions(-)
> 
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 896b3644b36f..a741972cb75f 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c

[...]

> @@ -968,16 +968,16 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	 */
>  	if (vma_pagesize == PAGE_SIZE && !force_pte)
>  		vma_pagesize = transparent_hugepage_adjust(memslot, hva,
> -							   &pfn, &fault_ipa);
> +							   &pfnpg, &fault_ipa);
>  	if (writable)
>  		prot |= KVM_PGTABLE_PROT_W;
>  
>  	if (fault_status != FSC_PERM && !device)
> -		clean_dcache_guest_page(pfn, vma_pagesize);
> +		clean_dcache_guest_page(pfnpg.pfn, vma_pagesize);
>  
>  	if (exec_fault) {
>  		prot |= KVM_PGTABLE_PROT_X;
> -		invalidate_icache_guest_page(pfn, vma_pagesize);
> +		invalidate_icache_guest_page(pfnpg.pfn, vma_pagesize);

This is going to clash with what is currently in -next, especially with
MTE.

Paolo, if you really want to take this in 5.13, can you please push a
stable branch based on -rc4 or older so that I can merge it in and
test it?

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [PATCH 4/6] KVM: arm64/mmu: avoid struct page in MMU
@ 2021-06-24 10:43     ` Marc Zyngier
  0 siblings, 0 replies; 213+ messages in thread
From: Marc Zyngier @ 2021-06-24 10:43 UTC (permalink / raw)
  To: David Stevens
  Cc: Wanpeng Li, kvm, dri-devel, linux-mips, Will Deacon, kvmarm,
	intel-gvt-dev, Joerg Roedel, Huacai Chen, Aleksandar Markovic,
	Zhi Wang, Suzuki K Poulose, intel-gfx, kvm-ppc, Zhenyu Wang,
	Alexandru Elisei, linux-arm-kernel, Jim Mattson,
	Sean Christopherson, linux-kernel, James Morse, Paolo Bonzini,
	Vitaly Kuznetsov, linuxppc-dev

On Thu, 24 Jun 2021 04:57:47 +0100,
David Stevens <stevensd@chromium.org> wrote:
> 
> From: David Stevens <stevensd@chromium.org>
> 
> Avoid converting pfns returned by follow_fault_pfn to struct pages to
> transiently take a reference. The reference was originally taken to
> match the reference taken by gup. However, pfns returned by
> follow_fault_pfn may not have a struct page set up for reference
> counting.
> 
> Signed-off-by: David Stevens <stevensd@chromium.org>
> ---
>  arch/arm64/kvm/mmu.c | 43 +++++++++++++++++++++++--------------------
>  1 file changed, 23 insertions(+), 20 deletions(-)
> 
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 896b3644b36f..a741972cb75f 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c

[...]

> @@ -968,16 +968,16 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	 */
>  	if (vma_pagesize == PAGE_SIZE && !force_pte)
>  		vma_pagesize = transparent_hugepage_adjust(memslot, hva,
> -							   &pfn, &fault_ipa);
> +							   &pfnpg, &fault_ipa);
>  	if (writable)
>  		prot |= KVM_PGTABLE_PROT_W;
>  
>  	if (fault_status != FSC_PERM && !device)
> -		clean_dcache_guest_page(pfn, vma_pagesize);
> +		clean_dcache_guest_page(pfnpg.pfn, vma_pagesize);
>  
>  	if (exec_fault) {
>  		prot |= KVM_PGTABLE_PROT_X;
> -		invalidate_icache_guest_page(pfn, vma_pagesize);
> +		invalidate_icache_guest_page(pfnpg.pfn, vma_pagesize);

This is going to clash with what is currently in -next, especially with
MTE.

Paolo, if you really want to take this in 5.13, can you please push a
stable branch based on -rc4 or older so that I can merge it in and
test it?

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [PATCH 4/6] KVM: arm64/mmu: avoid struct page in MMU
@ 2021-06-24 10:43     ` Marc Zyngier
  0 siblings, 0 replies; 213+ messages in thread
From: Marc Zyngier @ 2021-06-24 10:43 UTC (permalink / raw)
  To: David Stevens
  Cc: Wanpeng Li, kvm, dri-devel, linux-mips, Paul Mackerras,
	Will Deacon, kvmarm, intel-gvt-dev, Joerg Roedel, Huacai Chen,
	Aleksandar Markovic, Zhi Wang, intel-gfx, kvm-ppc, Zhenyu Wang,
	linux-arm-kernel, Jim Mattson, Sean Christopherson, linux-kernel,
	Paolo Bonzini, Vitaly Kuznetsov, linuxppc-dev

On Thu, 24 Jun 2021 04:57:47 +0100,
David Stevens <stevensd@chromium.org> wrote:
> 
> From: David Stevens <stevensd@chromium.org>
> 
> Avoid converting pfns returned by follow_fault_pfn to struct pages to
> transiently take a reference. The reference was originally taken to
> match the reference taken by gup. However, pfns returned by
> follow_fault_pfn may not have a struct page set up for reference
> counting.
> 
> Signed-off-by: David Stevens <stevensd@chromium.org>
> ---
>  arch/arm64/kvm/mmu.c | 43 +++++++++++++++++++++++--------------------
>  1 file changed, 23 insertions(+), 20 deletions(-)
> 
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 896b3644b36f..a741972cb75f 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c

[...]

> @@ -968,16 +968,16 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	 */
>  	if (vma_pagesize == PAGE_SIZE && !force_pte)
>  		vma_pagesize = transparent_hugepage_adjust(memslot, hva,
> -							   &pfn, &fault_ipa);
> +							   &pfnpg, &fault_ipa);
>  	if (writable)
>  		prot |= KVM_PGTABLE_PROT_W;
>  
>  	if (fault_status != FSC_PERM && !device)
> -		clean_dcache_guest_page(pfn, vma_pagesize);
> +		clean_dcache_guest_page(pfnpg.pfn, vma_pagesize);
>  
>  	if (exec_fault) {
>  		prot |= KVM_PGTABLE_PROT_X;
> -		invalidate_icache_guest_page(pfn, vma_pagesize);
> +		invalidate_icache_guest_page(pfnpg.pfn, vma_pagesize);

This is going to clash with what is currently in -next, especially with
MTE.

Paolo, if you really want to take this in 5.13, can you please push a
stable branch based on -rc4 or older so that I can merge it in and
test it?

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [PATCH 4/6] KVM: arm64/mmu: avoid struct page in MMU
@ 2021-06-24 10:43     ` Marc Zyngier
  0 siblings, 0 replies; 213+ messages in thread
From: Marc Zyngier @ 2021-06-24 10:43 UTC (permalink / raw)
  To: David Stevens
  Cc: Huacai Chen, Aleksandar Markovic, Paul Mackerras, Paolo Bonzini,
	Zhenyu Wang, Zhi Wang, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Will Deacon, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	linux-arm-kernel, kvmarm, linux-kernel, linux-mips, kvm, kvm-ppc,
	linuxppc-dev, intel-gvt-dev, intel-gfx, dri-devel

On Thu, 24 Jun 2021 04:57:47 +0100,
David Stevens <stevensd@chromium.org> wrote:
> 
> From: David Stevens <stevensd@chromium.org>
> 
> Avoid converting pfns returned by follow_fault_pfn to struct pages to
> transiently take a reference. The reference was originally taken to
> match the reference taken by gup. However, pfns returned by
> follow_fault_pfn may not have a struct page set up for reference
> counting.
> 
> Signed-off-by: David Stevens <stevensd@chromium.org>
> ---
>  arch/arm64/kvm/mmu.c | 43 +++++++++++++++++++++++--------------------
>  1 file changed, 23 insertions(+), 20 deletions(-)
> 
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 896b3644b36f..a741972cb75f 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c

[...]

> @@ -968,16 +968,16 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	 */
>  	if (vma_pagesize == PAGE_SIZE && !force_pte)
>  		vma_pagesize = transparent_hugepage_adjust(memslot, hva,
> -							   &pfn, &fault_ipa);
> +							   &pfnpg, &fault_ipa);
>  	if (writable)
>  		prot |= KVM_PGTABLE_PROT_W;
>  
>  	if (fault_status != FSC_PERM && !device)
> -		clean_dcache_guest_page(pfn, vma_pagesize);
> +		clean_dcache_guest_page(pfnpg.pfn, vma_pagesize);
>  
>  	if (exec_fault) {
>  		prot |= KVM_PGTABLE_PROT_X;
> -		invalidate_icache_guest_page(pfn, vma_pagesize);
> +		invalidate_icache_guest_page(pfnpg.pfn, vma_pagesize);

This is going to clash with what is currently in -next, especially with
MTE.

Paolo, if you really want to take this in 5.13, can you please push a
stable branch based on -rc4 or older so that I can merge it in and
test it?

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.


^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [PATCH 4/6] KVM: arm64/mmu: avoid struct page in MMU
@ 2021-06-24 10:43     ` Marc Zyngier
  0 siblings, 0 replies; 213+ messages in thread
From: Marc Zyngier @ 2021-06-24 10:43 UTC (permalink / raw)
  To: David Stevens
  Cc: Wanpeng Li, kvm, dri-devel, linux-mips, Paul Mackerras,
	Will Deacon, kvmarm, intel-gvt-dev, Joerg Roedel, Huacai Chen,
	Aleksandar Markovic, Zhi Wang, Suzuki K Poulose, intel-gfx,
	kvm-ppc, Alexandru Elisei, linux-arm-kernel, Jim Mattson,
	Sean Christopherson, linux-kernel, James Morse, Paolo Bonzini,
	Vitaly Kuznetsov, linuxppc-dev

On Thu, 24 Jun 2021 04:57:47 +0100,
David Stevens <stevensd@chromium.org> wrote:
> 
> From: David Stevens <stevensd@chromium.org>
> 
> Avoid converting pfns returned by follow_fault_pfn to struct pages to
> transiently take a reference. The reference was originally taken to
> match the reference taken by gup. However, pfns returned by
> follow_fault_pfn may not have a struct page set up for reference
> counting.
> 
> Signed-off-by: David Stevens <stevensd@chromium.org>
> ---
>  arch/arm64/kvm/mmu.c | 43 +++++++++++++++++++++++--------------------
>  1 file changed, 23 insertions(+), 20 deletions(-)
> 
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 896b3644b36f..a741972cb75f 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c

[...]

> @@ -968,16 +968,16 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	 */
>  	if (vma_pagesize == PAGE_SIZE && !force_pte)
>  		vma_pagesize = transparent_hugepage_adjust(memslot, hva,
> -							   &pfn, &fault_ipa);
> +							   &pfnpg, &fault_ipa);
>  	if (writable)
>  		prot |= KVM_PGTABLE_PROT_W;
>  
>  	if (fault_status != FSC_PERM && !device)
> -		clean_dcache_guest_page(pfn, vma_pagesize);
> +		clean_dcache_guest_page(pfnpg.pfn, vma_pagesize);
>  
>  	if (exec_fault) {
>  		prot |= KVM_PGTABLE_PROT_X;
> -		invalidate_icache_guest_page(pfn, vma_pagesize);
> +		invalidate_icache_guest_page(pfnpg.pfn, vma_pagesize);

This is going to clash with what is currently in -next, especially with
MTE.

Paolo, if you really want to take this in 5.13, can you please push a
stable branch based on -rc4 or older so that I can merge it in and
test it?

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [Intel-gfx] [PATCH 4/6] KVM: arm64/mmu: avoid struct page in MMU
@ 2021-06-24 10:43     ` Marc Zyngier
  0 siblings, 0 replies; 213+ messages in thread
From: Marc Zyngier @ 2021-06-24 10:43 UTC (permalink / raw)
  To: David Stevens
  Cc: Wanpeng Li, kvm, dri-devel, linux-mips, Paul Mackerras,
	Will Deacon, kvmarm, intel-gvt-dev, Joerg Roedel, Huacai Chen,
	Aleksandar Markovic, Suzuki K Poulose, intel-gfx, kvm-ppc,
	Alexandru Elisei, linux-arm-kernel, Jim Mattson,
	Sean Christopherson, linux-kernel, James Morse, Paolo Bonzini,
	Vitaly Kuznetsov, linuxppc-dev

On Thu, 24 Jun 2021 04:57:47 +0100,
David Stevens <stevensd@chromium.org> wrote:
> 
> From: David Stevens <stevensd@chromium.org>
> 
> Avoid converting pfns returned by follow_fault_pfn to struct pages to
> transiently take a reference. The reference was originally taken to
> match the reference taken by gup. However, pfns returned by
> follow_fault_pfn may not have a struct page set up for reference
> counting.
> 
> Signed-off-by: David Stevens <stevensd@chromium.org>
> ---
>  arch/arm64/kvm/mmu.c | 43 +++++++++++++++++++++++--------------------
>  1 file changed, 23 insertions(+), 20 deletions(-)
> 
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 896b3644b36f..a741972cb75f 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c

[...]

> @@ -968,16 +968,16 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	 */
>  	if (vma_pagesize == PAGE_SIZE && !force_pte)
>  		vma_pagesize = transparent_hugepage_adjust(memslot, hva,
> -							   &pfn, &fault_ipa);
> +							   &pfnpg, &fault_ipa);
>  	if (writable)
>  		prot |= KVM_PGTABLE_PROT_W;
>  
>  	if (fault_status != FSC_PERM && !device)
> -		clean_dcache_guest_page(pfn, vma_pagesize);
> +		clean_dcache_guest_page(pfnpg.pfn, vma_pagesize);
>  
>  	if (exec_fault) {
>  		prot |= KVM_PGTABLE_PROT_X;
> -		invalidate_icache_guest_page(pfn, vma_pagesize);
> +		invalidate_icache_guest_page(pfnpg.pfn, vma_pagesize);

This is going to clash with what is currently in -next, especially with
MTE.

Paolo, if you really want to take this in 5.13, can you please push a
stable branch based on -rc4 or older so that I can merge it in and
test it?

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [PATCH 4/6] KVM: arm64/mmu: avoid struct page in MMU
@ 2021-06-24 10:43     ` Marc Zyngier
  0 siblings, 0 replies; 213+ messages in thread
From: Marc Zyngier @ 2021-06-24 10:43 UTC (permalink / raw)
  To: David Stevens
  Cc: Huacai Chen, Aleksandar Markovic, Paul Mackerras, Paolo Bonzini,
	Zhenyu Wang, Zhi Wang, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Will Deacon, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	linux-arm-kernel, kvmarm, linux-kernel, linux-mips, kvm, kvm-ppc,
	linuxppc-dev, intel-gvt-dev, intel-gfx, dri-devel

On Thu, 24 Jun 2021 04:57:47 +0100,
David Stevens <stevensd@chromium.org> wrote:
> 
> From: David Stevens <stevensd@chromium.org>
> 
> Avoid converting pfns returned by follow_fault_pfn to struct pages to
> transiently take a reference. The reference was originally taken to
> match the reference taken by gup. However, pfns returned by
> follow_fault_pfn may not have a struct page set up for reference
> counting.
> 
> Signed-off-by: David Stevens <stevensd@chromium.org>
> ---
>  arch/arm64/kvm/mmu.c | 43 +++++++++++++++++++++++--------------------
>  1 file changed, 23 insertions(+), 20 deletions(-)
> 
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 896b3644b36f..a741972cb75f 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c

[...]

> @@ -968,16 +968,16 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	 */
>  	if (vma_pagesize == PAGE_SIZE && !force_pte)
>  		vma_pagesize = transparent_hugepage_adjust(memslot, hva,
> -							   &pfn, &fault_ipa);
> +							   &pfnpg, &fault_ipa);
>  	if (writable)
>  		prot |= KVM_PGTABLE_PROT_W;
>  
>  	if (fault_status != FSC_PERM && !device)
> -		clean_dcache_guest_page(pfn, vma_pagesize);
> +		clean_dcache_guest_page(pfnpg.pfn, vma_pagesize);
>  
>  	if (exec_fault) {
>  		prot |= KVM_PGTABLE_PROT_X;
> -		invalidate_icache_guest_page(pfn, vma_pagesize);
> +		invalidate_icache_guest_page(pfnpg.pfn, vma_pagesize);

This is going to clash with what is currently in -next, especially with
MTE.

Paolo, if you really want to take this in 5.13, can you please push a
stable branch based on -rc4 or older so that I can merge it in and
test it?

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [PATCH 3/6] KVM: x86/mmu: avoid struct page in MMU
  2021-06-24 10:06       ` Marc Zyngier
                           ` (4 preceding siblings ...)
  (?)
@ 2021-06-24 10:43         ` Nicholas Piggin
  -1 siblings, 0 replies; 213+ messages in thread
From: Nicholas Piggin @ 2021-06-24 10:43 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Aleksandar Markovic, Alexandru Elisei, Huacai Chen, dri-devel,
	intel-gfx, intel-gvt-dev, James Morse, Jim Mattson, Joerg Roedel,
	kvmarm, kvm-ppc, kvm, linux-arm-kernel, linux-kernel, linux-mips,
	linuxppc-dev, Paul Mackerras, Paolo Bonzini, Sean Christopherson,
	David Stevens, Suzuki K Poulose, Vitaly Kuznetsov, Wanpeng Li,
	Will Deacon, Zhenyu Wang, Zhi Wang

Excerpts from Marc Zyngier's message of June 24, 2021 8:06 pm:
> On Thu, 24 Jun 2021 09:58:00 +0100,
> Nicholas Piggin <npiggin@gmail.com> wrote:
>> 
>> Excerpts from David Stevens's message of June 24, 2021 1:57 pm:
>> > From: David Stevens <stevensd@chromium.org>
>> >  out_unlock:
>> >  	if (is_tdp_mmu_root(vcpu->kvm, vcpu->arch.mmu->root_hpa))
>> >  		read_unlock(&vcpu->kvm->mmu_lock);
>> >  	else
>> >  		write_unlock(&vcpu->kvm->mmu_lock);
>> > -	kvm_release_pfn_clean(pfn);
>> > +	if (pfnpg.page)
>> > +		put_page(pfnpg.page);
>> >  	return r;
>> >  }
>> 
>> How about
>> 
>>   kvm_release_pfn_page_clean(pfnpg);
> 
> I'm not sure. I always found kvm_release_pfn_clean() ugly, because it
> doesn't mark the page 'clean'. I find put_page() more correct.
> 
> Something like 'kvm_put_pfn_page()' would make more sense, but I'm so
> bad at naming things that I could just as well call it 'bob()'.

That seems like a fine name to me. A little better than bob.
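
A sketch of what that helper could look like, if it simply wraps the
conditional put_page() from the hunk above (the name follows Marc's
suggestion; the body is an assumption, not code from this series):

/* Drop the gup reference taken by gfn_to_pfn_page(), if one was taken. */
static inline void kvm_put_pfn_page(struct kvm_pfn_page pfnpg)
{
	if (pfnpg.page)
		put_page(pfnpg.page);
}

That keeps each call site to a single line without implying anything about
marking the page clean or dirty.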

Thanks,
Nick

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [PATCH 3/6] KVM: x86/mmu: avoid struct page in MMU
@ 2021-06-24 10:43         ` Nicholas Piggin
  0 siblings, 0 replies; 213+ messages in thread
From: Nicholas Piggin @ 2021-06-24 10:43 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Wanpeng Li, kvm, dri-devel, linux-mips, Will Deacon, kvmarm,
	intel-gvt-dev, Joerg Roedel, Huacai Chen, Aleksandar Markovic,
	Zhi Wang, Suzuki K Poulose, intel-gfx, kvm-ppc, Zhenyu Wang,
	Alexandru Elisei, linux-arm-kernel, Jim Mattson,
	Sean Christopherson, linux-kernel, James Morse, David Stevens,
	Paolo Bonzini, Vitaly Kuznetsov, linuxppc-dev

Excerpts from Marc Zyngier's message of June 24, 2021 8:06 pm:
> On Thu, 24 Jun 2021 09:58:00 +0100,
> Nicholas Piggin <npiggin@gmail.com> wrote:
>> 
>> Excerpts from David Stevens's message of June 24, 2021 1:57 pm:
>> > From: David Stevens <stevensd@chromium.org>
>> >  out_unlock:
>> >  	if (is_tdp_mmu_root(vcpu->kvm, vcpu->arch.mmu->root_hpa))
>> >  		read_unlock(&vcpu->kvm->mmu_lock);
>> >  	else
>> >  		write_unlock(&vcpu->kvm->mmu_lock);
>> > -	kvm_release_pfn_clean(pfn);
>> > +	if (pfnpg.page)
>> > +		put_page(pfnpg.page);
>> >  	return r;
>> >  }
>> 
>> How about
>> 
>>   kvm_release_pfn_page_clean(pfnpg);
> 
> I'm not sure. I always found kvm_release_pfn_clean() ugly, because it
> doesn't mark the page 'clean'. I find put_page() more correct.
> 
> Something like 'kvm_put_pfn_page()' would make more sense, but I'm so
> bad at naming things that I could just as well call it 'bob()'.

That seems like a fine name to me. A little better than bob.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [PATCH 3/6] KVM: x86/mmu: avoid struct page in MMU
@ 2021-06-24 10:43         ` Nicholas Piggin
  0 siblings, 0 replies; 213+ messages in thread
From: Nicholas Piggin @ 2021-06-24 10:43 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Wanpeng Li, kvm, dri-devel, linux-mips, Paul Mackerras,
	Will Deacon, kvmarm, intel-gvt-dev, Joerg Roedel, Huacai Chen,
	Aleksandar Markovic, Zhi Wang, intel-gfx, kvm-ppc, Zhenyu Wang,
	linux-arm-kernel, Jim Mattson, Sean Christopherson, linux-kernel,
	David Stevens, Paolo Bonzini, Vitaly Kuznetsov, linuxppc-dev

Excerpts from Marc Zyngier's message of June 24, 2021 8:06 pm:
> On Thu, 24 Jun 2021 09:58:00 +0100,
> Nicholas Piggin <npiggin@gmail.com> wrote:
>> 
>> Excerpts from David Stevens's message of June 24, 2021 1:57 pm:
>> > From: David Stevens <stevensd@chromium.org>
>> >  out_unlock:
>> >  	if (is_tdp_mmu_root(vcpu->kvm, vcpu->arch.mmu->root_hpa))
>> >  		read_unlock(&vcpu->kvm->mmu_lock);
>> >  	else
>> >  		write_unlock(&vcpu->kvm->mmu_lock);
>> > -	kvm_release_pfn_clean(pfn);
>> > +	if (pfnpg.page)
>> > +		put_page(pfnpg.page);
>> >  	return r;
>> >  }
>> 
>> How about
>> 
>>   kvm_release_pfn_page_clean(pfnpg);
> 
> I'm not sure. I always found kvm_release_pfn_clean() ugly, because it
> doesn't mark the page 'clean'. I find put_page() more correct.
> 
> Something like 'kvm_put_pfn_page()' would make more sense, but I'm so
> bad at naming things that I could just as well call it 'bob()'.

That seems like a fine name to me. A little better than bob.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [PATCH 3/6] KVM: x86/mmu: avoid struct page in MMU
@ 2021-06-24 10:43         ` Nicholas Piggin
  0 siblings, 0 replies; 213+ messages in thread
From: Nicholas Piggin @ 2021-06-24 10:43 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Aleksandar Markovic, Alexandru Elisei, Huacai Chen, dri-devel,
	intel-gfx, intel-gvt-dev, James Morse, Jim Mattson, Joerg Roedel,
	kvmarm, kvm-ppc, kvm, linux-arm-kernel, linux-kernel, linux-mips,
	linuxppc-dev, Paul Mackerras, Paolo Bonzini, Sean Christopherson,
	David Stevens, Suzuki K Poulose, Vitaly Kuznetsov, Wanpeng Li,
	Will Deacon, Zhenyu Wang, Zhi Wang

Excerpts from Marc Zyngier's message of June 24, 2021 8:06 pm:
> On Thu, 24 Jun 2021 09:58:00 +0100,
> Nicholas Piggin <npiggin@gmail.com> wrote:
>> 
>> Excerpts from David Stevens's message of June 24, 2021 1:57 pm:
>> > From: David Stevens <stevensd@chromium.org>
>> >  out_unlock:
>> >  	if (is_tdp_mmu_root(vcpu->kvm, vcpu->arch.mmu->root_hpa))
>> >  		read_unlock(&vcpu->kvm->mmu_lock);
>> >  	else
>> >  		write_unlock(&vcpu->kvm->mmu_lock);
>> > -	kvm_release_pfn_clean(pfn);
>> > +	if (pfnpg.page)
>> > +		put_page(pfnpg.page);
>> >  	return r;
>> >  }
>> 
>> How about
>> 
>>   kvm_release_pfn_page_clean(pfnpg);
> 
> I'm not sure. I always found kvm_release_pfn_clean() ugly, because it
> doesn't mark the page 'clean'. I find put_page() more correct.
> 
> Something like 'kvm_put_pfn_page()' would make more sense, but I'm so
> bad at naming things that I could just as well call it 'bob()'.

That seems like a fine name to me. A little better than bob.

Thanks,
Nick


^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [PATCH 3/6] KVM: x86/mmu: avoid struct page in MMU
@ 2021-06-24 10:43         ` Nicholas Piggin
  0 siblings, 0 replies; 213+ messages in thread
From: Nicholas Piggin @ 2021-06-24 10:43 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Wanpeng Li, kvm, dri-devel, linux-mips, Paul Mackerras,
	Will Deacon, kvmarm, intel-gvt-dev, Joerg Roedel, Huacai Chen,
	Aleksandar Markovic, Zhi Wang, Suzuki K Poulose, intel-gfx,
	kvm-ppc, Alexandru Elisei, linux-arm-kernel, Jim Mattson,
	Sean Christopherson, linux-kernel, James Morse, David Stevens,
	Paolo Bonzini, Vitaly Kuznetsov, linuxppc-dev

Excerpts from Marc Zyngier's message of June 24, 2021 8:06 pm:
> On Thu, 24 Jun 2021 09:58:00 +0100,
> Nicholas Piggin <npiggin@gmail.com> wrote:
>> 
>> Excerpts from David Stevens's message of June 24, 2021 1:57 pm:
>> > From: David Stevens <stevensd@chromium.org>
>> >  out_unlock:
>> >  	if (is_tdp_mmu_root(vcpu->kvm, vcpu->arch.mmu->root_hpa))
>> >  		read_unlock(&vcpu->kvm->mmu_lock);
>> >  	else
>> >  		write_unlock(&vcpu->kvm->mmu_lock);
>> > -	kvm_release_pfn_clean(pfn);
>> > +	if (pfnpg.page)
>> > +		put_page(pfnpg.page);
>> >  	return r;
>> >  }
>> 
>> How about
>> 
>>   kvm_release_pfn_page_clean(pfnpg);
> 
> I'm not sure. I always found kvm_release_pfn_clean() ugly, because it
> doesn't mark the page 'clean'. I find put_page() more correct.
> 
> Something like 'kvm_put_pfn_page()' would make more sense, but I'm so
> bad at naming things that I could just as well call it 'bob()'.

That seems like a fine name to me. A little better than bob.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [Intel-gfx] [PATCH 3/6] KVM: x86/mmu: avoid struct page in MMU
@ 2021-06-24 10:43         ` Nicholas Piggin
  0 siblings, 0 replies; 213+ messages in thread
From: Nicholas Piggin @ 2021-06-24 10:43 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Wanpeng Li, kvm, dri-devel, linux-mips, Paul Mackerras,
	Will Deacon, kvmarm, intel-gvt-dev, Joerg Roedel, Huacai Chen,
	Aleksandar Markovic, Suzuki K Poulose, intel-gfx, kvm-ppc,
	Alexandru Elisei, linux-arm-kernel, Jim Mattson,
	Sean Christopherson, linux-kernel, James Morse, David Stevens,
	Paolo Bonzini, Vitaly Kuznetsov, linuxppc-dev

Excerpts from Marc Zyngier's message of June 24, 2021 8:06 pm:
> On Thu, 24 Jun 2021 09:58:00 +0100,
> Nicholas Piggin <npiggin@gmail.com> wrote:
>> 
>> Excerpts from David Stevens's message of June 24, 2021 1:57 pm:
>> > From: David Stevens <stevensd@chromium.org>
>> >  out_unlock:
>> >  	if (is_tdp_mmu_root(vcpu->kvm, vcpu->arch.mmu->root_hpa))
>> >  		read_unlock(&vcpu->kvm->mmu_lock);
>> >  	else
>> >  		write_unlock(&vcpu->kvm->mmu_lock);
>> > -	kvm_release_pfn_clean(pfn);
>> > +	if (pfnpg.page)
>> > +		put_page(pfnpg.page);
>> >  	return r;
>> >  }
>> 
>> How about
>> 
>>   kvm_release_pfn_page_clean(pfnpg);
> 
> I'm not sure. I always found kvm_release_pfn_clean() ugly, because it
> doesn't mark the page 'clean'. I find put_page() more correct.
> 
> Something like 'kvm_put_pfn_page()' would make more sense, but I'm so
> bad at naming things that I could just as well call it 'bob()'.

That seems like a fine name to me. A little better than bob.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [PATCH 3/6] KVM: x86/mmu: avoid struct page in MMU
@ 2021-06-24 10:43         ` Nicholas Piggin
  0 siblings, 0 replies; 213+ messages in thread
From: Nicholas Piggin @ 2021-06-24 10:43 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Aleksandar Markovic, Alexandru Elisei, Huacai Chen, dri-devel,
	intel-gfx, intel-gvt-dev, James Morse, Jim Mattson, Joerg Roedel,
	kvmarm, kvm-ppc, kvm, linux-arm-kernel, linux-kernel, linux-mips,
	linuxppc-dev, Paul Mackerras, Paolo Bonzini, Sean Christopherson,
	David Stevens, Suzuki K Poulose, Vitaly Kuznetsov, Wanpeng Li,
	Will Deacon, Zhenyu Wang, Zhi Wang

Excerpts from Marc Zyngier's message of June 24, 2021 8:06 pm:
> On Thu, 24 Jun 2021 09:58:00 +0100,
> Nicholas Piggin <npiggin@gmail.com> wrote:
>> 
>> Excerpts from David Stevens's message of June 24, 2021 1:57 pm:
>> > From: David Stevens <stevensd@chromium.org>
>> >  out_unlock:
>> >  	if (is_tdp_mmu_root(vcpu->kvm, vcpu->arch.mmu->root_hpa))
>> >  		read_unlock(&vcpu->kvm->mmu_lock);
>> >  	else
>> >  		write_unlock(&vcpu->kvm->mmu_lock);
>> > -	kvm_release_pfn_clean(pfn);
>> > +	if (pfnpg.page)
>> > +		put_page(pfnpg.page);
>> >  	return r;
>> >  }
>> 
>> How about
>> 
>>   kvm_release_pfn_page_clean(pfnpg);
> 
> I'm not sure. I always found kvm_release_pfn_clean() ugly, because it
> doesn't mark the page 'clean'. I find put_page() more correct.
> 
> Something like 'kvm_put_pfn_page()' would make more sense, but I'm so
> bad at naming things that I could just as well call it 'bob()'.

That seems like a fine name to me. A little better than bob.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [PATCH 0/6] KVM: Remove uses of struct page from x86 and arm64 MMU
  2021-06-24 10:34   ` Nicholas Piggin
                       ` (3 preceding siblings ...)
  (?)
@ 2021-06-24 11:42     ` Nicholas Piggin
  -1 siblings, 0 replies; 213+ messages in thread
From: Nicholas Piggin @ 2021-06-24 11:42 UTC (permalink / raw)
  To: Aleksandar Markovic, Huacai Chen, Marc Zyngier, Paul Mackerras,
	Paolo Bonzini, David Stevens, Zhenyu Wang, Zhi Wang
  Cc: Alexandru Elisei, dri-devel, intel-gfx, intel-gvt-dev,
	James Morse, Jim Mattson, Joerg Roedel, kvmarm, kvm-ppc, kvm,
	linux-arm-kernel, linux-kernel, linux-mips, linuxppc-dev,
	Sean Christopherson, David Stevens, Suzuki K Poulose,
	Vitaly Kuznetsov, Wanpeng Li, Will Deacon

Excerpts from Nicholas Piggin's message of June 24, 2021 8:34 pm:
> Excerpts from David Stevens's message of June 24, 2021 1:57 pm:
>> KVM supports mapping VM_IO and VM_PFNMAP memory into the guest by using
>> follow_pte in gfn_to_pfn. However, the resolved pfns may not have
>> associated struct pages, so they should not be passed to pfn_to_page.
>> This series removes such calls from the x86 and arm64 secondary MMU. To
>> do this, this series modifies gfn_to_pfn to return a struct page in
>> addition to a pfn, if the hva was resolved by gup. This allows the
>> caller to call put_page only when necessitated by gup.
>> 
>> This series provides a helper function that unwraps the new return type
>> of gfn_to_pfn to provide behavior identical to the old behavior. As I
>> have no hardware to test powerpc/mips changes, the function is used
>> there for minimally invasive changes. Additionally, as gfn_to_page and
>> gfn_to_pfn_cache are not integrated with mmu notifier, they cannot be
>> easily changed over to only use pfns.
>> 
>> This addresses CVE-2021-22543 on x86 and arm64.
> 
> Does this fix the problem? (untested, as I don't have a PoC setup at hand,
> but at least in concept)

This one actually compiles at least. Unfortunately I don't have much 
time in the near future to test, and I only just found out about this
CVE a few hours ago.

---


It's possible to create a region which maps valid but non-refcounted
pages (e.g., tail pages of non-compound higher order allocations). These
host pages can then be returned by the gfn_to_page, gfn_to_pfn, etc. family
of APIs, which take a reference to the page, pushing its refcount from 0 to 1.
When that reference is dropped, the page is incorrectly freed.

Fix this by only taking a reference on the page if its refcount is already
non-zero, which indicates that the page participates in normal refcounting
(and can be released with put_page).
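
To make the failure mode above concrete, a hypothetical driver along the
lines below is enough to expose such pages to KVM; every name here is
invented for illustration and not taken from any in-tree driver:

/*
 * Sketch: an mmap handler backing a VM_PFNMAP region with a non-compound
 * order-1 allocation. Only the head page is refcounted; the tail page
 * keeps a refcount of 0, yet both pfns become visible to KVM through
 * follow_pte() once a memslot points at this mapping.
 */
static int demo_mmap(struct file *file, struct vm_area_struct *vma)
{
	struct page *pages = alloc_pages(GFP_KERNEL, 1);	/* no __GFP_COMP */

	if (!pages)
		return -ENOMEM;

	return remap_pfn_range(vma, vma->vm_start, page_to_pfn(pages),
			       2 * PAGE_SIZE, vma->vm_page_prot);
}

With the current code, hva_to_pfn_remapped() then calls kvm_get_pfn() on the
tail pfn, pushing its refcount from 0 to 1, and the eventual put_page() frees
a page the driver still owns.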

---
 virt/kvm/kvm_main.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 6a6bc7af0e28..46fb042837d2 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2055,6 +2055,13 @@ static bool vma_is_valid(struct vm_area_struct *vma, bool write_fault)
 	return true;
 }
 
+static int kvm_try_get_pfn(kvm_pfn_t pfn)
+{
+	if (kvm_is_reserved_pfn(pfn))
+		return 1;
+	return get_page_unless_zero(pfn_to_page(pfn));
+}
+
 static int hva_to_pfn_remapped(struct vm_area_struct *vma,
 			       unsigned long addr, bool *async,
 			       bool write_fault, bool *writable,
@@ -2104,13 +2111,21 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
 	 * Whoever called remap_pfn_range is also going to call e.g.
 	 * unmap_mapping_range before the underlying pages are freed,
 	 * causing a call to our MMU notifier.
+	 *
+	 * Certain IO or PFNMAP mappings can be backed with valid
+	 * struct pages, but be allocated without refcounting e.g.,
+	 * tail pages of non-compound higher order allocations, which
+	 * would then underflow the refcount when the caller does the
+	 * required put_page. Don't allow those pages here.
 	 */ 
-	kvm_get_pfn(pfn);
+	if (!kvm_try_get_pfn(pfn))
+		r = -EFAULT;
 
 out:
 	pte_unmap_unlock(ptep, ptl);
 	*p_pfn = pfn;
-	return 0;
+
+	return r;
 }
 
 /*
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* Re: [PATCH 0/6] KVM: Remove uses of struct page from x86 and arm64 MMU
@ 2021-06-24 11:42     ` Nicholas Piggin
  0 siblings, 0 replies; 213+ messages in thread
From: Nicholas Piggin @ 2021-06-24 11:42 UTC (permalink / raw)
  To: Aleksandar Markovic, Huacai Chen, Marc Zyngier, Paul Mackerras,
	Paolo Bonzini, David Stevens, Zhenyu Wang, Zhi Wang
  Cc: Wanpeng Li, kvm, David Stevens, Alexandru Elisei, intel-gfx,
	linuxppc-dev, linux-kernel, dri-devel, kvmarm, Will Deacon,
	Suzuki K Poulose, James Morse, kvm-ppc, Sean Christopherson,
	Vitaly Kuznetsov, linux-mips, intel-gvt-dev, Joerg Roedel,
	linux-arm-kernel, Jim Mattson

Excerpts from Nicholas Piggin's message of June 24, 2021 8:34 pm:
> Excerpts from David Stevens's message of June 24, 2021 1:57 pm:
>> KVM supports mapping VM_IO and VM_PFNMAP memory into the guest by using
>> follow_pte in gfn_to_pfn. However, the resolved pfns may not have
>> associated struct pages, so they should not be passed to pfn_to_page.
>> This series removes such calls from the x86 and arm64 secondary MMU. To
>> do this, this series modifies gfn_to_pfn to return a struct page in
>> addition to a pfn, if the hva was resolved by gup. This allows the
>> caller to call put_page only when necessitated by gup.
>> 
>> This series provides a helper function that unwraps the new return type
>> of gfn_to_pfn to provide behavior identical to the old behavior. As I
>> have no hardware to test powerpc/mips changes, the function is used
>> there for minimally invasive changes. Additionally, as gfn_to_page and
>> gfn_to_pfn_cache are not integrated with mmu notifier, they cannot be
>> easily changed over to only use pfns.
>> 
>> This addresses CVE-2021-22543 on x86 and arm64.
> 
> Does this fix the problem? (untested, as I don't have a PoC setup at hand,
> but at least in concept)

This one actually compiles at least. Unfortunately I don't have much 
time in the near future to test, and I only just found out about this
CVE a few hours ago.

---


It's possible to create a region which maps valid but non-refcounted
pages (e.g., tail pages of non-compound higher order allocations). These
host pages can then be returned by the gfn_to_page, gfn_to_pfn, etc. family
of APIs, which take a reference to the page, pushing its refcount from 0 to 1.
When that reference is dropped, the page is incorrectly freed.

Fix this by only taking a reference on the page if its refcount is already
non-zero, which indicates that the page participates in normal refcounting
(and can be released with put_page).

---
 virt/kvm/kvm_main.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 6a6bc7af0e28..46fb042837d2 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2055,6 +2055,13 @@ static bool vma_is_valid(struct vm_area_struct *vma, bool write_fault)
 	return true;
 }
 
+static int kvm_try_get_pfn(kvm_pfn_t pfn)
+{
+	if (kvm_is_reserved_pfn(pfn))
+		return 1;
+	return get_page_unless_zero(pfn_to_page(pfn));
+}
+
 static int hva_to_pfn_remapped(struct vm_area_struct *vma,
 			       unsigned long addr, bool *async,
 			       bool write_fault, bool *writable,
@@ -2104,13 +2111,21 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
 	 * Whoever called remap_pfn_range is also going to call e.g.
 	 * unmap_mapping_range before the underlying pages are freed,
 	 * causing a call to our MMU notifier.
+	 *
+	 * Certain IO or PFNMAP mappings can be backed with valid
+	 * struct pages, but be allocated without refcounting e.g.,
+	 * tail pages of non-compound higher order allocations, which
+	 * would then underflow the refcount when the caller does the
+	 * required put_page. Don't allow those pages here.
 	 */ 
-	kvm_get_pfn(pfn);
+	if (!kvm_try_get_pfn(pfn))
+		r = -EFAULT;
 
 out:
 	pte_unmap_unlock(ptep, ptl);
 	*p_pfn = pfn;
-	return 0;
+
+	return r;
 }
 
 /*
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* Re: [PATCH 0/6] KVM: Remove uses of struct page from x86 and arm64 MMU
@ 2021-06-24 11:42     ` Nicholas Piggin
  0 siblings, 0 replies; 213+ messages in thread
From: Nicholas Piggin @ 2021-06-24 11:42 UTC (permalink / raw)
  To: Aleksandar Markovic, Huacai Chen, Marc Zyngier, Paul Mackerras,
	Paolo Bonzini, David Stevens, Zhenyu Wang, Zhi Wang
  Cc: Wanpeng Li, kvm, David Stevens, intel-gfx, linuxppc-dev,
	linux-kernel, dri-devel, kvmarm, Will Deacon, kvm-ppc,
	Sean Christopherson, Vitaly Kuznetsov, linux-mips, intel-gvt-dev,
	Joerg Roedel, linux-arm-kernel, Jim Mattson

Excerpts from Nicholas Piggin's message of June 24, 2021 8:34 pm:
> Excerpts from David Stevens's message of June 24, 2021 1:57 pm:
>> KVM supports mapping VM_IO and VM_PFNMAP memory into the guest by using
>> follow_pte in gfn_to_pfn. However, the resolved pfns may not have
>> associated struct pages, so they should not be passed to pfn_to_page.
>> This series removes such calls from the x86 and arm64 secondary MMU. To
>> do this, this series modifies gfn_to_pfn to return a struct page in
>> addition to a pfn, if the hva was resolved by gup. This allows the
>> caller to call put_page only when necessitated by gup.
>> 
>> This series provides a helper function that unwraps the new return type
>> of gfn_to_pfn to provide behavior identical to the old behavior. As I
>> have no hardware to test powerpc/mips changes, the function is used
>> there for minimally invasive changes. Additionally, as gfn_to_page and
>> gfn_to_pfn_cache are not integrated with mmu notifier, they cannot be
>> easily changed over to only use pfns.
>> 
>> This addresses CVE-2021-22543 on x86 and arm64.
> 
> Does this fix the problem? (untested, as I don't have a PoC setup at hand,
> but at least in concept)

This one actually compiles at least. Unfortunately I don't have much 
time in the near future to test, and I only just found out about this
CVE a few hours ago.

---


It's possible to create a region which maps valid but non-refcounted
pages (e.g., tail pages of non-compound higher order allocations). These
host pages can then be returned by the gfn_to_page, gfn_to_pfn, etc. family
of APIs, which take a reference to the page, pushing its refcount from 0 to 1.
When that reference is dropped, the page is incorrectly freed.

Fix this by only taking a reference on the page if its refcount is already
non-zero, which indicates that the page participates in normal refcounting
(and can be released with put_page).

---
 virt/kvm/kvm_main.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 6a6bc7af0e28..46fb042837d2 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2055,6 +2055,13 @@ static bool vma_is_valid(struct vm_area_struct *vma, bool write_fault)
 	return true;
 }
 
+static int kvm_try_get_pfn(kvm_pfn_t pfn)
+{
+	if (kvm_is_reserved_pfn(pfn))
+		return 1;
+	return get_page_unless_zero(pfn_to_page(pfn));
+}
+
 static int hva_to_pfn_remapped(struct vm_area_struct *vma,
 			       unsigned long addr, bool *async,
 			       bool write_fault, bool *writable,
@@ -2104,13 +2111,21 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
 	 * Whoever called remap_pfn_range is also going to call e.g.
 	 * unmap_mapping_range before the underlying pages are freed,
 	 * causing a call to our MMU notifier.
+	 *
+	 * Certain IO or PFNMAP mappings can be backed with valid
+	 * struct pages, but be allocated without refcounting e.g.,
+	 * tail pages of non-compound higher order allocations, which
+	 * would then underflow the refcount when the caller does the
+	 * required put_page. Don't allow those pages here.
 	 */ 
-	kvm_get_pfn(pfn);
+	if (!kvm_try_get_pfn(pfn))
+		r = -EFAULT;
 
 out:
 	pte_unmap_unlock(ptep, ptl);
 	*p_pfn = pfn;
-	return 0;
+
+	return r;
 }
 
 /*
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 213+ messages in thread

* Re: [PATCH 0/6] KVM: Remove uses of struct page from x86 and arm64 MMU
@ 2021-06-24 11:42     ` Nicholas Piggin
  0 siblings, 0 replies; 213+ messages in thread
From: Nicholas Piggin @ 2021-06-24 11:42 UTC (permalink / raw)
  To: Aleksandar Markovic, Huacai Chen, Marc Zyngier, Paul Mackerras,
	Paolo Bonzini, David Stevens, Zhenyu Wang, Zhi Wang
  Cc: Alexandru Elisei, dri-devel, intel-gfx, intel-gvt-dev,
	James Morse, Jim Mattson, Joerg Roedel, kvmarm, kvm-ppc, kvm,
	linux-arm-kernel, linux-kernel, linux-mips, linuxppc-dev,
	Sean Christopherson, David Stevens, Suzuki K Poulose,
	Vitaly Kuznetsov, Wanpeng Li, Will Deacon

Excerpts from Nicholas Piggin's message of June 24, 2021 8:34 pm:
> Excerpts from David Stevens's message of June 24, 2021 1:57 pm:
>> KVM supports mapping VM_IO and VM_PFNMAP memory into the guest by using
>> follow_pte in gfn_to_pfn. However, the resolved pfns may not have
>> associated struct pages, so they should not be passed to pfn_to_page.
>> This series removes such calls from the x86 and arm64 secondary MMU. To
>> do this, this series modifies gfn_to_pfn to return a struct page in
>> addition to a pfn, if the hva was resolved by gup. This allows the
>> caller to call put_page only when necessitated by gup.
>> 
>> This series provides a helper function that unwraps the new return type
>> of gfn_to_pfn to provide behavior identical to the old behavior. As I
>> have no hardware to test powerpc/mips changes, the function is used
>> there for minimally invasive changes. Additionally, as gfn_to_page and
>> gfn_to_pfn_cache are not integrated with mmu notifier, they cannot be
>> easily changed over to only use pfns.
>> 
>> This addresses CVE-2021-22543 on x86 and arm64.
> 
> Does this fix the problem? (untested, as I don't have a PoC setup at hand,
> but at least in concept)

This one actually compiles at least. Unfortunately I don't have much 
time in the near future to test, and I only just found out about this
CVE a few hours ago.

---


It's possible to create a region which maps valid but non-refcounted
pages (e.g., tail pages of non-compound higher order allocations). These
host pages can then be returned by the gfn_to_page, gfn_to_pfn, etc. family
of APIs, which take a reference to the page, pushing its refcount from 0 to 1.
When that reference is dropped, the page is incorrectly freed.

Fix this by only taking a reference on the page if its refcount is already
non-zero, which indicates that the page participates in normal refcounting
(and can be released with put_page).

---
 virt/kvm/kvm_main.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 6a6bc7af0e28..46fb042837d2 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2055,6 +2055,13 @@ static bool vma_is_valid(struct vm_area_struct *vma, bool write_fault)
 	return true;
 }
 
+static int kvm_try_get_pfn(kvm_pfn_t pfn)
+{
+	if (kvm_is_reserved_pfn(pfn))
+		return 1;
+	return get_page_unless_zero(pfn_to_page(pfn));
+}
+
 static int hva_to_pfn_remapped(struct vm_area_struct *vma,
 			       unsigned long addr, bool *async,
 			       bool write_fault, bool *writable,
@@ -2104,13 +2111,21 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
 	 * Whoever called remap_pfn_range is also going to call e.g.
 	 * unmap_mapping_range before the underlying pages are freed,
 	 * causing a call to our MMU notifier.
+	 *
+	 * Certain IO or PFNMAP mappings can be backed with valid
+	 * struct pages, but be allocated without refcounting e.g.,
+	 * tail pages of non-compound higher order allocations, which
+	 * would then underflow the refcount when the caller does the
+	 * required put_page. Don't allow those pages here.
 	 */ 
-	kvm_get_pfn(pfn);
+	if (!kvm_try_get_pfn(pfn))
+		r = -EFAULT;
 
 out:
 	pte_unmap_unlock(ptep, ptl);
 	*p_pfn = pfn;
-	return 0;
+
+	return r;
 }
 
 /*
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 213+ messages in thread

* Re: [Intel-gfx] [PATCH 0/6] KVM: Remove uses of struct page from x86 and arm64 MMU
@ 2021-06-24 11:42     ` Nicholas Piggin
  0 siblings, 0 replies; 213+ messages in thread
From: Nicholas Piggin @ 2021-06-24 11:42 UTC (permalink / raw)
  To: Aleksandar Markovic, Huacai Chen, Marc Zyngier, Paul Mackerras,
	Paolo Bonzini, David Stevens, Zhenyu Wang, Zhi Wang
  Cc: Wanpeng Li, kvm, David Stevens, Alexandru Elisei, intel-gfx,
	linuxppc-dev, linux-kernel, dri-devel, kvmarm, Will Deacon,
	Suzuki K Poulose, James Morse, kvm-ppc, Sean Christopherson,
	Vitaly Kuznetsov, linux-mips, intel-gvt-dev, Joerg Roedel,
	linux-arm-kernel, Jim Mattson

Excerpts from Nicholas Piggin's message of June 24, 2021 8:34 pm:
> Excerpts from David Stevens's message of June 24, 2021 1:57 pm:
>> KVM supports mapping VM_IO and VM_PFNMAP memory into the guest by using
>> follow_pte in gfn_to_pfn. However, the resolved pfns may not have
>> associated struct pages, so they should not be passed to pfn_to_page.
>> This series removes such calls from the x86 and arm64 secondary MMU. To
>> do this, this series modifies gfn_to_pfn to return a struct page in
>> addition to a pfn, if the hva was resolved by gup. This allows the
>> caller to call put_page only when necessitated by gup.
>> 
>> This series provides a helper function that unwraps the new return type
>> of gfn_to_pfn to provide behavior identical to the old behavior. As I
>> have no hardware to test powerpc/mips changes, the function is used
>> there for minimally invasive changes. Additionally, as gfn_to_page and
>> gfn_to_pfn_cache are not integrated with mmu notifier, they cannot be
>> easily changed over to only use pfns.
>> 
>> This addresses CVE-2021-22543 on x86 and arm64.
> 
> Does this fix the problem? (untested, as I don't have a PoC setup at hand,
> but at least in concept)

This one actually compiles at least. Unfortunately I don't have much 
time in the near future to test, and I only just found out about this
CVE a few hours ago.

---


It's possible to create a region which maps valid but non-refcounted
pages (e.g., tail pages of non-compound higher order allocations). These
host pages can then be returned by the gfn_to_page, gfn_to_pfn, etc. family
of APIs, which take a reference to the page, pushing its refcount from 0 to 1.
When that reference is dropped, the page is incorrectly freed.

Fix this by only taking a reference on the page if its refcount is already
non-zero, which indicates that the page participates in normal refcounting
(and can be released with put_page).

---
 virt/kvm/kvm_main.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 6a6bc7af0e28..46fb042837d2 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2055,6 +2055,13 @@ static bool vma_is_valid(struct vm_area_struct *vma, bool write_fault)
 	return true;
 }
 
+static int kvm_try_get_pfn(kvm_pfn_t pfn)
+{
+	if (kvm_is_reserved_pfn(pfn))
+		return 1;
+	return get_page_unless_zero(pfn_to_page(pfn));
+}
+
 static int hva_to_pfn_remapped(struct vm_area_struct *vma,
 			       unsigned long addr, bool *async,
 			       bool write_fault, bool *writable,
@@ -2104,13 +2111,21 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
 	 * Whoever called remap_pfn_range is also going to call e.g.
 	 * unmap_mapping_range before the underlying pages are freed,
 	 * causing a call to our MMU notifier.
+	 *
+	 * Certain IO or PFNMAP mappings can be backed with valid
+	 * struct pages, but be allocated without refcounting e.g.,
+	 * tail pages of non-compound higher order allocations, which
+	 * would then underflow the refcount when the caller does the
+	 * required put_page. Don't allow those pages here.
 	 */ 
-	kvm_get_pfn(pfn);
+	if (!kvm_try_get_pfn(pfn))
+		r = -EFAULT;
 
 out:
 	pte_unmap_unlock(ptep, ptl);
 	*p_pfn = pfn;
-	return 0;
+
+	return r;
 }
 
 /*
-- 
2.23.0

^ permalink raw reply related	[flat|nested] 213+ messages in thread
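
For context on the failure mode described in the commit message above, here
is a minimal, hypothetical driver-style sketch (names and sizes are
illustrative, not taken from any real driver) of how valid but non-refcounted
pages can end up behind a VM_PFNMAP mapping:

#include <linux/fs.h>
#include <linux/gfp.h>
#include <linux/mm.h>

/*
 * Hypothetical illustration only. A non-compound higher-order
 * allocation refcounts just the head page; the tail pages keep a
 * refcount of 0 even though pfn_valid() is true for them.
 */
static int example_mmap(struct file *file, struct vm_area_struct *vma)
{
	struct page *head = alloc_pages(GFP_KERNEL, 2);	/* 4 pages, no __GFP_COMP */

	if (!head)
		return -ENOMEM;

	/* page_count(head) == 1, page_count(head + 1..3) == 0 */

	/* remap_pfn_range() marks the VMA VM_IO | VM_PFNMAP. */
	return remap_pfn_range(vma, vma->vm_start, page_to_pfn(head),
			       4 * PAGE_SIZE, vma->vm_page_prot);
}

If a KVM memslot is backed by such a VMA, the old kvm_get_pfn() path takes a
tail page's refcount from 0 to 1, and the put_page() that KVM's callers are
expected to do later drops it back to 0 and frees a page that was never
allocated as an individually refcounted unit.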

* Re: [PATCH 0/6] KVM: Remove uses of struct page from x86 and arm64 MMU
  2021-06-24 11:42     ` Nicholas Piggin
                         ` (3 preceding siblings ...)
  (?)
@ 2021-06-24 12:00       ` Paolo Bonzini
  -1 siblings, 0 replies; 213+ messages in thread
From: Paolo Bonzini @ 2021-06-24 12:00 UTC (permalink / raw)
  To: Nicholas Piggin, Aleksandar Markovic, Huacai Chen, Marc Zyngier,
	Paul Mackerras, David Stevens, Zhenyu Wang, Zhi Wang
  Cc: Alexandru Elisei, dri-devel, intel-gfx, intel-gvt-dev,
	James Morse, Jim Mattson, Joerg Roedel, kvmarm, kvm-ppc, kvm,
	linux-arm-kernel, linux-kernel, linux-mips, linuxppc-dev,
	Sean Christopherson, David Stevens, Suzuki K Poulose,
	Vitaly Kuznetsov, Wanpeng Li, Will Deacon

On 24/06/21 13:42, Nicholas Piggin wrote:
> +static int kvm_try_get_pfn(kvm_pfn_t pfn)
> +{
> +	if (kvm_is_reserved_pfn(pfn))
> +		return 1;

So !pfn_valid would always return true.  Yeah, this should work and is 
certainly appealing!

Paolo


> +	return get_page_unless_zero(pfn_to_page(pfn));
> +}
> +
>   static int hva_to_pfn_remapped(struct vm_area_struct *vma,
>   			       unsigned long addr, bool *async,
>   			       bool write_fault, bool *writable,
> @@ -2104,13 +2111,21 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
>   	 * Whoever called remap_pfn_range is also going to call e.g.
>   	 * unmap_mapping_range before the underlying pages are freed,
>   	 * causing a call to our MMU notifier.
> +	 *
> +	 * Certain IO or PFNMAP mappings can be backed with valid
> +	 * struct pages, but be allocated without refcounting e.g.,
> +	 * tail pages of non-compound higher order allocations, which
> +	 * would then underflow the refcount when the caller does the
> +	 * required put_page. Don't allow those pages here.
>   	 */
> -	kvm_get_pfn(pfn);
> +	if (!kvm_try_get_pfn(pfn))
> +		r = -EFAULT;
>   
>   out:
>   	pte_unmap_unlock(ptep, ptl);
>   	*p_pfn = pfn;
> -	return 0;
> +
> +	return r;
>   }
>   
>   /*
> 


^ permalink raw reply	[flat|nested] 213+ messages in thread
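
Paolo's observation above refers to how kvm_is_reserved_pfn() behaves for
pfns that have no struct page. A rough sketch of that existing helper,
reconstructed from memory of the kvm_main.c of that period (so details may
differ from the exact source), shows why kvm_try_get_pfn() never touches
pfn_to_page() for a !pfn_valid pfn:

/* Rough sketch, not a verbatim copy of virt/kvm/kvm_main.c. */
bool kvm_is_reserved_pfn(kvm_pfn_t pfn)
{
	if (pfn_valid(pfn))
		return PageReserved(pfn_to_page(pfn));

	/*
	 * No struct page exists for this pfn, so treat it as reserved;
	 * kvm_try_get_pfn() then returns 1 without ever calling
	 * pfn_to_page() or get_page_unless_zero() on it.
	 */
	return true;
}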

* Re: [PATCH 0/6] KVM: Remove uses of struct page from x86 and arm64 MMU
  2021-06-24 11:42     ` Nicholas Piggin
                         ` (3 preceding siblings ...)
  (?)
@ 2021-06-24 12:41       ` Paolo Bonzini
  -1 siblings, 0 replies; 213+ messages in thread
From: Paolo Bonzini @ 2021-06-24 12:41 UTC (permalink / raw)
  To: Nicholas Piggin, Aleksandar Markovic, Huacai Chen, Marc Zyngier,
	Paul Mackerras, David Stevens, Zhenyu Wang, Zhi Wang
  Cc: Alexandru Elisei, dri-devel, intel-gfx, intel-gvt-dev,
	James Morse, Jim Mattson, Joerg Roedel, kvmarm, kvm-ppc, kvm,
	linux-arm-kernel, linux-kernel, linux-mips, linuxppc-dev,
	Sean Christopherson, David Stevens, Suzuki K Poulose,
	Vitaly Kuznetsov, Wanpeng Li, Will Deacon

On 24/06/21 13:42, Nicholas Piggin wrote:
> Excerpts from Nicholas Piggin's message of June 24, 2021 8:34 pm:
>> Excerpts from David Stevens's message of June 24, 2021 1:57 pm:
>>> KVM supports mapping VM_IO and VM_PFNMAP memory into the guest by using
>>> follow_pte in gfn_to_pfn. However, the resolved pfns may not have
>>> associated struct pages, so they should not be passed to pfn_to_page.
>>> This series removes such calls from the x86 and arm64 secondary MMU. To
>>> do this, this series modifies gfn_to_pfn to return a struct page in
>>> addition to a pfn, if the hva was resolved by gup. This allows the
>>> caller to call put_page only when necessitated by gup.
>>>
>>> This series provides a helper function that unwraps the new return type
>>> of gfn_to_pfn to provide behavior identical to the old behavior. As I
>>> have no hardware to test powerpc/mips changes, the function is used
>>> there for minimally invasive changes. Additionally, as gfn_to_page and
>>> gfn_to_pfn_cache are not integrated with mmu notifier, they cannot be
>>> easily changed over to only use pfns.
>>>
>>> This addresses CVE-2021-22543 on x86 and arm64.
>>
>> Does this fix the problem? (untested I don't have a POC setup at hand,
>> but at least in concept)
> 
> This one actually compiles at least. Unfortunately I don't have much
> time in the near future to test, and I only just found out about this
> CVE a few hours ago.

And it also works (the reproducer gets an infinite stream of userspace 
exits and, importantly, does not crash).  We can still go for David's 
solution later, since MMU notifiers are able to deal with these pages, but 
it's a very nice patch for stable kernels.

If you provide a Signed-off-by, I can integrate it.

Paolo

> ---
> 
> 
> It's possible to create a region which maps valid but non-refcounted
> pages (e.g., tail pages of non-compound higher order allocations). These
> host pages can then be returned by the gfn_to_page, gfn_to_pfn, etc. family
> of APIs, which take a reference to the page and thereby raise its refcount
> from 0 to 1. When that reference is dropped, the page is freed incorrectly.
> 
> Fix this by only taking a reference on the page if its refcount was already
> non-zero, which indicates the page participates in normal refcounting (and
> can be released with put_page).
> 
> ---
>   virt/kvm/kvm_main.c | 19 +++++++++++++++++--
>   1 file changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 6a6bc7af0e28..46fb042837d2 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -2055,6 +2055,13 @@ static bool vma_is_valid(struct vm_area_struct *vma, bool write_fault)
>   	return true;
>   }
>   
> +static int kvm_try_get_pfn(kvm_pfn_t pfn)
> +{
> +	if (kvm_is_reserved_pfn(pfn))
> +		return 1;
> +	return get_page_unless_zero(pfn_to_page(pfn));
> +}
> +
>   static int hva_to_pfn_remapped(struct vm_area_struct *vma,
>   			       unsigned long addr, bool *async,
>   			       bool write_fault, bool *writable,
> @@ -2104,13 +2111,21 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
>   	 * Whoever called remap_pfn_range is also going to call e.g.
>   	 * unmap_mapping_range before the underlying pages are freed,
>   	 * causing a call to our MMU notifier.
> +	 *
> +	 * Certain IO or PFNMAP mappings can be backed with valid
> +	 * struct pages, but be allocated without refcounting e.g.,
> +	 * tail pages of non-compound higher order allocations, which
> +	 * would then underflow the refcount when the caller does the
> +	 * required put_page. Don't allow those pages here.
>   	 */
> -	kvm_get_pfn(pfn);
> +	if (!kvm_try_get_pfn(pfn))
> +		r = -EFAULT;
>   
>   out:
>   	pte_unmap_unlock(ptep, ptl);
>   	*p_pfn = pfn;
> -	return 0;
> +
> +	return r;
>   }
>   
>   /*
> 


^ permalink raw reply	[flat|nested] 213+ messages in thread
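
The behaviour Paolo reports (repeated exits to userspace rather than a
crash) is what the error path would be expected to produce: once
hva_to_pfn_remapped() returns -EFAULT, its caller turns that into an error
pfn instead of installing a mapping. A rough sketch of the relevant call
site in hva_to_pfn(), reconstructed from memory (the retry label and
surrounding details are approximate):

	/* Rough sketch of the call site in hva_to_pfn(), not verbatim kernel code. */
	if (vma->vm_flags & (VM_IO | VM_PFNMAP)) {
		r = hva_to_pfn_remapped(vma, addr, async, write_fault, writable, &pfn);
		if (r == -EAGAIN)
			goto retry;	/* raced with an invalidation, try again */
		if (r < 0)
			pfn = KVM_PFN_ERR_FAULT; /* surfaced to the caller as a fault */
	}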

* Re: [PATCH 0/6] KVM: Remove uses of struct page from x86 and arm64 MMU
  2021-06-24 12:41       ` Paolo Bonzini
                           ` (3 preceding siblings ...)
  (?)
@ 2021-06-24 12:57         ` Nicholas Piggin
  -1 siblings, 0 replies; 213+ messages in thread
From: Nicholas Piggin @ 2021-06-24 12:57 UTC (permalink / raw)
  To: Aleksandar Markovic, Huacai Chen, Marc Zyngier, Paul Mackerras,
	Paolo Bonzini, David Stevens, Zhenyu Wang, Zhi Wang
  Cc: Alexandru Elisei, dri-devel, intel-gfx, intel-gvt-dev,
	James Morse, Jim Mattson, Joerg Roedel, kvmarm, kvm-ppc, kvm,
	linux-arm-kernel, linux-kernel, linux-mips, linuxppc-dev,
	Sean Christopherson, David Stevens, Suzuki K Poulose,
	Vitaly Kuznetsov, Wanpeng Li, Will Deacon

Excerpts from Paolo Bonzini's message of June 24, 2021 10:41 pm:
> On 24/06/21 13:42, Nicholas Piggin wrote:
>> Excerpts from Nicholas Piggin's message of June 24, 2021 8:34 pm:
>>> Excerpts from David Stevens's message of June 24, 2021 1:57 pm:
>>>> KVM supports mapping VM_IO and VM_PFNMAP memory into the guest by using
>>>> follow_pte in gfn_to_pfn. However, the resolved pfns may not have
>>>> associated struct pages, so they should not be passed to pfn_to_page.
>>>> This series removes such calls from the x86 and arm64 secondary MMU. To
>>>> do this, this series modifies gfn_to_pfn to return a struct page in
>>>> addition to a pfn, if the hva was resolved by gup. This allows the
>>>> caller to call put_page only when necessitated by gup.
>>>>
>>>> This series provides a helper function that unwraps the new return type
>>>> of gfn_to_pfn to provide behavior identical to the old behavior. As I
>>>> have no hardware to test powerpc/mips changes, the function is used
>>>> there for minimally invasive changes. Additionally, as gfn_to_page and
>>>> gfn_to_pfn_cache are not integrated with mmu notifier, they cannot be
>>>> easily changed over to only use pfns.
>>>>
>>>> This addresses CVE-2021-22543 on x86 and arm64.
>>>
>>> Does this fix the problem? (untested I don't have a POC setup at hand,
>>> but at least in concept)
>> 
>> This one actually compiles at least. Unfortunately I don't have much
>> time in the near future to test, and I only just found out about this
>> CVE a few hours ago.
> 
> And it also works (the reproducer gets an infinite stream of userspace 
> exits and, importantly, does not crash).  We can still go for David's 
> solution later, since MMU notifiers are able to deal with these pages, but 
> it's a very nice patch for stable kernels.

Oh nice, thanks for testing. How's this?

Thanks,
Nick

---

KVM: Fix page ref underflow for regions with valid but non-refcounted pages

It's possible to create a region which maps valid but non-refcounted
pages (e.g., tail pages of non-compound higher order allocations). These
host pages can then be returned by the gfn_to_page, gfn_to_pfn, etc. family
of APIs, which take a reference to the page and thereby raise its refcount
from 0 to 1. When that reference is dropped, the page is freed incorrectly.

Fix this by only taking a reference on the page if its refcount was already
non-zero, which indicates the page participates in normal refcounting (and
can be released with put_page).

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 virt/kvm/kvm_main.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 6a6bc7af0e28..46fb042837d2 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2055,6 +2055,13 @@ static bool vma_is_valid(struct vm_area_struct *vma, bool write_fault)
 	return true;
 }
 
+static int kvm_try_get_pfn(kvm_pfn_t pfn)
+{
+	if (kvm_is_reserved_pfn(pfn))
+		return 1;
+	return get_page_unless_zero(pfn_to_page(pfn));
+}
+
 static int hva_to_pfn_remapped(struct vm_area_struct *vma,
 			       unsigned long addr, bool *async,
 			       bool write_fault, bool *writable,
@@ -2104,13 +2111,21 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
 	 * Whoever called remap_pfn_range is also going to call e.g.
 	 * unmap_mapping_range before the underlying pages are freed,
 	 * causing a call to our MMU notifier.
+	 *
+	 * Certain IO or PFNMAP mappings can be backed with valid
+	 * struct pages, but be allocated without refcounting e.g.,
+	 * tail pages of non-compound higher order allocations, which
+	 * would then underflow the refcount when the caller does the
+	 * required put_page. Don't allow those pages here.
 	 */ 
-	kvm_get_pfn(pfn);
+	if (!kvm_try_get_pfn(pfn))
+		r = -EFAULT;
 
 out:
 	pte_unmap_unlock(ptep, ptl);
 	*p_pfn = pfn;
-	return 0;
+
+	return r;
 }
 
 /*
-- 
2.23.0

^ permalink raw reply related	[flat|nested] 213+ messages in thread
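
The primitive the fix relies on, get_page_unless_zero(), only takes a
reference when the page's refcount is already non-zero. A small hypothetical
helper (not part of the patch above) illustrating its semantics:

#include <linux/mm.h>

/* Hypothetical illustration of get_page_unless_zero() semantics. */
static bool example_try_ref(unsigned long pfn)
{
	struct page *page = pfn_to_page(pfn);

	if (!get_page_unless_zero(page)) {
		/*
		 * Refcount was 0, e.g. a tail page of a non-compound
		 * higher-order allocation; taking and later dropping a
		 * reference would free the page, so refuse it.
		 */
		return false;
	}

	/* ... use the page, then drop the reference we took ... */
	put_page(page);
	return true;
}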

* Re: [PATCH 0/6] KVM: Remove uses of struct page from x86 and arm64 MMU
@ 2021-06-24 12:57         ` Nicholas Piggin
  0 siblings, 0 replies; 213+ messages in thread
From: Nicholas Piggin @ 2021-06-24 12:57 UTC (permalink / raw)
  To: Aleksandar Markovic, Huacai Chen, Marc Zyngier, Paul Mackerras,
	Paolo Bonzini, David Stevens, Zhenyu Wang, Zhi Wang
  Cc: Wanpeng Li, kvm, David Stevens, Alexandru Elisei, intel-gfx,
	linuxppc-dev, linux-kernel, dri-devel, kvmarm, Will Deacon,
	Suzuki K Poulose, James Morse, kvm-ppc, Sean Christopherson,
	Vitaly Kuznetsov, linux-mips, intel-gvt-dev, Joerg Roedel,
	linux-arm-kernel, Jim Mattson

Excerpts from Paolo Bonzini's message of June 24, 2021 10:41 pm:
> On 24/06/21 13:42, Nicholas Piggin wrote:
>> Excerpts from Nicholas Piggin's message of June 24, 2021 8:34 pm:
>>> Excerpts from David Stevens's message of June 24, 2021 1:57 pm:
>>>> KVM supports mapping VM_IO and VM_PFNMAP memory into the guest by using
>>>> follow_pte in gfn_to_pfn. However, the resolved pfns may not have
>>>> assoicated struct pages, so they should not be passed to pfn_to_page.
>>>> This series removes such calls from the x86 and arm64 secondary MMU. To
>>>> do this, this series modifies gfn_to_pfn to return a struct page in
>>>> addition to a pfn, if the hva was resolved by gup. This allows the
>>>> caller to call put_page only when necessated by gup.
>>>>
>>>> This series provides a helper function that unwraps the new return type
>>>> of gfn_to_pfn to provide behavior identical to the old behavior. As I
>>>> have no hardware to test powerpc/mips changes, the function is used
>>>> there for minimally invasive changes. Additionally, as gfn_to_page and
>>>> gfn_to_pfn_cache are not integrated with mmu notifier, they cannot be
>>>> easily changed over to only use pfns.
>>>>
>>>> This addresses CVE-2021-22543 on x86 and arm64.
>>>
>>> Does this fix the problem? (untested I don't have a POC setup at hand,
>>> but at least in concept)
>> 
>> This one actually compiles at least. Unfortunately I don't have much
>> time in the near future to test, and I only just found out about this
>> CVE a few hours ago.
> 
> And it also works (the reproducer gets an infinite stream of userspace 
> exits and especially does not crash).  We can still go for David's 
> solution later since MMU notifiers are able to deal with this pages, but 
> it's a very nice patch for stable kernels.

Oh nice, thanks for testing. How's this?

Thanks,
Nick

---

KVM: Fix page ref underflow for regions with valid but non-refcounted pages

It's possible to create a region which maps valid but non-refcounted
pages (e.g., tail pages of non-compound higher order allocations). These
host pages can then be returned by gfn_to_page, gfn_to_pfn, etc., family
of APIs, which take a reference to the page, which takes it from 0 to 1.
When the reference is dropped, this will free the page incorrectly.

Fix this by only taking a reference on the page if it was non-zero,
which indicates it is participating in normal refcounting (and can be
released with put_page).

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 virt/kvm/kvm_main.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 6a6bc7af0e28..46fb042837d2 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2055,6 +2055,13 @@ static bool vma_is_valid(struct vm_area_struct *vma, bool write_fault)
 	return true;
 }
 
+static int kvm_try_get_pfn(kvm_pfn_t pfn)
+{
+	if (kvm_is_reserved_pfn(pfn))
+		return 1;
+	return get_page_unless_zero(pfn_to_page(pfn));
+}
+
 static int hva_to_pfn_remapped(struct vm_area_struct *vma,
 			       unsigned long addr, bool *async,
 			       bool write_fault, bool *writable,
@@ -2104,13 +2111,21 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
 	 * Whoever called remap_pfn_range is also going to call e.g.
 	 * unmap_mapping_range before the underlying pages are freed,
 	 * causing a call to our MMU notifier.
+	 *
+	 * Certain IO or PFNMAP mappings can be backed with valid
+	 * struct pages, but be allocated without refcounting e.g.,
+	 * tail pages of non-compound higher order allocations, which
+	 * would then underflow the refcount when the caller does the
+	 * required put_page. Don't allow those pages here.
 	 */ 
-	kvm_get_pfn(pfn);
+	if (!kvm_try_get_pfn(pfn))
+		r = -EFAULT;
 
 out:
 	pte_unmap_unlock(ptep, ptl);
 	*p_pfn = pfn;
-	return 0;
+
+	return r;
 }
 
 /*
-- 
2.23.0

^ permalink raw reply related	[flat|nested] 213+ messages in thread

* Re: [PATCH 0/6] KVM: Remove uses of struct page from x86 and arm64 MMU
@ 2021-06-24 12:57         ` Nicholas Piggin
  0 siblings, 0 replies; 213+ messages in thread
From: Nicholas Piggin @ 2021-06-24 12:57 UTC (permalink / raw)
  To: Aleksandar Markovic, Huacai Chen, Marc Zyngier, Paul Mackerras,
	Paolo Bonzini, David Stevens, Zhenyu Wang, Zhi Wang
  Cc: Wanpeng Li, kvm, David Stevens, intel-gfx, linuxppc-dev,
	linux-kernel, dri-devel, kvmarm, Will Deacon, kvm-ppc,
	Sean Christopherson, Vitaly Kuznetsov, linux-mips, intel-gvt-dev,
	Joerg Roedel, linux-arm-kernel, Jim Mattson

Excerpts from Paolo Bonzini's message of June 24, 2021 10:41 pm:
> On 24/06/21 13:42, Nicholas Piggin wrote:
>> Excerpts from Nicholas Piggin's message of June 24, 2021 8:34 pm:
>>> Excerpts from David Stevens's message of June 24, 2021 1:57 pm:
>>>> KVM supports mapping VM_IO and VM_PFNMAP memory into the guest by using
>>>> follow_pte in gfn_to_pfn. However, the resolved pfns may not have
>>>> assoicated struct pages, so they should not be passed to pfn_to_page.
>>>> This series removes such calls from the x86 and arm64 secondary MMU. To
>>>> do this, this series modifies gfn_to_pfn to return a struct page in
>>>> addition to a pfn, if the hva was resolved by gup. This allows the
>>>> caller to call put_page only when necessated by gup.
>>>>
>>>> This series provides a helper function that unwraps the new return type
>>>> of gfn_to_pfn to provide behavior identical to the old behavior. As I
>>>> have no hardware to test powerpc/mips changes, the function is used
>>>> there for minimally invasive changes. Additionally, as gfn_to_page and
>>>> gfn_to_pfn_cache are not integrated with mmu notifier, they cannot be
>>>> easily changed over to only use pfns.
>>>>
>>>> This addresses CVE-2021-22543 on x86 and arm64.
>>>
>>> Does this fix the problem? (untested I don't have a POC setup at hand,
>>> but at least in concept)
>> 
>> This one actually compiles at least. Unfortunately I don't have much
>> time in the near future to test, and I only just found out about this
>> CVE a few hours ago.
> 
> And it also works (the reproducer gets an infinite stream of userspace 
> exits and especially does not crash).  We can still go for David's 
> solution later since MMU notifiers are able to deal with this pages, but 
> it's a very nice patch for stable kernels.

Oh nice, thanks for testing. How's this?

Thanks,
Nick

---

KVM: Fix page ref underflow for regions with valid but non-refcounted pages

It's possible to create a region which maps valid but non-refcounted
pages (e.g., tail pages of non-compound higher order allocations). These
host pages can then be returned by gfn_to_page, gfn_to_pfn, etc., family
of APIs, which take a reference to the page, which takes it from 0 to 1.
When the reference is dropped, this will free the page incorrectly.

Fix this by only taking a reference on the page if it was non-zero,
which indicates it is participating in normal refcounting (and can be
released with put_page).

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 virt/kvm/kvm_main.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 6a6bc7af0e28..46fb042837d2 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2055,6 +2055,13 @@ static bool vma_is_valid(struct vm_area_struct *vma, bool write_fault)
 	return true;
 }
 
+static int kvm_try_get_pfn(kvm_pfn_t pfn)
+{
+	if (kvm_is_reserved_pfn(pfn))
+		return 1;
+	return get_page_unless_zero(pfn_to_page(pfn));
+}
+
 static int hva_to_pfn_remapped(struct vm_area_struct *vma,
 			       unsigned long addr, bool *async,
 			       bool write_fault, bool *writable,
@@ -2104,13 +2111,21 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
 	 * Whoever called remap_pfn_range is also going to call e.g.
 	 * unmap_mapping_range before the underlying pages are freed,
 	 * causing a call to our MMU notifier.
+	 *
+	 * Certain IO or PFNMAP mappings can be backed with valid
+	 * struct pages, but be allocated without refcounting e.g.,
+	 * tail pages of non-compound higher order allocations, which
+	 * would then underflow the refcount when the caller does the
+	 * required put_page. Don't allow those pages here.
 	 */ 
-	kvm_get_pfn(pfn);
+	if (!kvm_try_get_pfn(pfn))
+		r = -EFAULT;
 
 out:
 	pte_unmap_unlock(ptep, ptl);
 	*p_pfn = pfn;
-	return 0;
+
+	return r;
 }
 
 /*
-- 
2.23.0
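
For illustration, a minimal sketch of the contract the fix relies on:
get_page_unless_zero() succeeds only for pages that already participate
in normal refcounting, so a zero-refcount page is never pushed to
refcount 1 behind its owner's back. The sketch assumes only the generic
mm helpers; demo_try_take_ref() is a made-up name, not something the
patch adds:

	#include <linux/mm.h>

	/*
	 * Sketch: take our own reference on a page only if it is already
	 * refcounted. For a page whose refcount is 0 (e.g. a tail page of a
	 * non-compound higher order allocation), get_page() followed by the
	 * caller's eventual put_page() would free it by mistake, so refuse it.
	 */
	static bool demo_try_take_ref(struct page *page)
	{
		if (!get_page_unless_zero(page))
			return false;	/* refcount was 0: leave it to its owner */

		/* ... use the page ... */
		put_page(page);		/* drop only the reference taken above */
		return true;
	}
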
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 213+ messages in thread

* Re: [PATCH 0/6] KVM: Remove uses of struct page from x86 and arm64 MMU
  2021-06-24 12:57         ` Nicholas Piggin
                             ` (3 preceding siblings ...)
  (?)
@ 2021-06-24 15:35           ` Paolo Bonzini
  -1 siblings, 0 replies; 213+ messages in thread
From: Paolo Bonzini @ 2021-06-24 15:35 UTC (permalink / raw)
  To: Nicholas Piggin, Aleksandar Markovic, Huacai Chen, Marc Zyngier,
	Paul Mackerras, David Stevens, Zhenyu Wang, Zhi Wang
  Cc: Alexandru Elisei, dri-devel, intel-gfx, intel-gvt-dev,
	James Morse, Jim Mattson, Joerg Roedel, kvmarm, kvm-ppc, kvm,
	linux-arm-kernel, linux-kernel, linux-mips, linuxppc-dev,
	Sean Christopherson, David Stevens, Suzuki K Poulose,
	Vitaly Kuznetsov, Wanpeng Li, Will Deacon

On 24/06/21 14:57, Nicholas Piggin wrote:
> KVM: Fix page ref underflow for regions with valid but non-refcounted pages

It doesn't really fix the underflow, it disallows mapping them in the 
first place.  Since in principle things can break, I'd rather be 
explicit, so let's go with "KVM: do not allow mapping valid but 
non-reference-counted pages".

> It's possible to create a region which maps valid but non-refcounted
> pages (e.g., tail pages of non-compound higher order allocations). These
> host pages can then be returned by the gfn_to_page, gfn_to_pfn, etc.
> family of APIs, which take a reference to the page and thus raise its
> refcount from 0 to 1.
> When the reference is dropped, this will free the page incorrectly.
> 
> Fix this by only taking a reference on the page if it was non-zero,

s/on the page/on valid pages/ (makes clear that invalid pages are fine 
without refcounting).

Thank you *so* much, I'm awful at Linux mm.

Paolo

> which indicates it is participating in normal refcounting (and can be
> released with put_page).
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>


^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [PATCH 1/6] KVM: x86/mmu: release audited pfns
  2021-06-24  9:43       ` Paolo Bonzini
                           ` (3 preceding siblings ...)
  (?)
@ 2021-06-24 15:36         ` Sean Christopherson
  -1 siblings, 0 replies; 213+ messages in thread
From: Sean Christopherson @ 2021-06-24 15:36 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Nicholas Piggin, Aleksandar Markovic, Huacai Chen, Marc Zyngier,
	Paul Mackerras, David Stevens, Zhenyu Wang, Zhi Wang,
	Alexandru Elisei, dri-devel, intel-gfx, intel-gvt-dev,
	James Morse, Jim Mattson, Joerg Roedel, kvmarm, kvm-ppc, kvm,
	linux-arm-kernel, linux-kernel, linux-mips, linuxppc-dev,
	Suzuki K Poulose, Vitaly Kuznetsov, Wanpeng Li, Will Deacon

On Thu, Jun 24, 2021, Paolo Bonzini wrote:
> On 24/06/21 10:43, Nicholas Piggin wrote:
> > Excerpts from David Stevens's message of June 24, 2021 1:57 pm:
> > > From: David Stevens <stevensd@chromium.org>
> > 
> > Changelog? This looks like a bug, should it have a Fixes: tag?
> 
> Probably has been there forever... The best way to fix the bug would be to
> nuke mmu_audit.c, which I've threatened to do many times but never followed
> up on.

Yar.  It has only survived because it hasn't required any maintenance.

^ permalink raw reply	[flat|nested] 213+ messages in thread

* [Intel-gfx] ✗ Fi.CI.BUILD: failure for KVM: Remove uses of struct page from x86 and arm64 MMU (rev3)
  2021-06-24  3:57 ` David Stevens
                   ` (13 preceding siblings ...)
  (?)
@ 2021-06-24 16:07 ` Patchwork
  -1 siblings, 0 replies; 213+ messages in thread
From: Patchwork @ 2021-06-24 16:07 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: intel-gfx

== Series Details ==

Series: KVM: Remove uses of struct page from x86 and arm64 MMU (rev3)
URL   : https://patchwork.freedesktop.org/series/91836/
State : failure

== Summary ==

Applying: KVM: Remove uses of struct page from x86 and arm64 MMU
Applying: KVM: mmu: also return page from gfn_to_pfn
Using index info to reconstruct a base tree...
M	arch/powerpc/kvm/book3s_64_mmu_hv.c
M	arch/x86/kvm/mmu/mmu_audit.c
M	arch/x86/kvm/x86.c
M	drivers/gpu/drm/i915/gvt/kvmgt.c
M	virt/kvm/kvm_main.c
Falling back to patching base and 3-way merge...
Auto-merging virt/kvm/kvm_main.c
CONFLICT (content): Merge conflict in virt/kvm/kvm_main.c
Auto-merging drivers/gpu/drm/i915/gvt/kvmgt.c
Auto-merging arch/x86/kvm/x86.c
Auto-merging arch/x86/kvm/mmu/mmu_audit.c
Auto-merging arch/powerpc/kvm/book3s_64_mmu_hv.c
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0002 KVM: mmu: also return page from gfn_to_pfn
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".


_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [PATCH 0/6] KVM: Remove uses of struct page from x86 and arm64 MMU
  2021-06-24 15:35           ` Paolo Bonzini
                               ` (3 preceding siblings ...)
  (?)
@ 2021-06-25  0:20             ` Nicholas Piggin
  -1 siblings, 0 replies; 213+ messages in thread
From: Nicholas Piggin @ 2021-06-25  0:20 UTC (permalink / raw)
  To: Aleksandar Markovic, Huacai Chen, Marc Zyngier, Paul Mackerras,
	Paolo Bonzini, David Stevens, Zhenyu Wang, Zhi Wang
  Cc: Alexandru Elisei, dri-devel, intel-gfx, intel-gvt-dev,
	James Morse, Jim Mattson, Joerg Roedel, kvmarm, kvm-ppc, kvm,
	linux-arm-kernel, linux-kernel, linux-mips, linuxppc-dev,
	Sean Christopherson, David Stevens, Suzuki K Poulose,
	Vitaly Kuznetsov, Wanpeng Li, Will Deacon

Excerpts from Paolo Bonzini's message of June 25, 2021 1:35 am:
> On 24/06/21 14:57, Nicholas Piggin wrote:
>> KVM: Fix page ref underflow for regions with valid but non-refcounted pages
> 
> It doesn't really fix the underflow, it disallows mapping them in the 
> first place.  Since in principle things can break, I'd rather be 
> explicit, so let's go with "KVM: do not allow mapping valid but 
> non-reference-counted pages".
> 
>> It's possible to create a region which maps valid but non-refcounted
>> pages (e.g., tail pages of non-compound higher order allocations). These
>> host pages can then be returned by the gfn_to_page, gfn_to_pfn, etc.
>> family of APIs, which take a reference to the page and thus raise its
>> refcount from 0 to 1.
>> When the reference is dropped, this will free the page incorrectly.
>> 
>> Fix this by only taking a reference on the page if it was non-zero,
> 
> s/on the page/on valid pages/ (makes clear that invalid pages are fine 
> without refcounting).

That seems okay, you can adjust the title or changelog as you like.

> Thank you *so* much, I'm awful at Linux mm.

Glad to help. It's easy to see why you were taking this approach: the
API really does need to be improved, and even a subsystem as intertwined
with mm as KVM shouldn't _really_ be doing this kind of trick (and
it should go away when the old API is removed).

Thanks,
Nick

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [PATCH 0/6] KVM: Remove uses of struct page from x86 and arm64 MMU
  2021-06-24 12:57         ` Nicholas Piggin
                             ` (3 preceding siblings ...)
  (?)
@ 2021-06-25  7:44           ` Christian Borntraeger
  -1 siblings, 0 replies; 213+ messages in thread
From: Christian Borntraeger @ 2021-06-25  7:44 UTC (permalink / raw)
  To: Nicholas Piggin, Aleksandar Markovic, Huacai Chen, Marc Zyngier,
	Paul Mackerras, Paolo Bonzini, David Stevens, Zhenyu Wang,
	Zhi Wang
  Cc: Alexandru Elisei, dri-devel, intel-gfx, intel-gvt-dev,
	James Morse, Jim Mattson, Joerg Roedel, kvmarm, kvm-ppc, kvm,
	linux-arm-kernel, linux-kernel, linux-mips, linuxppc-dev,
	Sean Christopherson, David Stevens, Suzuki K Poulose,
	Vitaly Kuznetsov, Wanpeng Li, Will Deacon



On 24.06.21 14:57, Nicholas Piggin wrote:
> Excerpts from Paolo Bonzini's message of June 24, 2021 10:41 pm:
>> On 24/06/21 13:42, Nicholas Piggin wrote:
>>> Excerpts from Nicholas Piggin's message of June 24, 2021 8:34 pm:
>>>> Excerpts from David Stevens's message of June 24, 2021 1:57 pm:
>>>>> KVM supports mapping VM_IO and VM_PFNMAP memory into the guest by using
>>>>> follow_pte in gfn_to_pfn. However, the resolved pfns may not have
>>>>> assoicated struct pages, so they should not be passed to pfn_to_page.
>>>>> This series removes such calls from the x86 and arm64 secondary MMU. To
>>>>> do this, this series modifies gfn_to_pfn to return a struct page in
>>>>> addition to a pfn, if the hva was resolved by gup. This allows the
>>>>> caller to call put_page only when necessated by gup.
>>>>>
>>>>> This series provides a helper function that unwraps the new return type
>>>>> of gfn_to_pfn to provide behavior identical to the old behavior. As I
>>>>> have no hardware to test powerpc/mips changes, the function is used
>>>>> there for minimally invasive changes. Additionally, as gfn_to_page and
>>>>> gfn_to_pfn_cache are not integrated with mmu notifier, they cannot be
>>>>> easily changed over to only use pfns.
>>>>>
>>>>> This addresses CVE-2021-22543 on x86 and arm64.
>>>>
>>>> Does this fix the problem? (untested, as I don't have a POC setup at hand,
>>>> but at least in concept)
>>>
>>> This one actually compiles at least. Unfortunately I don't have much
>>> time in the near future to test, and I only just found out about this
>>> CVE a few hours ago.
>>
>> And it also works (the reproducer gets an infinite stream of userspace
>> exits and especially does not crash).  We can still go for David's
>> solution later since MMU notifiers are able to deal with these pages, but
>> it's a very nice patch for stable kernels.
> 
> Oh nice, thanks for testing. How's this?
> 
> Thanks,
> Nick
> 
> ---
> 
> KVM: Fix page ref underflow for regions with valid but non-refcounted pages
> 
> It's possible to create a region which maps valid but non-refcounted
> pages (e.g., tail pages of non-compound higher order allocations). These
> host pages can then be returned by the gfn_to_page, gfn_to_pfn, etc.
> family of APIs, which take a reference to the page and thus raise its
> refcount from 0 to 1.
> When the reference is dropped, this will free the page incorrectly.
> 
> Fix this by only taking a reference on the page if it was non-zero,
> which indicates it is participating in normal refcounting (and can be
> released with put_page).
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>   virt/kvm/kvm_main.c | 19 +++++++++++++++++--
>   1 file changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 6a6bc7af0e28..46fb042837d2 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -2055,6 +2055,13 @@ static bool vma_is_valid(struct vm_area_struct *vma, bool write_fault)
>   	return true;
>   }
>   
> +static int kvm_try_get_pfn(kvm_pfn_t pfn)
> +{
> +	if (kvm_is_reserved_pfn(pfn))
> +		return 1;
> +	return get_page_unless_zero(pfn_to_page(pfn));
> +}
> +
>   static int hva_to_pfn_remapped(struct vm_area_struct *vma,
>   			       unsigned long addr, bool *async,
>   			       bool write_fault, bool *writable,
> @@ -2104,13 +2111,21 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
>   	 * Whoever called remap_pfn_range is also going to call e.g.
>   	 * unmap_mapping_range before the underlying pages are freed,
>   	 * causing a call to our MMU notifier.
> +	 *
> +	 * Certain IO or PFNMAP mappings can be backed with valid
> +	 * struct pages, but be allocated without refcounting e.g.,
> +	 * tail pages of non-compound higher order allocations, which
> +	 * would then underflow the refcount when the caller does the
> +	 * required put_page. Don't allow those pages here.
>   	 */
> -	kvm_get_pfn(pfn);
> +	if (!kvm_try_get_pfn(pfn))
> +		r = -EFAULT;
>   

Right. That should also take care of s390 (pin_guest_page in vsie.c,
which calls gfn_to_page).
FWIW, the current API is really hard to follow, as it does not tell
which functions take a reference and which don't.
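
To illustrate with a sketch (assuming the usual gfn_to_pfn() /
kvm_release_pfn_clean() pairing declared in kvm_host.h; demo_touch_gfn()
is a made-up name):

	#include <linux/kvm_host.h>

	/*
	 * The implicit contract: gfn_to_pfn() may take a page reference on
	 * behalf of the caller, who must drop it again with
	 * kvm_release_pfn_clean() (or _dirty()); the prototype itself does
	 * not say so.
	 */
	static void demo_touch_gfn(struct kvm *kvm, gfn_t gfn)
	{
		kvm_pfn_t pfn = gfn_to_pfn(kvm, gfn);

		if (is_error_noslot_pfn(pfn))
			return;

		/* ... map and access the pfn ... */
		kvm_release_pfn_clean(pfn);	/* releases the reference, if one was taken */
	}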

Anyway, this patch (with cc stable?)

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>

>   out:
>   	pte_unmap_unlock(ptep, ptl);
>   	*p_pfn = pfn;
> -	return 0;
> +
> +	return r;
>   }
>   
>   /*
> 

^ permalink raw reply	[flat|nested] 213+ messages in thread

* Re: [PATCH 0/6] KVM: Remove uses of struct page from x86 and arm64 MMU
@ 2021-06-25  7:44           ` Christian Borntraeger
  0 siblings, 0 replies; 213+ messages in thread
From: Christian Borntraeger @ 2021-06-25  7:44 UTC (permalink / raw)
  To: Nicholas Piggin, Aleksandar Markovic, Huacai Chen, Marc Zyngier,
	Paul Mackerras, Paolo Bonzini, David Stevens, Zhenyu Wang,
	Zhi Wang
  Cc: Wanpeng Li, kvm, David Stevens, Alexandru Elisei, intel-gfx,
	linuxppc-dev, linux-kernel, dri-devel, kvmarm, Will Deacon,
	Suzuki K Poulose, James Morse, kvm-ppc, Sean Christopherson,
	Vitaly Kuznetsov, linux-mips, intel-gvt-dev, Joerg Roedel,
	linux-arm-kernel, Jim Mattson



On 24.06.21 14:57, Nicholas Piggin wrote:
> Excerpts from Paolo Bonzini's message of June 24, 2021 10:41 pm:
>> On 24/06/21 13:42, Nicholas Piggin wrote:
>>> Excerpts from Nicholas Piggin's message of June 24, 2021 8:34 pm:
>>>> Excerpts from David Stevens's message of June 24, 2021 1:57 pm:
>>>>> KVM supports mapping VM_IO and VM_PFNMAP memory into the guest by using
>>>>> follow_pte in gfn_to_pfn. However, the resolved pfns may not have
>>>>> assoicated struct pages, so they should not be passed to pfn_to_page.
>>>>> This series removes such calls from the x86 and arm64 secondary MMU. To
>>>>> do this, this series modifies gfn_to_pfn to return a struct page in
>>>>> addition to a pfn, if the hva was resolved by gup. This allows the
>>>>> caller to call put_page only when necessated by gup.
>>>>>
>>>>> This series provides a helper function that unwraps the new return type
>>>>> of gfn_to_pfn to provide behavior identical to the old behavior. As I
>>>>> have no hardware to test powerpc/mips changes, the function is used
>>>>> there for minimally invasive changes. Additionally, as gfn_to_page and
>>>>> gfn_to_pfn_cache are not integrated with mmu notifier, they cannot be
>>>>> easily changed over to only use pfns.
>>>>>
>>>>> This addresses CVE-2021-22543 on x86 and arm64.
>>>>
>>>> Does this fix the problem? (untested I don't have a POC setup at hand,
>>>> but at least in concept)
>>>
>>> This one actually compiles at least. Unfortunately I don't have much
>>> time in the near future to test, and I only just found out about this
>>> CVE a few hours ago.
>>
>> And it also works (the reproducer gets an infinite stream of userspace
>> exits and especially does not crash).  We can still go for David's
>> solution later since MMU notifiers are able to deal with this pages, but
>> it's a very nice patch for stable kernels.
> 
> Oh nice, thanks for testing. How's this?
> 
> Thanks,
> Nick
> 
> ---
> 
> KVM: Fix page ref underflow for regions with valid but non-refcounted pages
> 
> It's possible to create a region which maps valid but non-refcounted
> pages (e.g., tail pages of non-compound higher order allocations). These
> host pages can then be returned by gfn_to_page, gfn_to_pfn, etc., family
> of APIs, which take a reference to the page, which takes it from 0 to 1.
> When the reference is dropped, this will free the page incorrectly.
> 
> Fix this by taking a reference on the page only if its refcount was
> already non-zero, which indicates that it participates in normal
> refcounting (and can be released with put_page).
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>   virt/kvm/kvm_main.c | 19 +++++++++++++++++--
>   1 file changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 6a6bc7af0e28..46fb042837d2 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -2055,6 +2055,13 @@ static bool vma_is_valid(struct vm_area_struct *vma, bool write_fault)
>   	return true;
>   }
>   
> +static int kvm_try_get_pfn(kvm_pfn_t pfn)
> +{
> +	if (kvm_is_reserved_pfn(pfn))
> +		return 1;
> +	return get_page_unless_zero(pfn_to_page(pfn));
> +}
> +
>   static int hva_to_pfn_remapped(struct vm_area_struct *vma,
>   			       unsigned long addr, bool *async,
>   			       bool write_fault, bool *writable,
> @@ -2104,13 +2111,21 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
>   	 * Whoever called remap_pfn_range is also going to call e.g.
>   	 * unmap_mapping_range before the underlying pages are freed,
>   	 * causing a call to our MMU notifier.
> +	 *
> +	 * Certain IO or PFNMAP mappings can be backed with valid
> +	 * struct pages, but be allocated without refcounting e.g.,
> +	 * tail pages of non-compound higher order allocations, which
> +	 * would then underflow the refcount when the caller does the
> +	 * required put_page. Don't allow those pages here.
>   	 */
> -	kvm_get_pfn(pfn);
> +	if (!kvm_try_get_pfn(pfn))
> +		r = -EFAULT;
>   

Right. That should also take care of s390 (pin_guest_page in vsie.c
which calls gfn_to_page).
FWIW, the current API is really hard to follow as it does not tell
which functions take a reference and which don't.

Anyway, for this patch (with cc: stable?):

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>

>   out:
>   	pte_unmap_unlock(ptep, ptl);
>   	*p_pfn = pfn;
> -	return 0;
> +
> +	return r;
>   }
>   
>   /*
> 

^ permalink raw reply	[flat|nested] 213+ messages in thread
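
The situation described in the commit message can be produced by an
ordinary driver that hands a non-compound higher-order allocation to
userspace with remap_pfn_range. The sketch below is hypothetical
(example_mmap and example_buf are made-up names) and only illustrates how
such refcount-zero tail pages end up inside a VM_PFNMAP region; it is not
taken from any real driver.

#include <linux/mm.h>
#include <linux/gfp.h>
#include <linux/fs.h>

static struct page *example_buf;	/* order-2, allocated without __GFP_COMP */

/*
 * Hypothetical mmap handler, assuming userspace maps exactly four pages.
 * Without __GFP_COMP the three tail pages keep a refcount of zero, yet
 * remap_pfn_range() maps them into a VM_PFNMAP VMA. If such a VMA backs
 * guest memory and KVM takes and later drops a page reference on one of
 * those tail pages, the refcount underflows -- exactly the case that
 * kvm_try_get_pfn() now rejects.
 */
static int example_mmap(struct file *file, struct vm_area_struct *vma)
{
	example_buf = alloc_pages(GFP_KERNEL, 2);
	if (!example_buf)
		return -ENOMEM;

	return remap_pfn_range(vma, vma->vm_start, page_to_pfn(example_buf),
			       4 * PAGE_SIZE, vma->vm_page_prot);
}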

end of thread, other threads:[~2021-06-28  9:32 UTC | newest]

Thread overview: 213+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-24  3:57 [PATCH 0/6] KVM: Remove uses of struct page from x86 and arm64 MMU David Stevens
2021-06-24  3:57 ` [PATCH 1/6] KVM: x86/mmu: release audited pfns David Stevens
2021-06-24  8:43   ` Nicholas Piggin
2021-06-24  9:43     ` Paolo Bonzini
2021-06-24 15:36       ` Sean Christopherson
2021-06-24  3:57 ` [PATCH 2/6] KVM: mmu: also return page from gfn_to_pfn David Stevens
2021-06-24  8:52   ` Nicholas Piggin
2021-06-24  9:42     ` Paolo Bonzini
2021-06-24  9:57       ` Nicholas Piggin
2021-06-24 10:13         ` Paolo Bonzini
2021-06-24 10:17         ` Nicholas Piggin
2021-06-24 10:21           ` Paolo Bonzini
2021-06-24 10:42             ` Nicholas Piggin
2021-06-24  9:40   ` Marc Zyngier
2021-06-24  3:57 ` [PATCH 3/6] KVM: x86/mmu: avoid struct page in MMU David Stevens
2021-06-24  7:31   ` Paolo Bonzini
2021-06-24  8:58   ` Nicholas Piggin
2021-06-24 10:06     ` Marc Zyngier
2021-06-24 10:17       ` Paolo Bonzini
2021-06-24 10:43       ` Nicholas Piggin
2021-06-24  3:57 ` [PATCH 4/6] KVM: arm64/mmu: " David Stevens
2021-06-24 10:43   ` Marc Zyngier
2021-06-24  3:57 ` [PATCH 5/6] KVM: mmu: remove over-aggressive warnings David Stevens
2021-06-24  3:57 ` [PATCH 6/6] drm/i915/gvt: use gfn_to_pfn's page instead of pfn David Stevens
2021-06-24  4:28   ` David Stevens
2021-06-24 10:25   ` kernel test robot
2021-06-24 10:38   ` kernel test robot
2021-06-24  4:19 ` [Intel-gfx] ✗ Fi.CI.BUILD: failure for KVM: Remove uses of struct page from x86 and arm64 MMU Patchwork
2021-06-24  6:57 ` [PATCH 0/6] " Paolo Bonzini
2021-06-24 10:34 ` Nicholas Piggin
2021-06-24 11:42   ` Nicholas Piggin
2021-06-24 12:00     ` Paolo Bonzini
2021-06-24 12:41     ` Paolo Bonzini
2021-06-24 12:57       ` Nicholas Piggin
2021-06-24 15:35         ` Paolo Bonzini
2021-06-25  0:20           ` Nicholas Piggin
2021-06-25  7:44         ` Christian Borntraeger
2021-06-24 16:07 ` [Intel-gfx] ✗ Fi.CI.BUILD: failure for KVM: Remove uses of struct page from x86 and arm64 MMU (rev3) Patchwork
