All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 00/28] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups
@ 2023-05-13  0:35 ` Sean Christopherson
  0 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2023-05-13  0:35 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

Fix a variety of found-by-inspection bugs in KVMGT, and overhaul KVM's
page-track APIs to provide a leaner and cleaner interface.  The motivation
for this series is to (significantly) reduce the number of KVM APIs that
KVMGT uses, with a long-term goal of making all kvm_host.h headers
KVM-internal.

As always for this series, the KVMGT changes are compile tested only.

Based on "git://git.kernel.org/pub/scm/virt/kvm/kvm.git tags/kvm-6.4-1".

v3:
 - Collect reviewed/tested tags (I apologize if I missed any, I manually
   gathered them this time due to a goof in my workflow). [Yan]
 - Drop check on max KVM paging size from KVMGT. [Yan]
 - Drop the explicit change on THP pages, and instead validate that the
   pfns (not struct page pointers) are contiguous. [Yan]
 - Fix buggy intel_gvt_dma_map_guest_page() usage by eliminating a helper
   for shadowing 2MiB GTT entries. [Yan]
 - Move kvm_arch_flush_shadow_{all,memslot}() to mmu.c instead of exposing
   kvm_mmu_zap_all_fast() outside of mmu.c. [Yan]
 - Fix an alignment goof in hlist_for_each_entry_srcu() usage. [Yan]
 - Wrap full definition of external page track structures with
   CONFIG_KVM_EXTERNAL_WRITE_TRACKING. [Yan]

v2:
 - https://lore.kernel.org/all/20230311002258.852397-1-seanjc@google.com
 - Reuse vgpu_lock to protect gfn hash instead of introducing a new (and
   buggy) mutext. [Yan]
 - Remove a spurious return from kvm_page_track_init(). [Yan]
 - Take @kvm directly in the inner __kvm_page_track_write(). [Yan]
 - Delete the gfn sanity check that relies on kvm_is_visible_gfn() instead
   of providing a dedicated interface. [Yan]

v1: https://lore.kernel.org/lkml/20221223005739.1295925-1-seanjc@google.com

Sean Christopherson (24):
  drm/i915/gvt: Verify pfn is "valid" before dereferencing "struct page"
  drm/i915/gvt: Verify hugepages are contiguous in physical address
    space
  drm/i915/gvt: Put the page reference obtained by KVM's gfn_to_pfn()
  drm/i915/gvt: Explicitly check that vGPU is attached before shadowing
  drm/i915/gvt: Error out on an attempt to shadowing an unknown GTT
    entry type
  drm/i915/gvt: Don't rely on KVM's gfn_to_pfn() to query possible 2M
    GTT
  drm/i915/gvt: Use an "unsigned long" to iterate over memslot gfns
  drm/i915/gvt: Drop unused helper intel_vgpu_reset_gtt()
  drm/i915/gvt: Protect gfn hash table with vgpu_lock
  KVM: x86/mmu: Move kvm_arch_flush_shadow_{all,memslot}() to mmu.c
  KVM: x86/mmu: Don't rely on page-track mechanism to flush on memslot
    change
  KVM: x86/mmu: Don't bounce through page-track mechanism for guest PTEs
  KVM: drm/i915/gvt: Drop @vcpu from KVM's ->track_write() hook
  KVM: x86: Reject memslot MOVE operations if KVMGT is attached
  drm/i915/gvt: Don't bother removing write-protection on to-be-deleted
    slot
  KVM: x86/mmu: Move KVM-only page-track declarations to internal header
  KVM: x86/mmu: Use page-track notifiers iff there are external users
  KVM: x86/mmu: Drop infrastructure for multiple page-track modes
  KVM: x86/mmu: Rename page-track APIs to reflect the new reality
  KVM: x86/mmu: Assert that correct locks are held for page
    write-tracking
  KVM: x86/mmu: Bug the VM if write-tracking is used but not enabled
  KVM: x86/mmu: Drop @slot param from exported/external page-track APIs
  KVM: x86/mmu: Handle KVM bookkeeping in page-track APIs, not callers
  drm/i915/gvt: Drop final dependencies on KVM internal details

Yan Zhao (4):
  drm/i915/gvt: remove interface intel_gvt_is_valid_gfn
  KVM: x86: Add a new page-track hook to handle memslot deletion
  drm/i915/gvt: switch from ->track_flush_slot() to
    ->track_remove_region()
  KVM: x86: Remove the unused page-track hook track_flush_slot()

 arch/x86/include/asm/kvm_host.h       |  16 +-
 arch/x86/include/asm/kvm_page_track.h |  73 +++-----
 arch/x86/kvm/mmu.h                    |   2 +
 arch/x86/kvm/mmu/mmu.c                |  51 +++--
 arch/x86/kvm/mmu/page_track.c         | 256 +++++++++++++-------------
 arch/x86/kvm/mmu/page_track.h         |  58 ++++++
 arch/x86/kvm/x86.c                    |  22 +--
 drivers/gpu/drm/i915/gvt/gtt.c        | 102 ++--------
 drivers/gpu/drm/i915/gvt/gtt.h        |   1 -
 drivers/gpu/drm/i915/gvt/gvt.h        |   3 +-
 drivers/gpu/drm/i915/gvt/kvmgt.c      | 117 +++++-------
 drivers/gpu/drm/i915/gvt/page_track.c |  10 +-
 12 files changed, 320 insertions(+), 391 deletions(-)
 create mode 100644 arch/x86/kvm/mmu/page_track.h


base-commit: b3c98052d46948a8d65d2778c7f306ff38366aac
-- 
2.40.1.606.ga4b1b128d6-goog


^ permalink raw reply	[flat|nested] 86+ messages in thread

* [Intel-gfx] [PATCH v3 00/28] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups
@ 2023-05-13  0:35 ` Sean Christopherson
  0 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2023-05-13  0:35 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: Yan Zhao, kvm, intel-gfx, linux-kernel, Ben Gardon, intel-gvt-dev

Fix a variety of found-by-inspection bugs in KVMGT, and overhaul KVM's
page-track APIs to provide a leaner and cleaner interface.  The motivation
for this series is to (significantly) reduce the number of KVM APIs that
KVMGT uses, with a long-term goal of making all kvm_host.h headers
KVM-internal.

As always for this series, the KVMGT changes are compile tested only.

Based on "git://git.kernel.org/pub/scm/virt/kvm/kvm.git tags/kvm-6.4-1".

v3:
 - Collect reviewed/tested tags (I apologize if I missed any, I manually
   gathered them this time due to a goof in my workflow). [Yan]
 - Drop check on max KVM paging size from KVMGT. [Yan]
 - Drop the explicit change on THP pages, and instead validate that the
   pfns (not struct page pointers) are contiguous. [Yan]
 - Fix buggy intel_gvt_dma_map_guest_page() usage by eliminating a helper
   for shadowing 2MiB GTT entries. [Yan]
 - Move kvm_arch_flush_shadow_{all,memslot}() to mmu.c instead of exposing
   kvm_mmu_zap_all_fast() outside of mmu.c. [Yan]
 - Fix an alignment goof in hlist_for_each_entry_srcu() usage. [Yan]
 - Wrap full definition of external page track structures with
   CONFIG_KVM_EXTERNAL_WRITE_TRACKING. [Yan]

v2:
 - https://lore.kernel.org/all/20230311002258.852397-1-seanjc@google.com
 - Reuse vgpu_lock to protect gfn hash instead of introducing a new (and
   buggy) mutext. [Yan]
 - Remove a spurious return from kvm_page_track_init(). [Yan]
 - Take @kvm directly in the inner __kvm_page_track_write(). [Yan]
 - Delete the gfn sanity check that relies on kvm_is_visible_gfn() instead
   of providing a dedicated interface. [Yan]

v1: https://lore.kernel.org/lkml/20221223005739.1295925-1-seanjc@google.com

Sean Christopherson (24):
  drm/i915/gvt: Verify pfn is "valid" before dereferencing "struct page"
  drm/i915/gvt: Verify hugepages are contiguous in physical address
    space
  drm/i915/gvt: Put the page reference obtained by KVM's gfn_to_pfn()
  drm/i915/gvt: Explicitly check that vGPU is attached before shadowing
  drm/i915/gvt: Error out on an attempt to shadowing an unknown GTT
    entry type
  drm/i915/gvt: Don't rely on KVM's gfn_to_pfn() to query possible 2M
    GTT
  drm/i915/gvt: Use an "unsigned long" to iterate over memslot gfns
  drm/i915/gvt: Drop unused helper intel_vgpu_reset_gtt()
  drm/i915/gvt: Protect gfn hash table with vgpu_lock
  KVM: x86/mmu: Move kvm_arch_flush_shadow_{all,memslot}() to mmu.c
  KVM: x86/mmu: Don't rely on page-track mechanism to flush on memslot
    change
  KVM: x86/mmu: Don't bounce through page-track mechanism for guest PTEs
  KVM: drm/i915/gvt: Drop @vcpu from KVM's ->track_write() hook
  KVM: x86: Reject memslot MOVE operations if KVMGT is attached
  drm/i915/gvt: Don't bother removing write-protection on to-be-deleted
    slot
  KVM: x86/mmu: Move KVM-only page-track declarations to internal header
  KVM: x86/mmu: Use page-track notifiers iff there are external users
  KVM: x86/mmu: Drop infrastructure for multiple page-track modes
  KVM: x86/mmu: Rename page-track APIs to reflect the new reality
  KVM: x86/mmu: Assert that correct locks are held for page
    write-tracking
  KVM: x86/mmu: Bug the VM if write-tracking is used but not enabled
  KVM: x86/mmu: Drop @slot param from exported/external page-track APIs
  KVM: x86/mmu: Handle KVM bookkeeping in page-track APIs, not callers
  drm/i915/gvt: Drop final dependencies on KVM internal details

Yan Zhao (4):
  drm/i915/gvt: remove interface intel_gvt_is_valid_gfn
  KVM: x86: Add a new page-track hook to handle memslot deletion
  drm/i915/gvt: switch from ->track_flush_slot() to
    ->track_remove_region()
  KVM: x86: Remove the unused page-track hook track_flush_slot()

 arch/x86/include/asm/kvm_host.h       |  16 +-
 arch/x86/include/asm/kvm_page_track.h |  73 +++-----
 arch/x86/kvm/mmu.h                    |   2 +
 arch/x86/kvm/mmu/mmu.c                |  51 +++--
 arch/x86/kvm/mmu/page_track.c         | 256 +++++++++++++-------------
 arch/x86/kvm/mmu/page_track.h         |  58 ++++++
 arch/x86/kvm/x86.c                    |  22 +--
 drivers/gpu/drm/i915/gvt/gtt.c        | 102 ++--------
 drivers/gpu/drm/i915/gvt/gtt.h        |   1 -
 drivers/gpu/drm/i915/gvt/gvt.h        |   3 +-
 drivers/gpu/drm/i915/gvt/kvmgt.c      | 117 +++++-------
 drivers/gpu/drm/i915/gvt/page_track.c |  10 +-
 12 files changed, 320 insertions(+), 391 deletions(-)
 create mode 100644 arch/x86/kvm/mmu/page_track.h


base-commit: b3c98052d46948a8d65d2778c7f306ff38366aac
-- 
2.40.1.606.ga4b1b128d6-goog


^ permalink raw reply	[flat|nested] 86+ messages in thread

* [PATCH v3 01/28] drm/i915/gvt: Verify pfn is "valid" before dereferencing "struct page"
  2023-05-13  0:35 ` [Intel-gfx] " Sean Christopherson
@ 2023-05-13  0:35   ` Sean Christopherson
  -1 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2023-05-13  0:35 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

Check that the pfn found by gfn_to_pfn() is actually backed by "struct
page" memory prior to retrieving and dereferencing the page.  KVM
supports backing guest memory with VM_PFNMAP, VM_IO, etc., and so
there is no guarantee the pfn returned by gfn_to_pfn() has an associated
"struct page".

Fixes: b901b252b6cf ("drm/i915/gvt: Add 2M huge gtt support")
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 drivers/gpu/drm/i915/gvt/gtt.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c
index 4ec85308379a..58b9b316ae46 100644
--- a/drivers/gpu/drm/i915/gvt/gtt.c
+++ b/drivers/gpu/drm/i915/gvt/gtt.c
@@ -1183,6 +1183,10 @@ static int is_2MB_gtt_possible(struct intel_vgpu *vgpu,
 	pfn = gfn_to_pfn(vgpu->vfio_device.kvm, ops->get_pfn(entry));
 	if (is_error_noslot_pfn(pfn))
 		return -EINVAL;
+
+	if (!pfn_valid(pfn))
+		return -EINVAL;
+
 	return PageTransHuge(pfn_to_page(pfn));
 }
 
-- 
2.40.1.606.ga4b1b128d6-goog


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [Intel-gfx] [PATCH v3 01/28] drm/i915/gvt: Verify pfn is "valid" before dereferencing "struct page"
@ 2023-05-13  0:35   ` Sean Christopherson
  0 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2023-05-13  0:35 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: Yan Zhao, kvm, intel-gfx, linux-kernel, Ben Gardon, intel-gvt-dev

Check that the pfn found by gfn_to_pfn() is actually backed by "struct
page" memory prior to retrieving and dereferencing the page.  KVM
supports backing guest memory with VM_PFNMAP, VM_IO, etc., and so
there is no guarantee the pfn returned by gfn_to_pfn() has an associated
"struct page".

Fixes: b901b252b6cf ("drm/i915/gvt: Add 2M huge gtt support")
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 drivers/gpu/drm/i915/gvt/gtt.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c
index 4ec85308379a..58b9b316ae46 100644
--- a/drivers/gpu/drm/i915/gvt/gtt.c
+++ b/drivers/gpu/drm/i915/gvt/gtt.c
@@ -1183,6 +1183,10 @@ static int is_2MB_gtt_possible(struct intel_vgpu *vgpu,
 	pfn = gfn_to_pfn(vgpu->vfio_device.kvm, ops->get_pfn(entry));
 	if (is_error_noslot_pfn(pfn))
 		return -EINVAL;
+
+	if (!pfn_valid(pfn))
+		return -EINVAL;
+
 	return PageTransHuge(pfn_to_page(pfn));
 }
 
-- 
2.40.1.606.ga4b1b128d6-goog


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v3 02/28] drm/i915/gvt: remove interface intel_gvt_is_valid_gfn
  2023-05-13  0:35 ` [Intel-gfx] " Sean Christopherson
@ 2023-05-13  0:35   ` Sean Christopherson
  -1 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2023-05-13  0:35 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

From: Yan Zhao <yan.y.zhao@intel.com>

Currently intel_gvt_is_valid_gfn() is called in two places:
(1) shadowing guest GGTT entry
(2) shadowing guest PPGTT leaf entry,
which was introduced in commit cc753fbe1ac4
("drm/i915/gvt: validate gfn before set shadow page entry").

However, now it's not necessary to call this interface any more, because
a. GGTT partial write issue has been fixed by
   commit bc0686ff5fad
   ("drm/i915/gvt: support inconsecutive partial gtt entry write")
   commit 510fe10b6180
   ("drm/i915/gvt: fix a bug of partially write ggtt enties")
b. PPGTT resides in normal guest RAM and we only treat 8-byte writes
   as valid page table writes. Any invalid GPA found is regarded as
   an error, either due to guest misbehavior/attack or bug in host
   shadow code.
   So,rather than do GFN pre-checking and replace invalid GFNs with
   scratch GFN and continue silently, just remove the pre-checking and
   abort PPGTT shadowing on error detected.
c. GFN validity check is still performed in
   intel_gvt_dma_map_guest_page() --> gvt_pin_guest_page().
   It's more desirable to call VFIO interface to do both validity check
   and mapping.
   Calling intel_gvt_is_valid_gfn() to do GFN validity check from KVM side
   while later mapping the GFN through VFIO interface is unnecessarily
   fragile and confusing for unaware readers.

Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
[sean: remove now-unused local variables]
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 drivers/gpu/drm/i915/gvt/gtt.c | 36 +---------------------------------
 1 file changed, 1 insertion(+), 35 deletions(-)

diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c
index 58b9b316ae46..f30922c55a0c 100644
--- a/drivers/gpu/drm/i915/gvt/gtt.c
+++ b/drivers/gpu/drm/i915/gvt/gtt.c
@@ -49,22 +49,6 @@
 static bool enable_out_of_sync = false;
 static int preallocated_oos_pages = 8192;
 
-static bool intel_gvt_is_valid_gfn(struct intel_vgpu *vgpu, unsigned long gfn)
-{
-	struct kvm *kvm = vgpu->vfio_device.kvm;
-	int idx;
-	bool ret;
-
-	if (!test_bit(INTEL_VGPU_STATUS_ATTACHED, vgpu->status))
-		return false;
-
-	idx = srcu_read_lock(&kvm->srcu);
-	ret = kvm_is_visible_gfn(kvm, gfn);
-	srcu_read_unlock(&kvm->srcu, idx);
-
-	return ret;
-}
-
 /*
  * validate a gm address and related range size,
  * translate it to host gm address
@@ -1333,11 +1317,9 @@ static int ppgtt_populate_shadow_entry(struct intel_vgpu *vgpu,
 static int ppgtt_populate_spt(struct intel_vgpu_ppgtt_spt *spt)
 {
 	struct intel_vgpu *vgpu = spt->vgpu;
-	struct intel_gvt *gvt = vgpu->gvt;
-	const struct intel_gvt_gtt_pte_ops *ops = gvt->gtt.pte_ops;
 	struct intel_vgpu_ppgtt_spt *s;
 	struct intel_gvt_gtt_entry se, ge;
-	unsigned long gfn, i;
+	unsigned long i;
 	int ret;
 
 	trace_spt_change(spt->vgpu->id, "born", spt,
@@ -1354,13 +1336,6 @@ static int ppgtt_populate_spt(struct intel_vgpu_ppgtt_spt *spt)
 			ppgtt_generate_shadow_entry(&se, s, &ge);
 			ppgtt_set_shadow_entry(spt, &se, i);
 		} else {
-			gfn = ops->get_pfn(&ge);
-			if (!intel_gvt_is_valid_gfn(vgpu, gfn)) {
-				ops->set_pfn(&se, gvt->gtt.scratch_mfn);
-				ppgtt_set_shadow_entry(spt, &se, i);
-				continue;
-			}
-
 			ret = ppgtt_populate_shadow_entry(vgpu, spt, i, &ge);
 			if (ret)
 				goto fail;
@@ -2335,14 +2310,6 @@ static int emulate_ggtt_mmio_write(struct intel_vgpu *vgpu, unsigned int off,
 		m.val64 = e.val64;
 		m.type = e.type;
 
-		/* one PTE update may be issued in multiple writes and the
-		 * first write may not construct a valid gfn
-		 */
-		if (!intel_gvt_is_valid_gfn(vgpu, gfn)) {
-			ops->set_pfn(&m, gvt->gtt.scratch_mfn);
-			goto out;
-		}
-
 		ret = intel_gvt_dma_map_guest_page(vgpu, gfn, PAGE_SIZE,
 						   &dma_addr);
 		if (ret) {
@@ -2359,7 +2326,6 @@ static int emulate_ggtt_mmio_write(struct intel_vgpu *vgpu, unsigned int off,
 		ops->clear_present(&m);
 	}
 
-out:
 	ggtt_set_guest_entry(ggtt_mm, &e, g_gtt_index);
 
 	ggtt_get_host_entry(ggtt_mm, &e, g_gtt_index);
-- 
2.40.1.606.ga4b1b128d6-goog


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [Intel-gfx] [PATCH v3 02/28] drm/i915/gvt: remove interface intel_gvt_is_valid_gfn
@ 2023-05-13  0:35   ` Sean Christopherson
  0 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2023-05-13  0:35 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: Yan Zhao, kvm, intel-gfx, linux-kernel, Ben Gardon, intel-gvt-dev

From: Yan Zhao <yan.y.zhao@intel.com>

Currently intel_gvt_is_valid_gfn() is called in two places:
(1) shadowing guest GGTT entry
(2) shadowing guest PPGTT leaf entry,
which was introduced in commit cc753fbe1ac4
("drm/i915/gvt: validate gfn before set shadow page entry").

However, now it's not necessary to call this interface any more, because
a. GGTT partial write issue has been fixed by
   commit bc0686ff5fad
   ("drm/i915/gvt: support inconsecutive partial gtt entry write")
   commit 510fe10b6180
   ("drm/i915/gvt: fix a bug of partially write ggtt enties")
b. PPGTT resides in normal guest RAM and we only treat 8-byte writes
   as valid page table writes. Any invalid GPA found is regarded as
   an error, either due to guest misbehavior/attack or bug in host
   shadow code.
   So,rather than do GFN pre-checking and replace invalid GFNs with
   scratch GFN and continue silently, just remove the pre-checking and
   abort PPGTT shadowing on error detected.
c. GFN validity check is still performed in
   intel_gvt_dma_map_guest_page() --> gvt_pin_guest_page().
   It's more desirable to call VFIO interface to do both validity check
   and mapping.
   Calling intel_gvt_is_valid_gfn() to do GFN validity check from KVM side
   while later mapping the GFN through VFIO interface is unnecessarily
   fragile and confusing for unaware readers.

Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
[sean: remove now-unused local variables]
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 drivers/gpu/drm/i915/gvt/gtt.c | 36 +---------------------------------
 1 file changed, 1 insertion(+), 35 deletions(-)

diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c
index 58b9b316ae46..f30922c55a0c 100644
--- a/drivers/gpu/drm/i915/gvt/gtt.c
+++ b/drivers/gpu/drm/i915/gvt/gtt.c
@@ -49,22 +49,6 @@
 static bool enable_out_of_sync = false;
 static int preallocated_oos_pages = 8192;
 
-static bool intel_gvt_is_valid_gfn(struct intel_vgpu *vgpu, unsigned long gfn)
-{
-	struct kvm *kvm = vgpu->vfio_device.kvm;
-	int idx;
-	bool ret;
-
-	if (!test_bit(INTEL_VGPU_STATUS_ATTACHED, vgpu->status))
-		return false;
-
-	idx = srcu_read_lock(&kvm->srcu);
-	ret = kvm_is_visible_gfn(kvm, gfn);
-	srcu_read_unlock(&kvm->srcu, idx);
-
-	return ret;
-}
-
 /*
  * validate a gm address and related range size,
  * translate it to host gm address
@@ -1333,11 +1317,9 @@ static int ppgtt_populate_shadow_entry(struct intel_vgpu *vgpu,
 static int ppgtt_populate_spt(struct intel_vgpu_ppgtt_spt *spt)
 {
 	struct intel_vgpu *vgpu = spt->vgpu;
-	struct intel_gvt *gvt = vgpu->gvt;
-	const struct intel_gvt_gtt_pte_ops *ops = gvt->gtt.pte_ops;
 	struct intel_vgpu_ppgtt_spt *s;
 	struct intel_gvt_gtt_entry se, ge;
-	unsigned long gfn, i;
+	unsigned long i;
 	int ret;
 
 	trace_spt_change(spt->vgpu->id, "born", spt,
@@ -1354,13 +1336,6 @@ static int ppgtt_populate_spt(struct intel_vgpu_ppgtt_spt *spt)
 			ppgtt_generate_shadow_entry(&se, s, &ge);
 			ppgtt_set_shadow_entry(spt, &se, i);
 		} else {
-			gfn = ops->get_pfn(&ge);
-			if (!intel_gvt_is_valid_gfn(vgpu, gfn)) {
-				ops->set_pfn(&se, gvt->gtt.scratch_mfn);
-				ppgtt_set_shadow_entry(spt, &se, i);
-				continue;
-			}
-
 			ret = ppgtt_populate_shadow_entry(vgpu, spt, i, &ge);
 			if (ret)
 				goto fail;
@@ -2335,14 +2310,6 @@ static int emulate_ggtt_mmio_write(struct intel_vgpu *vgpu, unsigned int off,
 		m.val64 = e.val64;
 		m.type = e.type;
 
-		/* one PTE update may be issued in multiple writes and the
-		 * first write may not construct a valid gfn
-		 */
-		if (!intel_gvt_is_valid_gfn(vgpu, gfn)) {
-			ops->set_pfn(&m, gvt->gtt.scratch_mfn);
-			goto out;
-		}
-
 		ret = intel_gvt_dma_map_guest_page(vgpu, gfn, PAGE_SIZE,
 						   &dma_addr);
 		if (ret) {
@@ -2359,7 +2326,6 @@ static int emulate_ggtt_mmio_write(struct intel_vgpu *vgpu, unsigned int off,
 		ops->clear_present(&m);
 	}
 
-out:
 	ggtt_set_guest_entry(ggtt_mm, &e, g_gtt_index);
 
 	ggtt_get_host_entry(ggtt_mm, &e, g_gtt_index);
-- 
2.40.1.606.ga4b1b128d6-goog


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [Intel-gfx] [PATCH v3 03/28] drm/i915/gvt: Verify hugepages are contiguous in physical address space
  2023-05-13  0:35 ` [Intel-gfx] " Sean Christopherson
@ 2023-05-13  0:35   ` Sean Christopherson
  -1 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2023-05-13  0:35 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: Yan Zhao, kvm, intel-gfx, linux-kernel, Ben Gardon, intel-gvt-dev

When shadowing a GTT entry with a 2M page, verify that the pfns are
contiguous, not just that the struct page pointers are contiguous.  The
memory map is virtual contiguous if "CONFIG_FLATMEM=y ||
CONFIG_SPARSEMEM_VMEMMAP=y", but not for "CONFIG_SPARSEMEM=y &&
CONFIG_SPARSEMEM_VMEMMAP=n", so theoretically KVMGT could encounter struct
pages that are virtually contiguous, but not physically contiguous.

In practice, this flaw is likely a non-issue as it would cause functional
problems iff a section isn't 2M aligned _and_ is directly adjacent to
another section with discontiguous pfns.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 drivers/gpu/drm/i915/gvt/kvmgt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index de675d799c7d..429f0f993a13 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -161,7 +161,7 @@ static int gvt_pin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn,
 
 		if (npage == 0)
 			base_page = cur_page;
-		else if (base_page + npage != cur_page) {
+		else if (page_to_pfn(base_page) + npage != page_to_pfn(cur_page)) {
 			gvt_vgpu_err("The pages are not continuous\n");
 			ret = -EINVAL;
 			npage++;
-- 
2.40.1.606.ga4b1b128d6-goog


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v3 03/28] drm/i915/gvt: Verify hugepages are contiguous in physical address space
@ 2023-05-13  0:35   ` Sean Christopherson
  0 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2023-05-13  0:35 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

When shadowing a GTT entry with a 2M page, verify that the pfns are
contiguous, not just that the struct page pointers are contiguous.  The
memory map is virtual contiguous if "CONFIG_FLATMEM=y ||
CONFIG_SPARSEMEM_VMEMMAP=y", but not for "CONFIG_SPARSEMEM=y &&
CONFIG_SPARSEMEM_VMEMMAP=n", so theoretically KVMGT could encounter struct
pages that are virtually contiguous, but not physically contiguous.

In practice, this flaw is likely a non-issue as it would cause functional
problems iff a section isn't 2M aligned _and_ is directly adjacent to
another section with discontiguous pfns.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 drivers/gpu/drm/i915/gvt/kvmgt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index de675d799c7d..429f0f993a13 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -161,7 +161,7 @@ static int gvt_pin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn,
 
 		if (npage == 0)
 			base_page = cur_page;
-		else if (base_page + npage != cur_page) {
+		else if (page_to_pfn(base_page) + npage != page_to_pfn(cur_page)) {
 			gvt_vgpu_err("The pages are not continuous\n");
 			ret = -EINVAL;
 			npage++;
-- 
2.40.1.606.ga4b1b128d6-goog


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [Intel-gfx] [PATCH v3 04/28] drm/i915/gvt: Put the page reference obtained by KVM's gfn_to_pfn()
  2023-05-13  0:35 ` [Intel-gfx] " Sean Christopherson
@ 2023-05-13  0:35   ` Sean Christopherson
  -1 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2023-05-13  0:35 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: Yan Zhao, kvm, intel-gfx, linux-kernel, Ben Gardon, intel-gvt-dev

Put the struct page reference acquired by gfn_to_pfn(), KVM's API is that
the caller is ultimately responsible for dropping any reference.

Note, kvm_release_pfn_clean() ensures the pfn is actually a refcounted
struct page before trying to put any references.

Fixes: b901b252b6cf ("drm/i915/gvt: Add 2M huge gtt support")
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 drivers/gpu/drm/i915/gvt/gtt.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c
index f30922c55a0c..5426a27c1b71 100644
--- a/drivers/gpu/drm/i915/gvt/gtt.c
+++ b/drivers/gpu/drm/i915/gvt/gtt.c
@@ -1158,6 +1158,7 @@ static int is_2MB_gtt_possible(struct intel_vgpu *vgpu,
 {
 	const struct intel_gvt_gtt_pte_ops *ops = vgpu->gvt->gtt.pte_ops;
 	kvm_pfn_t pfn;
+	int ret;
 
 	if (!HAS_PAGE_SIZES(vgpu->gvt->gt->i915, I915_GTT_PAGE_SIZE_2M))
 		return 0;
@@ -1171,7 +1172,9 @@ static int is_2MB_gtt_possible(struct intel_vgpu *vgpu,
 	if (!pfn_valid(pfn))
 		return -EINVAL;
 
-	return PageTransHuge(pfn_to_page(pfn));
+	ret = PageTransHuge(pfn_to_page(pfn));
+	kvm_release_pfn_clean(pfn);
+	return ret;
 }
 
 static int split_2MB_gtt_entry(struct intel_vgpu *vgpu,
-- 
2.40.1.606.ga4b1b128d6-goog


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v3 04/28] drm/i915/gvt: Put the page reference obtained by KVM's gfn_to_pfn()
@ 2023-05-13  0:35   ` Sean Christopherson
  0 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2023-05-13  0:35 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

Put the struct page reference acquired by gfn_to_pfn(), KVM's API is that
the caller is ultimately responsible for dropping any reference.

Note, kvm_release_pfn_clean() ensures the pfn is actually a refcounted
struct page before trying to put any references.

Fixes: b901b252b6cf ("drm/i915/gvt: Add 2M huge gtt support")
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 drivers/gpu/drm/i915/gvt/gtt.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c
index f30922c55a0c..5426a27c1b71 100644
--- a/drivers/gpu/drm/i915/gvt/gtt.c
+++ b/drivers/gpu/drm/i915/gvt/gtt.c
@@ -1158,6 +1158,7 @@ static int is_2MB_gtt_possible(struct intel_vgpu *vgpu,
 {
 	const struct intel_gvt_gtt_pte_ops *ops = vgpu->gvt->gtt.pte_ops;
 	kvm_pfn_t pfn;
+	int ret;
 
 	if (!HAS_PAGE_SIZES(vgpu->gvt->gt->i915, I915_GTT_PAGE_SIZE_2M))
 		return 0;
@@ -1171,7 +1172,9 @@ static int is_2MB_gtt_possible(struct intel_vgpu *vgpu,
 	if (!pfn_valid(pfn))
 		return -EINVAL;
 
-	return PageTransHuge(pfn_to_page(pfn));
+	ret = PageTransHuge(pfn_to_page(pfn));
+	kvm_release_pfn_clean(pfn);
+	return ret;
 }
 
 static int split_2MB_gtt_entry(struct intel_vgpu *vgpu,
-- 
2.40.1.606.ga4b1b128d6-goog


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [Intel-gfx] [PATCH v3 05/28] drm/i915/gvt: Explicitly check that vGPU is attached before shadowing
  2023-05-13  0:35 ` [Intel-gfx] " Sean Christopherson
@ 2023-05-13  0:35   ` Sean Christopherson
  -1 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2023-05-13  0:35 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: Yan Zhao, kvm, intel-gfx, linux-kernel, Ben Gardon, intel-gvt-dev

Move the check that a vGPU is attacked from is_2MB_gtt_possible() to its
sole caller, ppgtt_populate_shadow_entry().  All of the paths in
ppgtt_populate_shadow_entry() eventually check for attachment by way of
intel_gvt_dma_map_guest_page(), but explicitly checking can avoid
unnecessary work and will make it more obvious that a future cleanup of
is_2MB_gtt_possible() isn't introducing a bug.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 drivers/gpu/drm/i915/gvt/gtt.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c
index 5426a27c1b71..2aed31b497c9 100644
--- a/drivers/gpu/drm/i915/gvt/gtt.c
+++ b/drivers/gpu/drm/i915/gvt/gtt.c
@@ -1163,8 +1163,6 @@ static int is_2MB_gtt_possible(struct intel_vgpu *vgpu,
 	if (!HAS_PAGE_SIZES(vgpu->gvt->gt->i915, I915_GTT_PAGE_SIZE_2M))
 		return 0;
 
-	if (!test_bit(INTEL_VGPU_STATUS_ATTACHED, vgpu->status))
-		return -EINVAL;
 	pfn = gfn_to_pfn(vgpu->vfio_device.kvm, ops->get_pfn(entry));
 	if (is_error_noslot_pfn(pfn))
 		return -EINVAL;
@@ -1277,6 +1275,9 @@ static int ppgtt_populate_shadow_entry(struct intel_vgpu *vgpu,
 	if (!pte_ops->test_present(ge))
 		return 0;
 
+	if (!test_bit(INTEL_VGPU_STATUS_ATTACHED, vgpu->status))
+		return -EINVAL;
+
 	gfn = pte_ops->get_pfn(ge);
 
 	switch (ge->type) {
-- 
2.40.1.606.ga4b1b128d6-goog


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v3 05/28] drm/i915/gvt: Explicitly check that vGPU is attached before shadowing
@ 2023-05-13  0:35   ` Sean Christopherson
  0 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2023-05-13  0:35 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

Move the check that a vGPU is attacked from is_2MB_gtt_possible() to its
sole caller, ppgtt_populate_shadow_entry().  All of the paths in
ppgtt_populate_shadow_entry() eventually check for attachment by way of
intel_gvt_dma_map_guest_page(), but explicitly checking can avoid
unnecessary work and will make it more obvious that a future cleanup of
is_2MB_gtt_possible() isn't introducing a bug.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 drivers/gpu/drm/i915/gvt/gtt.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c
index 5426a27c1b71..2aed31b497c9 100644
--- a/drivers/gpu/drm/i915/gvt/gtt.c
+++ b/drivers/gpu/drm/i915/gvt/gtt.c
@@ -1163,8 +1163,6 @@ static int is_2MB_gtt_possible(struct intel_vgpu *vgpu,
 	if (!HAS_PAGE_SIZES(vgpu->gvt->gt->i915, I915_GTT_PAGE_SIZE_2M))
 		return 0;
 
-	if (!test_bit(INTEL_VGPU_STATUS_ATTACHED, vgpu->status))
-		return -EINVAL;
 	pfn = gfn_to_pfn(vgpu->vfio_device.kvm, ops->get_pfn(entry));
 	if (is_error_noslot_pfn(pfn))
 		return -EINVAL;
@@ -1277,6 +1275,9 @@ static int ppgtt_populate_shadow_entry(struct intel_vgpu *vgpu,
 	if (!pte_ops->test_present(ge))
 		return 0;
 
+	if (!test_bit(INTEL_VGPU_STATUS_ATTACHED, vgpu->status))
+		return -EINVAL;
+
 	gfn = pte_ops->get_pfn(ge);
 
 	switch (ge->type) {
-- 
2.40.1.606.ga4b1b128d6-goog


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v3 06/28] drm/i915/gvt: Error out on an attempt to shadowing an unknown GTT entry type
  2023-05-13  0:35 ` [Intel-gfx] " Sean Christopherson
@ 2023-05-13  0:35   ` Sean Christopherson
  -1 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2023-05-13  0:35 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

Bail from ppgtt_populate_shadow_entry() if an unexpected GTT entry type
is encountered instead of subtly falling through to the common "direct
shadow" path.  Eliminating the default/error path's reliance on the common
handling will allow hoisting intel_gvt_dma_map_guest_page() into the case
statements so that the 2MiB case can try intel_gvt_dma_map_guest_page()
and fallback to splitting the entry on failure.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 drivers/gpu/drm/i915/gvt/gtt.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c
index 2aed31b497c9..61e38acee2d5 100644
--- a/drivers/gpu/drm/i915/gvt/gtt.c
+++ b/drivers/gpu/drm/i915/gvt/gtt.c
@@ -1306,6 +1306,7 @@ static int ppgtt_populate_shadow_entry(struct intel_vgpu *vgpu,
 		return -EINVAL;
 	default:
 		GEM_BUG_ON(1);
+		return -EINVAL;
 	}
 
 	/* direct shadow */
-- 
2.40.1.606.ga4b1b128d6-goog


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [Intel-gfx] [PATCH v3 06/28] drm/i915/gvt: Error out on an attempt to shadowing an unknown GTT entry type
@ 2023-05-13  0:35   ` Sean Christopherson
  0 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2023-05-13  0:35 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: Yan Zhao, kvm, intel-gfx, linux-kernel, Ben Gardon, intel-gvt-dev

Bail from ppgtt_populate_shadow_entry() if an unexpected GTT entry type
is encountered instead of subtly falling through to the common "direct
shadow" path.  Eliminating the default/error path's reliance on the common
handling will allow hoisting intel_gvt_dma_map_guest_page() into the case
statements so that the 2MiB case can try intel_gvt_dma_map_guest_page()
and fallback to splitting the entry on failure.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 drivers/gpu/drm/i915/gvt/gtt.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c
index 2aed31b497c9..61e38acee2d5 100644
--- a/drivers/gpu/drm/i915/gvt/gtt.c
+++ b/drivers/gpu/drm/i915/gvt/gtt.c
@@ -1306,6 +1306,7 @@ static int ppgtt_populate_shadow_entry(struct intel_vgpu *vgpu,
 		return -EINVAL;
 	default:
 		GEM_BUG_ON(1);
+		return -EINVAL;
 	}
 
 	/* direct shadow */
-- 
2.40.1.606.ga4b1b128d6-goog


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v3 07/28] drm/i915/gvt: Don't rely on KVM's gfn_to_pfn() to query possible 2M GTT
  2023-05-13  0:35 ` [Intel-gfx] " Sean Christopherson
@ 2023-05-13  0:35   ` Sean Christopherson
  -1 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2023-05-13  0:35 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

Now that gvt_pin_guest_page() explicitly verifies the pinned PFN is a
transparent hugepage page, don't use KVM's gfn_to_pfn() to pre-check if a
2MiB GTT entry is possible and instead just try to map the GFN with a 2MiB
entry.  Using KVM to query pfn that is ultimately managed through VFIO is
odd, and KVM's gfn_to_pfn() is not intended for non-KVM consumption; it's
exported only because of KVM vendor modules (x86 and PPC).

Open code the check on 2MiB support instead of keeping
is_2MB_gtt_possible() around for a single line of code.

Move the call to intel_gvt_dma_map_guest_page() for a 4KiB entry into its
case statement, i.e. fork the common path into the 4KiB and 2MiB "direct"
shadow paths.  Keeping the call in the "common" path is arguably more in
the spirit of "one change per patch", but retaining the local "page_size"
variable is silly, i.e. the call site will be changed either way, and
jumping around the no-longer-common code is more subtle and rather odd,
i.e. would just need to be immediately cleaned up.

Drop the error message from gvt_pin_guest_page() when KVMGT attempts to
shadow a 2MiB guest page that isn't backed by a compatible hugepage in the
host.  Dropping the pre-check on a THP makes it much more likely that the
"error" will be encountered in normal operation.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 drivers/gpu/drm/i915/gvt/gtt.c   | 49 ++++++--------------------------
 drivers/gpu/drm/i915/gvt/kvmgt.c |  1 -
 2 files changed, 8 insertions(+), 42 deletions(-)

diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c
index 61e38acee2d5..f505be9e647a 100644
--- a/drivers/gpu/drm/i915/gvt/gtt.c
+++ b/drivers/gpu/drm/i915/gvt/gtt.c
@@ -1145,36 +1145,6 @@ static inline void ppgtt_generate_shadow_entry(struct intel_gvt_gtt_entry *se,
 	ops->set_pfn(se, s->shadow_page.mfn);
 }
 
-/*
- * Check if can do 2M page
- * @vgpu: target vgpu
- * @entry: target pfn's gtt entry
- *
- * Return 1 if 2MB huge gtt shadowing is possible, 0 if miscondition,
- * negative if found err.
- */
-static int is_2MB_gtt_possible(struct intel_vgpu *vgpu,
-	struct intel_gvt_gtt_entry *entry)
-{
-	const struct intel_gvt_gtt_pte_ops *ops = vgpu->gvt->gtt.pte_ops;
-	kvm_pfn_t pfn;
-	int ret;
-
-	if (!HAS_PAGE_SIZES(vgpu->gvt->gt->i915, I915_GTT_PAGE_SIZE_2M))
-		return 0;
-
-	pfn = gfn_to_pfn(vgpu->vfio_device.kvm, ops->get_pfn(entry));
-	if (is_error_noslot_pfn(pfn))
-		return -EINVAL;
-
-	if (!pfn_valid(pfn))
-		return -EINVAL;
-
-	ret = PageTransHuge(pfn_to_page(pfn));
-	kvm_release_pfn_clean(pfn);
-	return ret;
-}
-
 static int split_2MB_gtt_entry(struct intel_vgpu *vgpu,
 	struct intel_vgpu_ppgtt_spt *spt, unsigned long index,
 	struct intel_gvt_gtt_entry *se)
@@ -1268,7 +1238,7 @@ static int ppgtt_populate_shadow_entry(struct intel_vgpu *vgpu,
 {
 	const struct intel_gvt_gtt_pte_ops *pte_ops = vgpu->gvt->gtt.pte_ops;
 	struct intel_gvt_gtt_entry se = *ge;
-	unsigned long gfn, page_size = PAGE_SIZE;
+	unsigned long gfn;
 	dma_addr_t dma_addr;
 	int ret;
 
@@ -1283,6 +1253,9 @@ static int ppgtt_populate_shadow_entry(struct intel_vgpu *vgpu,
 	switch (ge->type) {
 	case GTT_TYPE_PPGTT_PTE_4K_ENTRY:
 		gvt_vdbg_mm("shadow 4K gtt entry\n");
+		ret = intel_gvt_dma_map_guest_page(vgpu, gfn, PAGE_SIZE, &dma_addr);
+		if (ret)
+			return -ENXIO;
 		break;
 	case GTT_TYPE_PPGTT_PTE_64K_ENTRY:
 		gvt_vdbg_mm("shadow 64K gtt entry\n");
@@ -1294,12 +1267,10 @@ static int ppgtt_populate_shadow_entry(struct intel_vgpu *vgpu,
 		return split_64KB_gtt_entry(vgpu, spt, index, &se);
 	case GTT_TYPE_PPGTT_PTE_2M_ENTRY:
 		gvt_vdbg_mm("shadow 2M gtt entry\n");
-		ret = is_2MB_gtt_possible(vgpu, ge);
-		if (ret == 0)
+		if (!HAS_PAGE_SIZES(vgpu->gvt->gt->i915, I915_GTT_PAGE_SIZE_2M) ||
+		    intel_gvt_dma_map_guest_page(vgpu, gfn,
+						 I915_GTT_PAGE_SIZE_2M, &dma_addr))
 			return split_2MB_gtt_entry(vgpu, spt, index, &se);
-		else if (ret < 0)
-			return ret;
-		page_size = I915_GTT_PAGE_SIZE_2M;
 		break;
 	case GTT_TYPE_PPGTT_PTE_1G_ENTRY:
 		gvt_vgpu_err("GVT doesn't support 1GB entry\n");
@@ -1309,11 +1280,7 @@ static int ppgtt_populate_shadow_entry(struct intel_vgpu *vgpu,
 		return -EINVAL;
 	}
 
-	/* direct shadow */
-	ret = intel_gvt_dma_map_guest_page(vgpu, gfn, page_size, &dma_addr);
-	if (ret)
-		return -ENXIO;
-
+	/* Successfully shadowed a 4K or 2M page (without splitting). */
 	pte_ops->set_pfn(&se, dma_addr >> PAGE_SHIFT);
 	ppgtt_set_shadow_entry(spt, &se, index);
 	return 0;
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 429f0f993a13..92ceefe1e6fb 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -162,7 +162,6 @@ static int gvt_pin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn,
 		if (npage == 0)
 			base_page = cur_page;
 		else if (page_to_pfn(base_page) + npage != page_to_pfn(cur_page)) {
-			gvt_vgpu_err("The pages are not continuous\n");
 			ret = -EINVAL;
 			npage++;
 			goto err;
-- 
2.40.1.606.ga4b1b128d6-goog


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [Intel-gfx] [PATCH v3 07/28] drm/i915/gvt: Don't rely on KVM's gfn_to_pfn() to query possible 2M GTT
@ 2023-05-13  0:35   ` Sean Christopherson
  0 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2023-05-13  0:35 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: Yan Zhao, kvm, intel-gfx, linux-kernel, Ben Gardon, intel-gvt-dev

Now that gvt_pin_guest_page() explicitly verifies the pinned PFN is a
transparent hugepage page, don't use KVM's gfn_to_pfn() to pre-check if a
2MiB GTT entry is possible and instead just try to map the GFN with a 2MiB
entry.  Using KVM to query pfn that is ultimately managed through VFIO is
odd, and KVM's gfn_to_pfn() is not intended for non-KVM consumption; it's
exported only because of KVM vendor modules (x86 and PPC).

Open code the check on 2MiB support instead of keeping
is_2MB_gtt_possible() around for a single line of code.

Move the call to intel_gvt_dma_map_guest_page() for a 4KiB entry into its
case statement, i.e. fork the common path into the 4KiB and 2MiB "direct"
shadow paths.  Keeping the call in the "common" path is arguably more in
the spirit of "one change per patch", but retaining the local "page_size"
variable is silly, i.e. the call site will be changed either way, and
jumping around the no-longer-common code is more subtle and rather odd,
i.e. would just need to be immediately cleaned up.

Drop the error message from gvt_pin_guest_page() when KVMGT attempts to
shadow a 2MiB guest page that isn't backed by a compatible hugepage in the
host.  Dropping the pre-check on a THP makes it much more likely that the
"error" will be encountered in normal operation.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 drivers/gpu/drm/i915/gvt/gtt.c   | 49 ++++++--------------------------
 drivers/gpu/drm/i915/gvt/kvmgt.c |  1 -
 2 files changed, 8 insertions(+), 42 deletions(-)

diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c
index 61e38acee2d5..f505be9e647a 100644
--- a/drivers/gpu/drm/i915/gvt/gtt.c
+++ b/drivers/gpu/drm/i915/gvt/gtt.c
@@ -1145,36 +1145,6 @@ static inline void ppgtt_generate_shadow_entry(struct intel_gvt_gtt_entry *se,
 	ops->set_pfn(se, s->shadow_page.mfn);
 }
 
-/*
- * Check if can do 2M page
- * @vgpu: target vgpu
- * @entry: target pfn's gtt entry
- *
- * Return 1 if 2MB huge gtt shadowing is possible, 0 if miscondition,
- * negative if found err.
- */
-static int is_2MB_gtt_possible(struct intel_vgpu *vgpu,
-	struct intel_gvt_gtt_entry *entry)
-{
-	const struct intel_gvt_gtt_pte_ops *ops = vgpu->gvt->gtt.pte_ops;
-	kvm_pfn_t pfn;
-	int ret;
-
-	if (!HAS_PAGE_SIZES(vgpu->gvt->gt->i915, I915_GTT_PAGE_SIZE_2M))
-		return 0;
-
-	pfn = gfn_to_pfn(vgpu->vfio_device.kvm, ops->get_pfn(entry));
-	if (is_error_noslot_pfn(pfn))
-		return -EINVAL;
-
-	if (!pfn_valid(pfn))
-		return -EINVAL;
-
-	ret = PageTransHuge(pfn_to_page(pfn));
-	kvm_release_pfn_clean(pfn);
-	return ret;
-}
-
 static int split_2MB_gtt_entry(struct intel_vgpu *vgpu,
 	struct intel_vgpu_ppgtt_spt *spt, unsigned long index,
 	struct intel_gvt_gtt_entry *se)
@@ -1268,7 +1238,7 @@ static int ppgtt_populate_shadow_entry(struct intel_vgpu *vgpu,
 {
 	const struct intel_gvt_gtt_pte_ops *pte_ops = vgpu->gvt->gtt.pte_ops;
 	struct intel_gvt_gtt_entry se = *ge;
-	unsigned long gfn, page_size = PAGE_SIZE;
+	unsigned long gfn;
 	dma_addr_t dma_addr;
 	int ret;
 
@@ -1283,6 +1253,9 @@ static int ppgtt_populate_shadow_entry(struct intel_vgpu *vgpu,
 	switch (ge->type) {
 	case GTT_TYPE_PPGTT_PTE_4K_ENTRY:
 		gvt_vdbg_mm("shadow 4K gtt entry\n");
+		ret = intel_gvt_dma_map_guest_page(vgpu, gfn, PAGE_SIZE, &dma_addr);
+		if (ret)
+			return -ENXIO;
 		break;
 	case GTT_TYPE_PPGTT_PTE_64K_ENTRY:
 		gvt_vdbg_mm("shadow 64K gtt entry\n");
@@ -1294,12 +1267,10 @@ static int ppgtt_populate_shadow_entry(struct intel_vgpu *vgpu,
 		return split_64KB_gtt_entry(vgpu, spt, index, &se);
 	case GTT_TYPE_PPGTT_PTE_2M_ENTRY:
 		gvt_vdbg_mm("shadow 2M gtt entry\n");
-		ret = is_2MB_gtt_possible(vgpu, ge);
-		if (ret == 0)
+		if (!HAS_PAGE_SIZES(vgpu->gvt->gt->i915, I915_GTT_PAGE_SIZE_2M) ||
+		    intel_gvt_dma_map_guest_page(vgpu, gfn,
+						 I915_GTT_PAGE_SIZE_2M, &dma_addr))
 			return split_2MB_gtt_entry(vgpu, spt, index, &se);
-		else if (ret < 0)
-			return ret;
-		page_size = I915_GTT_PAGE_SIZE_2M;
 		break;
 	case GTT_TYPE_PPGTT_PTE_1G_ENTRY:
 		gvt_vgpu_err("GVT doesn't support 1GB entry\n");
@@ -1309,11 +1280,7 @@ static int ppgtt_populate_shadow_entry(struct intel_vgpu *vgpu,
 		return -EINVAL;
 	}
 
-	/* direct shadow */
-	ret = intel_gvt_dma_map_guest_page(vgpu, gfn, page_size, &dma_addr);
-	if (ret)
-		return -ENXIO;
-
+	/* Successfully shadowed a 4K or 2M page (without splitting). */
 	pte_ops->set_pfn(&se, dma_addr >> PAGE_SHIFT);
 	ppgtt_set_shadow_entry(spt, &se, index);
 	return 0;
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 429f0f993a13..92ceefe1e6fb 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -162,7 +162,6 @@ static int gvt_pin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn,
 		if (npage == 0)
 			base_page = cur_page;
 		else if (page_to_pfn(base_page) + npage != page_to_pfn(cur_page)) {
-			gvt_vgpu_err("The pages are not continuous\n");
 			ret = -EINVAL;
 			npage++;
 			goto err;
-- 
2.40.1.606.ga4b1b128d6-goog


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v3 08/28] drm/i915/gvt: Use an "unsigned long" to iterate over memslot gfns
  2023-05-13  0:35 ` [Intel-gfx] " Sean Christopherson
@ 2023-05-13  0:35   ` Sean Christopherson
  -1 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2023-05-13  0:35 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

Use an "unsigned long" instead of an "int" when iterating over the gfns
in a memslot.  The number of pages in the memslot is tracked as an
"unsigned long", e.g. KVMGT could theoretically break if a KVM memslot
larger than 16TiB were deleted (2^32 * 4KiB).

Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 drivers/gpu/drm/i915/gvt/kvmgt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 92ceefe1e6fb..117bac85ac2c 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -1619,7 +1619,7 @@ static void kvmgt_page_track_flush_slot(struct kvm *kvm,
 		struct kvm_memory_slot *slot,
 		struct kvm_page_track_notifier_node *node)
 {
-	int i;
+	unsigned long i;
 	gfn_t gfn;
 	struct intel_vgpu *info =
 		container_of(node, struct intel_vgpu, track_node);
-- 
2.40.1.606.ga4b1b128d6-goog


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [Intel-gfx] [PATCH v3 08/28] drm/i915/gvt: Use an "unsigned long" to iterate over memslot gfns
@ 2023-05-13  0:35   ` Sean Christopherson
  0 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2023-05-13  0:35 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: Yan Zhao, kvm, intel-gfx, linux-kernel, Ben Gardon, intel-gvt-dev

Use an "unsigned long" instead of an "int" when iterating over the gfns
in a memslot.  The number of pages in the memslot is tracked as an
"unsigned long", e.g. KVMGT could theoretically break if a KVM memslot
larger than 16TiB were deleted (2^32 * 4KiB).

Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 drivers/gpu/drm/i915/gvt/kvmgt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 92ceefe1e6fb..117bac85ac2c 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -1619,7 +1619,7 @@ static void kvmgt_page_track_flush_slot(struct kvm *kvm,
 		struct kvm_memory_slot *slot,
 		struct kvm_page_track_notifier_node *node)
 {
-	int i;
+	unsigned long i;
 	gfn_t gfn;
 	struct intel_vgpu *info =
 		container_of(node, struct intel_vgpu, track_node);
-- 
2.40.1.606.ga4b1b128d6-goog


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v3 09/28] drm/i915/gvt: Drop unused helper intel_vgpu_reset_gtt()
  2023-05-13  0:35 ` [Intel-gfx] " Sean Christopherson
@ 2023-05-13  0:35   ` Sean Christopherson
  -1 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2023-05-13  0:35 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

Drop intel_vgpu_reset_gtt() as it no longer has any callers.  In addition
to eliminating dead code, this eliminates the last possible scenario where
__kvmgt_protect_table_find() can be reached without holding vgpu_lock.
Requiring vgpu_lock to be held when calling __kvmgt_protect_table_find()
will allow a protecting the gfn hash with vgpu_lock without too much fuss.

No functional change intended.

Fixes: ba25d977571e ("drm/i915/gvt: Do not destroy ppgtt_mm during vGPU D3->D0.")
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 drivers/gpu/drm/i915/gvt/gtt.c | 18 ------------------
 drivers/gpu/drm/i915/gvt/gtt.h |  1 -
 2 files changed, 19 deletions(-)

diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c
index f505be9e647a..c3c623b929ce 100644
--- a/drivers/gpu/drm/i915/gvt/gtt.c
+++ b/drivers/gpu/drm/i915/gvt/gtt.c
@@ -2817,24 +2817,6 @@ void intel_vgpu_reset_ggtt(struct intel_vgpu *vgpu, bool invalidate_old)
 	ggtt_invalidate(gvt->gt);
 }
 
-/**
- * intel_vgpu_reset_gtt - reset the all GTT related status
- * @vgpu: a vGPU
- *
- * This function is called from vfio core to reset reset all
- * GTT related status, including GGTT, PPGTT, scratch page.
- *
- */
-void intel_vgpu_reset_gtt(struct intel_vgpu *vgpu)
-{
-	/* Shadow pages are only created when there is no page
-	 * table tracking data, so remove page tracking data after
-	 * removing the shadow pages.
-	 */
-	intel_vgpu_destroy_all_ppgtt_mm(vgpu);
-	intel_vgpu_reset_ggtt(vgpu, true);
-}
-
 /**
  * intel_gvt_restore_ggtt - restore all vGPU's ggtt entries
  * @gvt: intel gvt device
diff --git a/drivers/gpu/drm/i915/gvt/gtt.h b/drivers/gpu/drm/i915/gvt/gtt.h
index a3b0f59ec8bd..4cb183e06e95 100644
--- a/drivers/gpu/drm/i915/gvt/gtt.h
+++ b/drivers/gpu/drm/i915/gvt/gtt.h
@@ -224,7 +224,6 @@ void intel_vgpu_reset_ggtt(struct intel_vgpu *vgpu, bool invalidate_old);
 void intel_vgpu_invalidate_ppgtt(struct intel_vgpu *vgpu);
 
 int intel_gvt_init_gtt(struct intel_gvt *gvt);
-void intel_vgpu_reset_gtt(struct intel_vgpu *vgpu);
 void intel_gvt_clean_gtt(struct intel_gvt *gvt);
 
 struct intel_vgpu_mm *intel_gvt_find_ppgtt_mm(struct intel_vgpu *vgpu,
-- 
2.40.1.606.ga4b1b128d6-goog


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [Intel-gfx] [PATCH v3 09/28] drm/i915/gvt: Drop unused helper intel_vgpu_reset_gtt()
@ 2023-05-13  0:35   ` Sean Christopherson
  0 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2023-05-13  0:35 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: Yan Zhao, kvm, intel-gfx, linux-kernel, Ben Gardon, intel-gvt-dev

Drop intel_vgpu_reset_gtt() as it no longer has any callers.  In addition
to eliminating dead code, this eliminates the last possible scenario where
__kvmgt_protect_table_find() can be reached without holding vgpu_lock.
Requiring vgpu_lock to be held when calling __kvmgt_protect_table_find()
will allow a protecting the gfn hash with vgpu_lock without too much fuss.

No functional change intended.

Fixes: ba25d977571e ("drm/i915/gvt: Do not destroy ppgtt_mm during vGPU D3->D0.")
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 drivers/gpu/drm/i915/gvt/gtt.c | 18 ------------------
 drivers/gpu/drm/i915/gvt/gtt.h |  1 -
 2 files changed, 19 deletions(-)

diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c
index f505be9e647a..c3c623b929ce 100644
--- a/drivers/gpu/drm/i915/gvt/gtt.c
+++ b/drivers/gpu/drm/i915/gvt/gtt.c
@@ -2817,24 +2817,6 @@ void intel_vgpu_reset_ggtt(struct intel_vgpu *vgpu, bool invalidate_old)
 	ggtt_invalidate(gvt->gt);
 }
 
-/**
- * intel_vgpu_reset_gtt - reset the all GTT related status
- * @vgpu: a vGPU
- *
- * This function is called from vfio core to reset reset all
- * GTT related status, including GGTT, PPGTT, scratch page.
- *
- */
-void intel_vgpu_reset_gtt(struct intel_vgpu *vgpu)
-{
-	/* Shadow pages are only created when there is no page
-	 * table tracking data, so remove page tracking data after
-	 * removing the shadow pages.
-	 */
-	intel_vgpu_destroy_all_ppgtt_mm(vgpu);
-	intel_vgpu_reset_ggtt(vgpu, true);
-}
-
 /**
  * intel_gvt_restore_ggtt - restore all vGPU's ggtt entries
  * @gvt: intel gvt device
diff --git a/drivers/gpu/drm/i915/gvt/gtt.h b/drivers/gpu/drm/i915/gvt/gtt.h
index a3b0f59ec8bd..4cb183e06e95 100644
--- a/drivers/gpu/drm/i915/gvt/gtt.h
+++ b/drivers/gpu/drm/i915/gvt/gtt.h
@@ -224,7 +224,6 @@ void intel_vgpu_reset_ggtt(struct intel_vgpu *vgpu, bool invalidate_old);
 void intel_vgpu_invalidate_ppgtt(struct intel_vgpu *vgpu);
 
 int intel_gvt_init_gtt(struct intel_gvt *gvt);
-void intel_vgpu_reset_gtt(struct intel_vgpu *vgpu);
 void intel_gvt_clean_gtt(struct intel_gvt *gvt);
 
 struct intel_vgpu_mm *intel_gvt_find_ppgtt_mm(struct intel_vgpu *vgpu,
-- 
2.40.1.606.ga4b1b128d6-goog


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v3 10/28] drm/i915/gvt: Protect gfn hash table with vgpu_lock
  2023-05-13  0:35 ` [Intel-gfx] " Sean Christopherson
@ 2023-05-13  0:35   ` Sean Christopherson
  -1 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2023-05-13  0:35 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

Use vgpu_lock instead of KVM's mmu_lock to protect accesses to the hash
table used to track which gfns are write-protected when shadowing the
guest's GTT, and hoist the acquisition of vgpu_lock from
intel_vgpu_page_track_handler() out to its sole caller,
kvmgt_page_track_write().

This fixes a bug where kvmgt_page_track_write(), which doesn't hold
kvm->mmu_lock, could race with intel_gvt_page_track_remove() and trigger
a use-after-free.

Fixing kvmgt_page_track_write() by taking kvm->mmu_lock is not an option
as mmu_lock is a r/w spinlock, and intel_vgpu_page_track_handler() might
sleep when acquiring vgpu->cache_lock deep down the callstack:

  intel_vgpu_page_track_handler()
  |
  |->  page_track->handler / ppgtt_write_protection_handler()
       |
       |-> ppgtt_handle_guest_write_page_table_bytes()
           |
           |->  ppgtt_handle_guest_write_page_table()
                |
                |-> ppgtt_handle_guest_entry_removal()
                    |
                    |-> ppgtt_invalidate_pte()
                        |
                        |-> intel_gvt_dma_unmap_guest_page()
                            |
                            |-> mutex_lock(&vgpu->cache_lock);

Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 drivers/gpu/drm/i915/gvt/kvmgt.c      | 55 +++++++++++++++------------
 drivers/gpu/drm/i915/gvt/page_track.c | 10 +----
 2 files changed, 33 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 117bac85ac2c..0785f9cb2c20 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -351,6 +351,8 @@ __kvmgt_protect_table_find(struct intel_vgpu *info, gfn_t gfn)
 {
 	struct kvmgt_pgfn *p, *res = NULL;
 
+	lockdep_assert_held(&info->vgpu_lock);
+
 	hash_for_each_possible(info->ptable, p, hnode, gfn) {
 		if (gfn == p->gfn) {
 			res = p;
@@ -1552,6 +1554,9 @@ int intel_gvt_page_track_add(struct intel_vgpu *info, u64 gfn)
 	if (!test_bit(INTEL_VGPU_STATUS_ATTACHED, info->status))
 		return -ESRCH;
 
+	if (kvmgt_gfn_is_write_protected(info, gfn))
+		return 0;
+
 	idx = srcu_read_lock(&kvm->srcu);
 	slot = gfn_to_memslot(kvm, gfn);
 	if (!slot) {
@@ -1560,16 +1565,12 @@ int intel_gvt_page_track_add(struct intel_vgpu *info, u64 gfn)
 	}
 
 	write_lock(&kvm->mmu_lock);
-
-	if (kvmgt_gfn_is_write_protected(info, gfn))
-		goto out;
-
 	kvm_slot_page_track_add_page(kvm, slot, gfn, KVM_PAGE_TRACK_WRITE);
+	write_unlock(&kvm->mmu_lock);
+
+	srcu_read_unlock(&kvm->srcu, idx);
+
 	kvmgt_protect_table_add(info, gfn);
-
-out:
-	write_unlock(&kvm->mmu_lock);
-	srcu_read_unlock(&kvm->srcu, idx);
 	return 0;
 }
 
@@ -1582,24 +1583,22 @@ int intel_gvt_page_track_remove(struct intel_vgpu *info, u64 gfn)
 	if (!test_bit(INTEL_VGPU_STATUS_ATTACHED, info->status))
 		return -ESRCH;
 
-	idx = srcu_read_lock(&kvm->srcu);
-	slot = gfn_to_memslot(kvm, gfn);
-	if (!slot) {
-		srcu_read_unlock(&kvm->srcu, idx);
-		return -EINVAL;
-	}
-
-	write_lock(&kvm->mmu_lock);
-
 	if (!kvmgt_gfn_is_write_protected(info, gfn))
-		goto out;
+		return 0;
 
+	idx = srcu_read_lock(&kvm->srcu);
+	slot = gfn_to_memslot(kvm, gfn);
+	if (!slot) {
+		srcu_read_unlock(&kvm->srcu, idx);
+		return -EINVAL;
+	}
+
+	write_lock(&kvm->mmu_lock);
 	kvm_slot_page_track_remove_page(kvm, slot, gfn, KVM_PAGE_TRACK_WRITE);
+	write_unlock(&kvm->mmu_lock);
+	srcu_read_unlock(&kvm->srcu, idx);
+
 	kvmgt_protect_table_del(info, gfn);
-
-out:
-	write_unlock(&kvm->mmu_lock);
-	srcu_read_unlock(&kvm->srcu, idx);
 	return 0;
 }
 
@@ -1610,9 +1609,13 @@ static void kvmgt_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa,
 	struct intel_vgpu *info =
 		container_of(node, struct intel_vgpu, track_node);
 
+	mutex_lock(&info->vgpu_lock);
+
 	if (kvmgt_gfn_is_write_protected(info, gpa_to_gfn(gpa)))
 		intel_vgpu_page_track_handler(info, gpa,
 						     (void *)val, len);
+
+	mutex_unlock(&info->vgpu_lock);
 }
 
 static void kvmgt_page_track_flush_slot(struct kvm *kvm,
@@ -1624,16 +1627,20 @@ static void kvmgt_page_track_flush_slot(struct kvm *kvm,
 	struct intel_vgpu *info =
 		container_of(node, struct intel_vgpu, track_node);
 
-	write_lock(&kvm->mmu_lock);
+	mutex_lock(&info->vgpu_lock);
+
 	for (i = 0; i < slot->npages; i++) {
 		gfn = slot->base_gfn + i;
 		if (kvmgt_gfn_is_write_protected(info, gfn)) {
+			write_lock(&kvm->mmu_lock);
 			kvm_slot_page_track_remove_page(kvm, slot, gfn,
 						KVM_PAGE_TRACK_WRITE);
+			write_unlock(&kvm->mmu_lock);
+
 			kvmgt_protect_table_del(info, gfn);
 		}
 	}
-	write_unlock(&kvm->mmu_lock);
+	mutex_unlock(&info->vgpu_lock);
 }
 
 void intel_vgpu_detach_regions(struct intel_vgpu *vgpu)
diff --git a/drivers/gpu/drm/i915/gvt/page_track.c b/drivers/gpu/drm/i915/gvt/page_track.c
index df34e73cba41..60a65435556d 100644
--- a/drivers/gpu/drm/i915/gvt/page_track.c
+++ b/drivers/gpu/drm/i915/gvt/page_track.c
@@ -162,13 +162,9 @@ int intel_vgpu_page_track_handler(struct intel_vgpu *vgpu, u64 gpa,
 	struct intel_vgpu_page_track *page_track;
 	int ret = 0;
 
-	mutex_lock(&vgpu->vgpu_lock);
-
 	page_track = intel_vgpu_find_page_track(vgpu, gpa >> PAGE_SHIFT);
-	if (!page_track) {
-		ret = -ENXIO;
-		goto out;
-	}
+	if (!page_track)
+		return -ENXIO;
 
 	if (unlikely(vgpu->failsafe)) {
 		/* Remove write protection to prevent furture traps. */
@@ -179,7 +175,5 @@ int intel_vgpu_page_track_handler(struct intel_vgpu *vgpu, u64 gpa,
 			gvt_err("guest page write error, gpa %llx\n", gpa);
 	}
 
-out:
-	mutex_unlock(&vgpu->vgpu_lock);
 	return ret;
 }
-- 
2.40.1.606.ga4b1b128d6-goog


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [Intel-gfx] [PATCH v3 10/28] drm/i915/gvt: Protect gfn hash table with vgpu_lock
@ 2023-05-13  0:35   ` Sean Christopherson
  0 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2023-05-13  0:35 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: Yan Zhao, kvm, intel-gfx, linux-kernel, Ben Gardon, intel-gvt-dev

Use vgpu_lock instead of KVM's mmu_lock to protect accesses to the hash
table used to track which gfns are write-protected when shadowing the
guest's GTT, and hoist the acquisition of vgpu_lock from
intel_vgpu_page_track_handler() out to its sole caller,
kvmgt_page_track_write().

This fixes a bug where kvmgt_page_track_write(), which doesn't hold
kvm->mmu_lock, could race with intel_gvt_page_track_remove() and trigger
a use-after-free.

Fixing kvmgt_page_track_write() by taking kvm->mmu_lock is not an option
as mmu_lock is a r/w spinlock, and intel_vgpu_page_track_handler() might
sleep when acquiring vgpu->cache_lock deep down the callstack:

  intel_vgpu_page_track_handler()
  |
  |->  page_track->handler / ppgtt_write_protection_handler()
       |
       |-> ppgtt_handle_guest_write_page_table_bytes()
           |
           |->  ppgtt_handle_guest_write_page_table()
                |
                |-> ppgtt_handle_guest_entry_removal()
                    |
                    |-> ppgtt_invalidate_pte()
                        |
                        |-> intel_gvt_dma_unmap_guest_page()
                            |
                            |-> mutex_lock(&vgpu->cache_lock);

Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 drivers/gpu/drm/i915/gvt/kvmgt.c      | 55 +++++++++++++++------------
 drivers/gpu/drm/i915/gvt/page_track.c | 10 +----
 2 files changed, 33 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 117bac85ac2c..0785f9cb2c20 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -351,6 +351,8 @@ __kvmgt_protect_table_find(struct intel_vgpu *info, gfn_t gfn)
 {
 	struct kvmgt_pgfn *p, *res = NULL;
 
+	lockdep_assert_held(&info->vgpu_lock);
+
 	hash_for_each_possible(info->ptable, p, hnode, gfn) {
 		if (gfn == p->gfn) {
 			res = p;
@@ -1552,6 +1554,9 @@ int intel_gvt_page_track_add(struct intel_vgpu *info, u64 gfn)
 	if (!test_bit(INTEL_VGPU_STATUS_ATTACHED, info->status))
 		return -ESRCH;
 
+	if (kvmgt_gfn_is_write_protected(info, gfn))
+		return 0;
+
 	idx = srcu_read_lock(&kvm->srcu);
 	slot = gfn_to_memslot(kvm, gfn);
 	if (!slot) {
@@ -1560,16 +1565,12 @@ int intel_gvt_page_track_add(struct intel_vgpu *info, u64 gfn)
 	}
 
 	write_lock(&kvm->mmu_lock);
-
-	if (kvmgt_gfn_is_write_protected(info, gfn))
-		goto out;
-
 	kvm_slot_page_track_add_page(kvm, slot, gfn, KVM_PAGE_TRACK_WRITE);
+	write_unlock(&kvm->mmu_lock);
+
+	srcu_read_unlock(&kvm->srcu, idx);
+
 	kvmgt_protect_table_add(info, gfn);
-
-out:
-	write_unlock(&kvm->mmu_lock);
-	srcu_read_unlock(&kvm->srcu, idx);
 	return 0;
 }
 
@@ -1582,24 +1583,22 @@ int intel_gvt_page_track_remove(struct intel_vgpu *info, u64 gfn)
 	if (!test_bit(INTEL_VGPU_STATUS_ATTACHED, info->status))
 		return -ESRCH;
 
-	idx = srcu_read_lock(&kvm->srcu);
-	slot = gfn_to_memslot(kvm, gfn);
-	if (!slot) {
-		srcu_read_unlock(&kvm->srcu, idx);
-		return -EINVAL;
-	}
-
-	write_lock(&kvm->mmu_lock);
-
 	if (!kvmgt_gfn_is_write_protected(info, gfn))
-		goto out;
+		return 0;
 
+	idx = srcu_read_lock(&kvm->srcu);
+	slot = gfn_to_memslot(kvm, gfn);
+	if (!slot) {
+		srcu_read_unlock(&kvm->srcu, idx);
+		return -EINVAL;
+	}
+
+	write_lock(&kvm->mmu_lock);
 	kvm_slot_page_track_remove_page(kvm, slot, gfn, KVM_PAGE_TRACK_WRITE);
+	write_unlock(&kvm->mmu_lock);
+	srcu_read_unlock(&kvm->srcu, idx);
+
 	kvmgt_protect_table_del(info, gfn);
-
-out:
-	write_unlock(&kvm->mmu_lock);
-	srcu_read_unlock(&kvm->srcu, idx);
 	return 0;
 }
 
@@ -1610,9 +1609,13 @@ static void kvmgt_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa,
 	struct intel_vgpu *info =
 		container_of(node, struct intel_vgpu, track_node);
 
+	mutex_lock(&info->vgpu_lock);
+
 	if (kvmgt_gfn_is_write_protected(info, gpa_to_gfn(gpa)))
 		intel_vgpu_page_track_handler(info, gpa,
 						     (void *)val, len);
+
+	mutex_unlock(&info->vgpu_lock);
 }
 
 static void kvmgt_page_track_flush_slot(struct kvm *kvm,
@@ -1624,16 +1627,20 @@ static void kvmgt_page_track_flush_slot(struct kvm *kvm,
 	struct intel_vgpu *info =
 		container_of(node, struct intel_vgpu, track_node);
 
-	write_lock(&kvm->mmu_lock);
+	mutex_lock(&info->vgpu_lock);
+
 	for (i = 0; i < slot->npages; i++) {
 		gfn = slot->base_gfn + i;
 		if (kvmgt_gfn_is_write_protected(info, gfn)) {
+			write_lock(&kvm->mmu_lock);
 			kvm_slot_page_track_remove_page(kvm, slot, gfn,
 						KVM_PAGE_TRACK_WRITE);
+			write_unlock(&kvm->mmu_lock);
+
 			kvmgt_protect_table_del(info, gfn);
 		}
 	}
-	write_unlock(&kvm->mmu_lock);
+	mutex_unlock(&info->vgpu_lock);
 }
 
 void intel_vgpu_detach_regions(struct intel_vgpu *vgpu)
diff --git a/drivers/gpu/drm/i915/gvt/page_track.c b/drivers/gpu/drm/i915/gvt/page_track.c
index df34e73cba41..60a65435556d 100644
--- a/drivers/gpu/drm/i915/gvt/page_track.c
+++ b/drivers/gpu/drm/i915/gvt/page_track.c
@@ -162,13 +162,9 @@ int intel_vgpu_page_track_handler(struct intel_vgpu *vgpu, u64 gpa,
 	struct intel_vgpu_page_track *page_track;
 	int ret = 0;
 
-	mutex_lock(&vgpu->vgpu_lock);
-
 	page_track = intel_vgpu_find_page_track(vgpu, gpa >> PAGE_SHIFT);
-	if (!page_track) {
-		ret = -ENXIO;
-		goto out;
-	}
+	if (!page_track)
+		return -ENXIO;
 
 	if (unlikely(vgpu->failsafe)) {
 		/* Remove write protection to prevent furture traps. */
@@ -179,7 +175,5 @@ int intel_vgpu_page_track_handler(struct intel_vgpu *vgpu, u64 gpa,
 			gvt_err("guest page write error, gpa %llx\n", gpa);
 	}
 
-out:
-	mutex_unlock(&vgpu->vgpu_lock);
 	return ret;
 }
-- 
2.40.1.606.ga4b1b128d6-goog


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v3 11/28] KVM: x86/mmu: Move kvm_arch_flush_shadow_{all,memslot}() to mmu.c
  2023-05-13  0:35 ` [Intel-gfx] " Sean Christopherson
@ 2023-05-13  0:35   ` Sean Christopherson
  -1 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2023-05-13  0:35 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

Move x86's implementation of kvm_arch_flush_shadow_{all,memslot}() into
mmu.c, and make kvm_mmu_zap_all() static as it was globally visible only
for kvm_arch_flush_shadow_all().  This will allow refactoring
kvm_arch_flush_shadow_memslot() to call kvm_mmu_zap_all() directly without
having to expose kvm_mmu_zap_all_fast() outside of mmu.c.  Keeping
everything in mmu.c will also likely simplify supporting TDX, which
intends to do zap only relevant SPTEs on memslot updates.

No functional change intended.

Suggested-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_host.h |  1 -
 arch/x86/kvm/mmu/mmu.c          | 13 ++++++++++++-
 arch/x86/kvm/x86.c              | 11 -----------
 3 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index fb9d1f2d6136..564a29153cee 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1832,7 +1832,6 @@ void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm,
 				   const struct kvm_memory_slot *memslot);
 void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
 				   const struct kvm_memory_slot *memslot);
-void kvm_mmu_zap_all(struct kvm *kvm);
 void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen);
 void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned long kvm_nr_mmu_pages);
 
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index c8961f45e3b1..2e4476d38377 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -6717,7 +6717,7 @@ void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
 	 */
 }
 
-void kvm_mmu_zap_all(struct kvm *kvm)
+static void kvm_mmu_zap_all(struct kvm *kvm)
 {
 	struct kvm_mmu_page *sp, *node;
 	LIST_HEAD(invalid_list);
@@ -6742,6 +6742,17 @@ void kvm_mmu_zap_all(struct kvm *kvm)
 	write_unlock(&kvm->mmu_lock);
 }
 
+void kvm_arch_flush_shadow_all(struct kvm *kvm)
+{
+	kvm_mmu_zap_all(kvm);
+}
+
+void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
+				   struct kvm_memory_slot *slot)
+{
+	kvm_page_track_flush_slot(kvm, slot);
+}
+
 void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen)
 {
 	WARN_ON(gen & KVM_MEMSLOT_GEN_UPDATE_IN_PROGRESS);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 523c39a03c00..b2d9c5979df7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12758,17 +12758,6 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
 		kvm_arch_free_memslot(kvm, old);
 }
 
-void kvm_arch_flush_shadow_all(struct kvm *kvm)
-{
-	kvm_mmu_zap_all(kvm);
-}
-
-void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
-				   struct kvm_memory_slot *slot)
-{
-	kvm_page_track_flush_slot(kvm, slot);
-}
-
 static inline bool kvm_guest_apic_has_interrupt(struct kvm_vcpu *vcpu)
 {
 	return (is_guest_mode(vcpu) &&
-- 
2.40.1.606.ga4b1b128d6-goog


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [Intel-gfx] [PATCH v3 11/28] KVM: x86/mmu: Move kvm_arch_flush_shadow_{all, memslot}() to mmu.c
@ 2023-05-13  0:35   ` Sean Christopherson
  0 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2023-05-13  0:35 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: Yan Zhao, kvm, intel-gfx, linux-kernel, Ben Gardon, intel-gvt-dev

Move x86's implementation of kvm_arch_flush_shadow_{all,memslot}() into
mmu.c, and make kvm_mmu_zap_all() static as it was globally visible only
for kvm_arch_flush_shadow_all().  This will allow refactoring
kvm_arch_flush_shadow_memslot() to call kvm_mmu_zap_all() directly without
having to expose kvm_mmu_zap_all_fast() outside of mmu.c.  Keeping
everything in mmu.c will also likely simplify supporting TDX, which
intends to do zap only relevant SPTEs on memslot updates.

No functional change intended.

Suggested-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_host.h |  1 -
 arch/x86/kvm/mmu/mmu.c          | 13 ++++++++++++-
 arch/x86/kvm/x86.c              | 11 -----------
 3 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index fb9d1f2d6136..564a29153cee 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1832,7 +1832,6 @@ void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm,
 				   const struct kvm_memory_slot *memslot);
 void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
 				   const struct kvm_memory_slot *memslot);
-void kvm_mmu_zap_all(struct kvm *kvm);
 void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen);
 void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned long kvm_nr_mmu_pages);
 
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index c8961f45e3b1..2e4476d38377 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -6717,7 +6717,7 @@ void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
 	 */
 }
 
-void kvm_mmu_zap_all(struct kvm *kvm)
+static void kvm_mmu_zap_all(struct kvm *kvm)
 {
 	struct kvm_mmu_page *sp, *node;
 	LIST_HEAD(invalid_list);
@@ -6742,6 +6742,17 @@ void kvm_mmu_zap_all(struct kvm *kvm)
 	write_unlock(&kvm->mmu_lock);
 }
 
+void kvm_arch_flush_shadow_all(struct kvm *kvm)
+{
+	kvm_mmu_zap_all(kvm);
+}
+
+void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
+				   struct kvm_memory_slot *slot)
+{
+	kvm_page_track_flush_slot(kvm, slot);
+}
+
 void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen)
 {
 	WARN_ON(gen & KVM_MEMSLOT_GEN_UPDATE_IN_PROGRESS);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 523c39a03c00..b2d9c5979df7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12758,17 +12758,6 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
 		kvm_arch_free_memslot(kvm, old);
 }
 
-void kvm_arch_flush_shadow_all(struct kvm *kvm)
-{
-	kvm_mmu_zap_all(kvm);
-}
-
-void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
-				   struct kvm_memory_slot *slot)
-{
-	kvm_page_track_flush_slot(kvm, slot);
-}
-
 static inline bool kvm_guest_apic_has_interrupt(struct kvm_vcpu *vcpu)
 {
 	return (is_guest_mode(vcpu) &&
-- 
2.40.1.606.ga4b1b128d6-goog


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v3 12/28] KVM: x86/mmu: Don't rely on page-track mechanism to flush on memslot change
  2023-05-13  0:35 ` [Intel-gfx] " Sean Christopherson
@ 2023-05-13  0:35   ` Sean Christopherson
  -1 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2023-05-13  0:35 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

Call kvm_mmu_zap_all_fast() directly when flushing a memslot instead of
bouncing through the page-track mechanism.  KVM (unfortunately) needs to
zap and flush all page tables on memslot DELETE/MOVE irrespective of
whether KVM is shadowing guest page tables.

This will allow changing KVM to register a page-track notifier on the
first shadow root allocation, and will also allow deleting the misguided
kvm_page_track_flush_slot() hook itself once KVM-GT also moves to a
different method for reacting to memslot changes.

No functional change intended.

Cc: Yan Zhao <yan.y.zhao@intel.com>
Link: https://lore.kernel.org/r/20221110014821.1548347-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 2e4476d38377..23a79723031b 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -6184,13 +6184,6 @@ static bool kvm_has_zapped_obsolete_pages(struct kvm *kvm)
 	return unlikely(!list_empty_careful(&kvm->arch.zapped_obsolete_pages));
 }
 
-static void kvm_mmu_invalidate_zap_pages_in_memslot(struct kvm *kvm,
-			struct kvm_memory_slot *slot,
-			struct kvm_page_track_notifier_node *node)
-{
-	kvm_mmu_zap_all_fast(kvm);
-}
-
 int kvm_mmu_init_vm(struct kvm *kvm)
 {
 	struct kvm_page_track_notifier_node *node = &kvm->arch.mmu_sp_tracker;
@@ -6208,7 +6201,6 @@ int kvm_mmu_init_vm(struct kvm *kvm)
 	}
 
 	node->track_write = kvm_mmu_pte_write;
-	node->track_flush_slot = kvm_mmu_invalidate_zap_pages_in_memslot;
 	kvm_page_track_register_notifier(kvm, node);
 
 	kvm->arch.split_page_header_cache.kmem_cache = mmu_page_header_cache;
@@ -6750,6 +6742,8 @@ void kvm_arch_flush_shadow_all(struct kvm *kvm)
 void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
 				   struct kvm_memory_slot *slot)
 {
+	kvm_mmu_zap_all_fast(kvm);
+
 	kvm_page_track_flush_slot(kvm, slot);
 }
 
-- 
2.40.1.606.ga4b1b128d6-goog


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [Intel-gfx] [PATCH v3 12/28] KVM: x86/mmu: Don't rely on page-track mechanism to flush on memslot change
@ 2023-05-13  0:35   ` Sean Christopherson
  0 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2023-05-13  0:35 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: Yan Zhao, kvm, intel-gfx, linux-kernel, Ben Gardon, intel-gvt-dev

Call kvm_mmu_zap_all_fast() directly when flushing a memslot instead of
bouncing through the page-track mechanism.  KVM (unfortunately) needs to
zap and flush all page tables on memslot DELETE/MOVE irrespective of
whether KVM is shadowing guest page tables.

This will allow changing KVM to register a page-track notifier on the
first shadow root allocation, and will also allow deleting the misguided
kvm_page_track_flush_slot() hook itself once KVM-GT also moves to a
different method for reacting to memslot changes.

No functional change intended.

Cc: Yan Zhao <yan.y.zhao@intel.com>
Link: https://lore.kernel.org/r/20221110014821.1548347-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 2e4476d38377..23a79723031b 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -6184,13 +6184,6 @@ static bool kvm_has_zapped_obsolete_pages(struct kvm *kvm)
 	return unlikely(!list_empty_careful(&kvm->arch.zapped_obsolete_pages));
 }
 
-static void kvm_mmu_invalidate_zap_pages_in_memslot(struct kvm *kvm,
-			struct kvm_memory_slot *slot,
-			struct kvm_page_track_notifier_node *node)
-{
-	kvm_mmu_zap_all_fast(kvm);
-}
-
 int kvm_mmu_init_vm(struct kvm *kvm)
 {
 	struct kvm_page_track_notifier_node *node = &kvm->arch.mmu_sp_tracker;
@@ -6208,7 +6201,6 @@ int kvm_mmu_init_vm(struct kvm *kvm)
 	}
 
 	node->track_write = kvm_mmu_pte_write;
-	node->track_flush_slot = kvm_mmu_invalidate_zap_pages_in_memslot;
 	kvm_page_track_register_notifier(kvm, node);
 
 	kvm->arch.split_page_header_cache.kmem_cache = mmu_page_header_cache;
@@ -6750,6 +6742,8 @@ void kvm_arch_flush_shadow_all(struct kvm *kvm)
 void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
 				   struct kvm_memory_slot *slot)
 {
+	kvm_mmu_zap_all_fast(kvm);
+
 	kvm_page_track_flush_slot(kvm, slot);
 }
 
-- 
2.40.1.606.ga4b1b128d6-goog


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [Intel-gfx] [PATCH v3 13/28] KVM: x86/mmu: Don't bounce through page-track mechanism for guest PTEs
  2023-05-13  0:35 ` [Intel-gfx] " Sean Christopherson
@ 2023-05-13  0:35   ` Sean Christopherson
  -1 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2023-05-13  0:35 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: Yan Zhao, kvm, intel-gfx, linux-kernel, Ben Gardon, intel-gvt-dev

Don't use the generic page-track mechanism to handle writes to guest PTEs
in KVM's MMU.  KVM's MMU needs access to information that should not be
exposed to external page-track users, e.g. KVM needs (for some definitions
of "need") the vCPU to query the current paging mode, whereas external
users, i.e. KVMGT, have no ties to the current vCPU and so should never
need the vCPU.

Moving away from the page-track mechanism will allow dropping use of the
page-track mechanism for KVM's own MMU, and will also allow simplifying
and cleaning up the page-track APIs.

Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_host.h |  1 -
 arch/x86/kvm/mmu.h              |  2 ++
 arch/x86/kvm/mmu/mmu.c          | 13 ++-----------
 arch/x86/kvm/mmu/page_track.c   |  2 ++
 4 files changed, 6 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 564a29153cee..113598d3e886 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1247,7 +1247,6 @@ struct kvm_arch {
 	 * create an NX huge page (without hanging the guest).
 	 */
 	struct list_head possible_nx_huge_pages;
-	struct kvm_page_track_notifier_node mmu_sp_tracker;
 	struct kvm_page_track_notifier_head track_notifier_head;
 	/*
 	 * Protects marking pages unsync during page faults, as TDP MMU page
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 92d5a1924fc1..253fb2093d5d 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -121,6 +121,8 @@ void kvm_mmu_unload(struct kvm_vcpu *vcpu);
 void kvm_mmu_free_obsolete_roots(struct kvm_vcpu *vcpu);
 void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu);
 void kvm_mmu_sync_prev_roots(struct kvm_vcpu *vcpu);
+void kvm_mmu_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
+			 int bytes);
 
 static inline int kvm_mmu_reload(struct kvm_vcpu *vcpu)
 {
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 23a79723031b..af3e562d3106 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5677,9 +5677,8 @@ static u64 *get_written_sptes(struct kvm_mmu_page *sp, gpa_t gpa, int *nspte)
 	return spte;
 }
 
-static void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
-			      const u8 *new, int bytes,
-			      struct kvm_page_track_notifier_node *node)
+void kvm_mmu_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
+			 int bytes)
 {
 	gfn_t gfn = gpa >> PAGE_SHIFT;
 	struct kvm_mmu_page *sp;
@@ -6186,7 +6185,6 @@ static bool kvm_has_zapped_obsolete_pages(struct kvm *kvm)
 
 int kvm_mmu_init_vm(struct kvm *kvm)
 {
-	struct kvm_page_track_notifier_node *node = &kvm->arch.mmu_sp_tracker;
 	int r;
 
 	INIT_LIST_HEAD(&kvm->arch.active_mmu_pages);
@@ -6200,9 +6198,6 @@ int kvm_mmu_init_vm(struct kvm *kvm)
 			return r;
 	}
 
-	node->track_write = kvm_mmu_pte_write;
-	kvm_page_track_register_notifier(kvm, node);
-
 	kvm->arch.split_page_header_cache.kmem_cache = mmu_page_header_cache;
 	kvm->arch.split_page_header_cache.gfp_zero = __GFP_ZERO;
 
@@ -6223,10 +6218,6 @@ static void mmu_free_vm_memory_caches(struct kvm *kvm)
 
 void kvm_mmu_uninit_vm(struct kvm *kvm)
 {
-	struct kvm_page_track_notifier_node *node = &kvm->arch.mmu_sp_tracker;
-
-	kvm_page_track_unregister_notifier(kvm, node);
-
 	if (tdp_mmu_enabled)
 		kvm_mmu_uninit_tdp_mmu(kvm);
 
diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index 0a2ac438d647..23088c90d2fd 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -274,6 +274,8 @@ void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
 		if (n->track_write)
 			n->track_write(vcpu, gpa, new, bytes, n);
 	srcu_read_unlock(&head->track_srcu, idx);
+
+	kvm_mmu_track_write(vcpu, gpa, new, bytes);
 }
 
 /*
-- 
2.40.1.606.ga4b1b128d6-goog


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v3 13/28] KVM: x86/mmu: Don't bounce through page-track mechanism for guest PTEs
@ 2023-05-13  0:35   ` Sean Christopherson
  0 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2023-05-13  0:35 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

Don't use the generic page-track mechanism to handle writes to guest PTEs
in KVM's MMU.  KVM's MMU needs access to information that should not be
exposed to external page-track users, e.g. KVM needs (for some definitions
of "need") the vCPU to query the current paging mode, whereas external
users, i.e. KVMGT, have no ties to the current vCPU and so should never
need the vCPU.

Moving away from the page-track mechanism will allow dropping use of the
page-track mechanism for KVM's own MMU, and will also allow simplifying
and cleaning up the page-track APIs.

Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_host.h |  1 -
 arch/x86/kvm/mmu.h              |  2 ++
 arch/x86/kvm/mmu/mmu.c          | 13 ++-----------
 arch/x86/kvm/mmu/page_track.c   |  2 ++
 4 files changed, 6 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 564a29153cee..113598d3e886 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1247,7 +1247,6 @@ struct kvm_arch {
 	 * create an NX huge page (without hanging the guest).
 	 */
 	struct list_head possible_nx_huge_pages;
-	struct kvm_page_track_notifier_node mmu_sp_tracker;
 	struct kvm_page_track_notifier_head track_notifier_head;
 	/*
 	 * Protects marking pages unsync during page faults, as TDP MMU page
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 92d5a1924fc1..253fb2093d5d 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -121,6 +121,8 @@ void kvm_mmu_unload(struct kvm_vcpu *vcpu);
 void kvm_mmu_free_obsolete_roots(struct kvm_vcpu *vcpu);
 void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu);
 void kvm_mmu_sync_prev_roots(struct kvm_vcpu *vcpu);
+void kvm_mmu_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
+			 int bytes);
 
 static inline int kvm_mmu_reload(struct kvm_vcpu *vcpu)
 {
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 23a79723031b..af3e562d3106 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5677,9 +5677,8 @@ static u64 *get_written_sptes(struct kvm_mmu_page *sp, gpa_t gpa, int *nspte)
 	return spte;
 }
 
-static void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
-			      const u8 *new, int bytes,
-			      struct kvm_page_track_notifier_node *node)
+void kvm_mmu_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
+			 int bytes)
 {
 	gfn_t gfn = gpa >> PAGE_SHIFT;
 	struct kvm_mmu_page *sp;
@@ -6186,7 +6185,6 @@ static bool kvm_has_zapped_obsolete_pages(struct kvm *kvm)
 
 int kvm_mmu_init_vm(struct kvm *kvm)
 {
-	struct kvm_page_track_notifier_node *node = &kvm->arch.mmu_sp_tracker;
 	int r;
 
 	INIT_LIST_HEAD(&kvm->arch.active_mmu_pages);
@@ -6200,9 +6198,6 @@ int kvm_mmu_init_vm(struct kvm *kvm)
 			return r;
 	}
 
-	node->track_write = kvm_mmu_pte_write;
-	kvm_page_track_register_notifier(kvm, node);
-
 	kvm->arch.split_page_header_cache.kmem_cache = mmu_page_header_cache;
 	kvm->arch.split_page_header_cache.gfp_zero = __GFP_ZERO;
 
@@ -6223,10 +6218,6 @@ static void mmu_free_vm_memory_caches(struct kvm *kvm)
 
 void kvm_mmu_uninit_vm(struct kvm *kvm)
 {
-	struct kvm_page_track_notifier_node *node = &kvm->arch.mmu_sp_tracker;
-
-	kvm_page_track_unregister_notifier(kvm, node);
-
 	if (tdp_mmu_enabled)
 		kvm_mmu_uninit_tdp_mmu(kvm);
 
diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index 0a2ac438d647..23088c90d2fd 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -274,6 +274,8 @@ void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
 		if (n->track_write)
 			n->track_write(vcpu, gpa, new, bytes, n);
 	srcu_read_unlock(&head->track_srcu, idx);
+
+	kvm_mmu_track_write(vcpu, gpa, new, bytes);
 }
 
 /*
-- 
2.40.1.606.ga4b1b128d6-goog


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v3 14/28] KVM: drm/i915/gvt: Drop @vcpu from KVM's ->track_write() hook
  2023-05-13  0:35 ` [Intel-gfx] " Sean Christopherson
@ 2023-05-13  0:35   ` Sean Christopherson
  -1 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2023-05-13  0:35 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

Drop @vcpu from KVM's ->track_write() hook provided for external users of
the page-track APIs now that KVM itself doesn't use the page-track
mechanism.

Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_page_track.h |  5 ++---
 arch/x86/kvm/mmu/page_track.c         |  2 +-
 drivers/gpu/drm/i915/gvt/kvmgt.c      | 10 ++++------
 3 files changed, 7 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index eb186bc57f6a..8c4d216e3b2b 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -26,14 +26,13 @@ struct kvm_page_track_notifier_node {
 	 * It is called when guest is writing the write-tracked page
 	 * and write emulation is finished at that time.
 	 *
-	 * @vcpu: the vcpu where the write access happened.
 	 * @gpa: the physical address written by guest.
 	 * @new: the data was written to the address.
 	 * @bytes: the written length.
 	 * @node: this node
 	 */
-	void (*track_write)(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
-			    int bytes, struct kvm_page_track_notifier_node *node);
+	void (*track_write)(gpa_t gpa, const u8 *new, int bytes,
+			    struct kvm_page_track_notifier_node *node);
 	/*
 	 * It is called when memory slot is being moved or removed
 	 * users can drop write-protection for the pages in that memory slot
diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index 23088c90d2fd..891e5cc52b45 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -272,7 +272,7 @@ void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
 	hlist_for_each_entry_srcu(n, &head->track_notifier_list, node,
 				srcu_read_lock_held(&head->track_srcu))
 		if (n->track_write)
-			n->track_write(vcpu, gpa, new, bytes, n);
+			n->track_write(gpa, new, bytes, n);
 	srcu_read_unlock(&head->track_srcu, idx);
 
 	kvm_mmu_track_write(vcpu, gpa, new, bytes);
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 0785f9cb2c20..aaebb44c139f 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -106,9 +106,8 @@ struct gvt_dma {
 #define vfio_dev_to_vgpu(vfio_dev) \
 	container_of((vfio_dev), struct intel_vgpu, vfio_device)
 
-static void kvmgt_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa,
-		const u8 *val, int len,
-		struct kvm_page_track_notifier_node *node);
+static void kvmgt_page_track_write(gpa_t gpa, const u8 *val, int len,
+				   struct kvm_page_track_notifier_node *node);
 static void kvmgt_page_track_flush_slot(struct kvm *kvm,
 		struct kvm_memory_slot *slot,
 		struct kvm_page_track_notifier_node *node);
@@ -1602,9 +1601,8 @@ int intel_gvt_page_track_remove(struct intel_vgpu *info, u64 gfn)
 	return 0;
 }
 
-static void kvmgt_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa,
-		const u8 *val, int len,
-		struct kvm_page_track_notifier_node *node)
+static void kvmgt_page_track_write(gpa_t gpa, const u8 *val, int len,
+				   struct kvm_page_track_notifier_node *node)
 {
 	struct intel_vgpu *info =
 		container_of(node, struct intel_vgpu, track_node);
-- 
2.40.1.606.ga4b1b128d6-goog


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [Intel-gfx] [PATCH v3 14/28] KVM: drm/i915/gvt: Drop @vcpu from KVM's ->track_write() hook
@ 2023-05-13  0:35   ` Sean Christopherson
  0 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2023-05-13  0:35 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: Yan Zhao, kvm, intel-gfx, linux-kernel, Ben Gardon, intel-gvt-dev

Drop @vcpu from KVM's ->track_write() hook provided for external users of
the page-track APIs now that KVM itself doesn't use the page-track
mechanism.

Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_page_track.h |  5 ++---
 arch/x86/kvm/mmu/page_track.c         |  2 +-
 drivers/gpu/drm/i915/gvt/kvmgt.c      | 10 ++++------
 3 files changed, 7 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index eb186bc57f6a..8c4d216e3b2b 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -26,14 +26,13 @@ struct kvm_page_track_notifier_node {
 	 * It is called when guest is writing the write-tracked page
 	 * and write emulation is finished at that time.
 	 *
-	 * @vcpu: the vcpu where the write access happened.
 	 * @gpa: the physical address written by guest.
 	 * @new: the data was written to the address.
 	 * @bytes: the written length.
 	 * @node: this node
 	 */
-	void (*track_write)(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
-			    int bytes, struct kvm_page_track_notifier_node *node);
+	void (*track_write)(gpa_t gpa, const u8 *new, int bytes,
+			    struct kvm_page_track_notifier_node *node);
 	/*
 	 * It is called when memory slot is being moved or removed
 	 * users can drop write-protection for the pages in that memory slot
diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index 23088c90d2fd..891e5cc52b45 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -272,7 +272,7 @@ void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
 	hlist_for_each_entry_srcu(n, &head->track_notifier_list, node,
 				srcu_read_lock_held(&head->track_srcu))
 		if (n->track_write)
-			n->track_write(vcpu, gpa, new, bytes, n);
+			n->track_write(gpa, new, bytes, n);
 	srcu_read_unlock(&head->track_srcu, idx);
 
 	kvm_mmu_track_write(vcpu, gpa, new, bytes);
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 0785f9cb2c20..aaebb44c139f 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -106,9 +106,8 @@ struct gvt_dma {
 #define vfio_dev_to_vgpu(vfio_dev) \
 	container_of((vfio_dev), struct intel_vgpu, vfio_device)
 
-static void kvmgt_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa,
-		const u8 *val, int len,
-		struct kvm_page_track_notifier_node *node);
+static void kvmgt_page_track_write(gpa_t gpa, const u8 *val, int len,
+				   struct kvm_page_track_notifier_node *node);
 static void kvmgt_page_track_flush_slot(struct kvm *kvm,
 		struct kvm_memory_slot *slot,
 		struct kvm_page_track_notifier_node *node);
@@ -1602,9 +1601,8 @@ int intel_gvt_page_track_remove(struct intel_vgpu *info, u64 gfn)
 	return 0;
 }
 
-static void kvmgt_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa,
-		const u8 *val, int len,
-		struct kvm_page_track_notifier_node *node)
+static void kvmgt_page_track_write(gpa_t gpa, const u8 *val, int len,
+				   struct kvm_page_track_notifier_node *node)
 {
 	struct intel_vgpu *info =
 		container_of(node, struct intel_vgpu, track_node);
-- 
2.40.1.606.ga4b1b128d6-goog


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [Intel-gfx] [PATCH v3 15/28] KVM: x86: Reject memslot MOVE operations if KVMGT is attached
  2023-05-13  0:35 ` [Intel-gfx] " Sean Christopherson
@ 2023-05-13  0:35   ` Sean Christopherson
  -1 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2023-05-13  0:35 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: Yan Zhao, kvm, intel-gfx, linux-kernel, Ben Gardon, intel-gvt-dev

Disallow moving memslots if the VM has external page-track users, i.e. if
KVMGT is being used to expose a virtual GPU to the guest, as KVMGT doesn't
correctly handle moving memory regions.

Note, this is potential ABI breakage!  E.g. userspace could move regions
that aren't shadowed by KVMGT without harming the guest.  However, the
only known user of KVMGT is QEMU, and QEMU doesn't move generic memory
regions.  KVM's own support for moving memory regions was also broken for
multiple years (albeit for an edge case, but arguably moving RAM is
itself an edge case), e.g. see commit edd4fa37baa6 ("KVM: x86: Allocate
new rmap and large page tracking when moving memslot").

Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_page_track.h | 3 +++
 arch/x86/kvm/mmu/page_track.c         | 5 +++++
 arch/x86/kvm/x86.c                    | 7 +++++++
 3 files changed, 15 insertions(+)

diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index 8c4d216e3b2b..f744682648e7 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -75,4 +75,7 @@ kvm_page_track_unregister_notifier(struct kvm *kvm,
 void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
 			  int bytes);
 void kvm_page_track_flush_slot(struct kvm *kvm, struct kvm_memory_slot *slot);
+
+bool kvm_page_track_has_external_user(struct kvm *kvm);
+
 #endif
diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index 891e5cc52b45..e6de9638e560 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -303,3 +303,8 @@ void kvm_page_track_flush_slot(struct kvm *kvm, struct kvm_memory_slot *slot)
 			n->track_flush_slot(kvm, slot, n);
 	srcu_read_unlock(&head->track_srcu, idx);
 }
+
+bool kvm_page_track_has_external_user(struct kvm *kvm)
+{
+	return hlist_empty(&kvm->arch.track_notifier_head.track_notifier_list);
+}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b2d9c5979df7..c6bbd8ffd8c8 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12588,6 +12588,13 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
 				   struct kvm_memory_slot *new,
 				   enum kvm_mr_change change)
 {
+	/*
+	 * KVM doesn't support moving memslots when there are external page
+	 * trackers attached to the VM, i.e. if KVMGT is in use.
+	 */
+	if (change == KVM_MR_MOVE && kvm_page_track_has_external_user(kvm))
+		return -EINVAL;
+
 	if (change == KVM_MR_CREATE || change == KVM_MR_MOVE) {
 		if ((new->base_gfn + new->npages - 1) > kvm_mmu_max_gfn())
 			return -EINVAL;
-- 
2.40.1.606.ga4b1b128d6-goog


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v3 15/28] KVM: x86: Reject memslot MOVE operations if KVMGT is attached
@ 2023-05-13  0:35   ` Sean Christopherson
  0 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2023-05-13  0:35 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

Disallow moving memslots if the VM has external page-track users, i.e. if
KVMGT is being used to expose a virtual GPU to the guest, as KVMGT doesn't
correctly handle moving memory regions.

Note, this is potential ABI breakage!  E.g. userspace could move regions
that aren't shadowed by KVMGT without harming the guest.  However, the
only known user of KVMGT is QEMU, and QEMU doesn't move generic memory
regions.  KVM's own support for moving memory regions was also broken for
multiple years (albeit for an edge case, but arguably moving RAM is
itself an edge case), e.g. see commit edd4fa37baa6 ("KVM: x86: Allocate
new rmap and large page tracking when moving memslot").

Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_page_track.h | 3 +++
 arch/x86/kvm/mmu/page_track.c         | 5 +++++
 arch/x86/kvm/x86.c                    | 7 +++++++
 3 files changed, 15 insertions(+)

diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index 8c4d216e3b2b..f744682648e7 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -75,4 +75,7 @@ kvm_page_track_unregister_notifier(struct kvm *kvm,
 void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
 			  int bytes);
 void kvm_page_track_flush_slot(struct kvm *kvm, struct kvm_memory_slot *slot);
+
+bool kvm_page_track_has_external_user(struct kvm *kvm);
+
 #endif
diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index 891e5cc52b45..e6de9638e560 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -303,3 +303,8 @@ void kvm_page_track_flush_slot(struct kvm *kvm, struct kvm_memory_slot *slot)
 			n->track_flush_slot(kvm, slot, n);
 	srcu_read_unlock(&head->track_srcu, idx);
 }
+
+bool kvm_page_track_has_external_user(struct kvm *kvm)
+{
+	return hlist_empty(&kvm->arch.track_notifier_head.track_notifier_list);
+}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b2d9c5979df7..c6bbd8ffd8c8 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12588,6 +12588,13 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
 				   struct kvm_memory_slot *new,
 				   enum kvm_mr_change change)
 {
+	/*
+	 * KVM doesn't support moving memslots when there are external page
+	 * trackers attached to the VM, i.e. if KVMGT is in use.
+	 */
+	if (change == KVM_MR_MOVE && kvm_page_track_has_external_user(kvm))
+		return -EINVAL;
+
 	if (change == KVM_MR_CREATE || change == KVM_MR_MOVE) {
 		if ((new->base_gfn + new->npages - 1) > kvm_mmu_max_gfn())
 			return -EINVAL;
-- 
2.40.1.606.ga4b1b128d6-goog


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [Intel-gfx] [PATCH v3 16/28] drm/i915/gvt: Don't bother removing write-protection on to-be-deleted slot
  2023-05-13  0:35 ` [Intel-gfx] " Sean Christopherson
@ 2023-05-13  0:35   ` Sean Christopherson
  -1 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2023-05-13  0:35 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: Yan Zhao, kvm, intel-gfx, linux-kernel, Ben Gardon, intel-gvt-dev

When handling a slot "flush", don't call back into KVM to drop write
protection for gfns in the slot.  Now that KVM rejects attempts to move
memory slots while KVMGT is attached, the only time a slot is "flushed"
is when it's being removed, i.e. the memslot and all its write-tracking
metadata is about to be deleted.

Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 drivers/gpu/drm/i915/gvt/kvmgt.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index aaebb44c139f..c309a77f495f 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -1629,14 +1629,8 @@ static void kvmgt_page_track_flush_slot(struct kvm *kvm,
 
 	for (i = 0; i < slot->npages; i++) {
 		gfn = slot->base_gfn + i;
-		if (kvmgt_gfn_is_write_protected(info, gfn)) {
-			write_lock(&kvm->mmu_lock);
-			kvm_slot_page_track_remove_page(kvm, slot, gfn,
-						KVM_PAGE_TRACK_WRITE);
-			write_unlock(&kvm->mmu_lock);
-
+		if (kvmgt_gfn_is_write_protected(info, gfn))
 			kvmgt_protect_table_del(info, gfn);
-		}
 	}
 	mutex_unlock(&info->vgpu_lock);
 }
-- 
2.40.1.606.ga4b1b128d6-goog


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v3 16/28] drm/i915/gvt: Don't bother removing write-protection on to-be-deleted slot
@ 2023-05-13  0:35   ` Sean Christopherson
  0 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2023-05-13  0:35 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

When handling a slot "flush", don't call back into KVM to drop write
protection for gfns in the slot.  Now that KVM rejects attempts to move
memory slots while KVMGT is attached, the only time a slot is "flushed"
is when it's being removed, i.e. the memslot and all its write-tracking
metadata is about to be deleted.

Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 drivers/gpu/drm/i915/gvt/kvmgt.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index aaebb44c139f..c309a77f495f 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -1629,14 +1629,8 @@ static void kvmgt_page_track_flush_slot(struct kvm *kvm,
 
 	for (i = 0; i < slot->npages; i++) {
 		gfn = slot->base_gfn + i;
-		if (kvmgt_gfn_is_write_protected(info, gfn)) {
-			write_lock(&kvm->mmu_lock);
-			kvm_slot_page_track_remove_page(kvm, slot, gfn,
-						KVM_PAGE_TRACK_WRITE);
-			write_unlock(&kvm->mmu_lock);
-
+		if (kvmgt_gfn_is_write_protected(info, gfn))
 			kvmgt_protect_table_del(info, gfn);
-		}
 	}
 	mutex_unlock(&info->vgpu_lock);
 }
-- 
2.40.1.606.ga4b1b128d6-goog


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [Intel-gfx] [PATCH v3 17/28] KVM: x86: Add a new page-track hook to handle memslot deletion
  2023-05-13  0:35 ` [Intel-gfx] " Sean Christopherson
@ 2023-05-13  0:35   ` Sean Christopherson
  -1 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2023-05-13  0:35 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: Yan Zhao, kvm, intel-gfx, linux-kernel, Ben Gardon, intel-gvt-dev

From: Yan Zhao <yan.y.zhao@intel.com>

Add a new page-track hook, track_remove_region(), that is called when a
memslot DELETE operation is about to be committed.  The "remove" hook
will be used by KVMGT and will effectively replace the existing
track_flush_slot() altogether now that KVM itself doesn't rely on the
"flush" hook either.

The "flush" hook is flawed as it's invoked before the memslot operation
is guaranteed to succeed, i.e. KVM might ultimately keep the existing
memslot without notifying external page track users, a.k.a. KVMGT.  In
practice, this can't currently happen on x86, but there are no guarantees
that won't change in the future, not to mention that "flush" does a very
poor job of describing what is happening.

Pass in the gfn+nr_pages instead of the slot itself so external users,
i.e. KVMGT, don't need to exposed to KVM internals (memslots).  This will
help set the stage for additional cleanups to the page-track APIs.

Opportunistically align the existing srcu_read_lock_held() usage so that
the new case doesn't stand out like a sore thumb (and not aligning the
new code makes bots unhappy).

Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_page_track.h | 12 ++++++++++++
 arch/x86/kvm/mmu/page_track.c         | 27 +++++++++++++++++++++++++--
 arch/x86/kvm/x86.c                    |  3 +++
 3 files changed, 40 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index f744682648e7..cfd36c22b467 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -43,6 +43,17 @@ struct kvm_page_track_notifier_node {
 	 */
 	void (*track_flush_slot)(struct kvm *kvm, struct kvm_memory_slot *slot,
 			    struct kvm_page_track_notifier_node *node);
+
+	/*
+	 * Invoked when a memory region is removed from the guest.  Or in KVM
+	 * terms, when a memslot is deleted.
+	 *
+	 * @gfn:       base gfn of the region being removed
+	 * @nr_pages:  number of pages in the to-be-removed region
+	 * @node:      this node
+	 */
+	void (*track_remove_region)(gfn_t gfn, unsigned long nr_pages,
+				    struct kvm_page_track_notifier_node *node);
 };
 
 int kvm_page_track_init(struct kvm *kvm);
@@ -75,6 +86,7 @@ kvm_page_track_unregister_notifier(struct kvm *kvm,
 void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
 			  int bytes);
 void kvm_page_track_flush_slot(struct kvm *kvm, struct kvm_memory_slot *slot);
+void kvm_page_track_delete_slot(struct kvm *kvm, struct kvm_memory_slot *slot);
 
 bool kvm_page_track_has_external_user(struct kvm *kvm);
 
diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index e6de9638e560..d971c28be99d 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -270,7 +270,7 @@ void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
 
 	idx = srcu_read_lock(&head->track_srcu);
 	hlist_for_each_entry_srcu(n, &head->track_notifier_list, node,
-				srcu_read_lock_held(&head->track_srcu))
+				  srcu_read_lock_held(&head->track_srcu))
 		if (n->track_write)
 			n->track_write(gpa, new, bytes, n);
 	srcu_read_unlock(&head->track_srcu, idx);
@@ -298,12 +298,35 @@ void kvm_page_track_flush_slot(struct kvm *kvm, struct kvm_memory_slot *slot)
 
 	idx = srcu_read_lock(&head->track_srcu);
 	hlist_for_each_entry_srcu(n, &head->track_notifier_list, node,
-				srcu_read_lock_held(&head->track_srcu))
+				  srcu_read_lock_held(&head->track_srcu))
 		if (n->track_flush_slot)
 			n->track_flush_slot(kvm, slot, n);
 	srcu_read_unlock(&head->track_srcu, idx);
 }
 
+/*
+ * Notify external page track nodes that a memory region is being removed from
+ * the VM, e.g. so that users can free any associated metadata.
+ */
+void kvm_page_track_delete_slot(struct kvm *kvm, struct kvm_memory_slot *slot)
+{
+	struct kvm_page_track_notifier_head *head;
+	struct kvm_page_track_notifier_node *n;
+	int idx;
+
+	head = &kvm->arch.track_notifier_head;
+
+	if (hlist_empty(&head->track_notifier_list))
+		return;
+
+	idx = srcu_read_lock(&head->track_srcu);
+	hlist_for_each_entry_srcu(n, &head->track_notifier_list, node,
+				  srcu_read_lock_held(&head->track_srcu))
+		if (n->track_remove_region)
+			n->track_remove_region(slot->base_gfn, slot->npages, n);
+	srcu_read_unlock(&head->track_srcu, idx);
+}
+
 bool kvm_page_track_has_external_user(struct kvm *kvm)
 {
 	return hlist_empty(&kvm->arch.track_notifier_head.track_notifier_list);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c6bbd8ffd8c8..6ae2b612f8c6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12749,6 +12749,9 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
 				const struct kvm_memory_slot *new,
 				enum kvm_mr_change change)
 {
+	if (change == KVM_MR_DELETE)
+		kvm_page_track_delete_slot(kvm, old);
+
 	if (!kvm->arch.n_requested_mmu_pages &&
 	    (change == KVM_MR_CREATE || change == KVM_MR_DELETE)) {
 		unsigned long nr_mmu_pages;
-- 
2.40.1.606.ga4b1b128d6-goog


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v3 17/28] KVM: x86: Add a new page-track hook to handle memslot deletion
@ 2023-05-13  0:35   ` Sean Christopherson
  0 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2023-05-13  0:35 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

From: Yan Zhao <yan.y.zhao@intel.com>

Add a new page-track hook, track_remove_region(), that is called when a
memslot DELETE operation is about to be committed.  The "remove" hook
will be used by KVMGT and will effectively replace the existing
track_flush_slot() altogether now that KVM itself doesn't rely on the
"flush" hook either.

The "flush" hook is flawed as it's invoked before the memslot operation
is guaranteed to succeed, i.e. KVM might ultimately keep the existing
memslot without notifying external page track users, a.k.a. KVMGT.  In
practice, this can't currently happen on x86, but there are no guarantees
that won't change in the future, not to mention that "flush" does a very
poor job of describing what is happening.

Pass in the gfn+nr_pages instead of the slot itself so external users,
i.e. KVMGT, don't need to exposed to KVM internals (memslots).  This will
help set the stage for additional cleanups to the page-track APIs.

Opportunistically align the existing srcu_read_lock_held() usage so that
the new case doesn't stand out like a sore thumb (and not aligning the
new code makes bots unhappy).

Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_page_track.h | 12 ++++++++++++
 arch/x86/kvm/mmu/page_track.c         | 27 +++++++++++++++++++++++++--
 arch/x86/kvm/x86.c                    |  3 +++
 3 files changed, 40 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index f744682648e7..cfd36c22b467 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -43,6 +43,17 @@ struct kvm_page_track_notifier_node {
 	 */
 	void (*track_flush_slot)(struct kvm *kvm, struct kvm_memory_slot *slot,
 			    struct kvm_page_track_notifier_node *node);
+
+	/*
+	 * Invoked when a memory region is removed from the guest.  Or in KVM
+	 * terms, when a memslot is deleted.
+	 *
+	 * @gfn:       base gfn of the region being removed
+	 * @nr_pages:  number of pages in the to-be-removed region
+	 * @node:      this node
+	 */
+	void (*track_remove_region)(gfn_t gfn, unsigned long nr_pages,
+				    struct kvm_page_track_notifier_node *node);
 };
 
 int kvm_page_track_init(struct kvm *kvm);
@@ -75,6 +86,7 @@ kvm_page_track_unregister_notifier(struct kvm *kvm,
 void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
 			  int bytes);
 void kvm_page_track_flush_slot(struct kvm *kvm, struct kvm_memory_slot *slot);
+void kvm_page_track_delete_slot(struct kvm *kvm, struct kvm_memory_slot *slot);
 
 bool kvm_page_track_has_external_user(struct kvm *kvm);
 
diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index e6de9638e560..d971c28be99d 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -270,7 +270,7 @@ void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
 
 	idx = srcu_read_lock(&head->track_srcu);
 	hlist_for_each_entry_srcu(n, &head->track_notifier_list, node,
-				srcu_read_lock_held(&head->track_srcu))
+				  srcu_read_lock_held(&head->track_srcu))
 		if (n->track_write)
 			n->track_write(gpa, new, bytes, n);
 	srcu_read_unlock(&head->track_srcu, idx);
@@ -298,12 +298,35 @@ void kvm_page_track_flush_slot(struct kvm *kvm, struct kvm_memory_slot *slot)
 
 	idx = srcu_read_lock(&head->track_srcu);
 	hlist_for_each_entry_srcu(n, &head->track_notifier_list, node,
-				srcu_read_lock_held(&head->track_srcu))
+				  srcu_read_lock_held(&head->track_srcu))
 		if (n->track_flush_slot)
 			n->track_flush_slot(kvm, slot, n);
 	srcu_read_unlock(&head->track_srcu, idx);
 }
 
+/*
+ * Notify external page track nodes that a memory region is being removed from
+ * the VM, e.g. so that users can free any associated metadata.
+ */
+void kvm_page_track_delete_slot(struct kvm *kvm, struct kvm_memory_slot *slot)
+{
+	struct kvm_page_track_notifier_head *head;
+	struct kvm_page_track_notifier_node *n;
+	int idx;
+
+	head = &kvm->arch.track_notifier_head;
+
+	if (hlist_empty(&head->track_notifier_list))
+		return;
+
+	idx = srcu_read_lock(&head->track_srcu);
+	hlist_for_each_entry_srcu(n, &head->track_notifier_list, node,
+				  srcu_read_lock_held(&head->track_srcu))
+		if (n->track_remove_region)
+			n->track_remove_region(slot->base_gfn, slot->npages, n);
+	srcu_read_unlock(&head->track_srcu, idx);
+}
+
 bool kvm_page_track_has_external_user(struct kvm *kvm)
 {
 	return hlist_empty(&kvm->arch.track_notifier_head.track_notifier_list);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c6bbd8ffd8c8..6ae2b612f8c6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12749,6 +12749,9 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
 				const struct kvm_memory_slot *new,
 				enum kvm_mr_change change)
 {
+	if (change == KVM_MR_DELETE)
+		kvm_page_track_delete_slot(kvm, old);
+
 	if (!kvm->arch.n_requested_mmu_pages &&
 	    (change == KVM_MR_CREATE || change == KVM_MR_DELETE)) {
 		unsigned long nr_mmu_pages;
-- 
2.40.1.606.ga4b1b128d6-goog


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [Intel-gfx] [PATCH v3 18/28] drm/i915/gvt: switch from ->track_flush_slot() to ->track_remove_region()
  2023-05-13  0:35 ` [Intel-gfx] " Sean Christopherson
@ 2023-05-13  0:35   ` Sean Christopherson
  -1 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2023-05-13  0:35 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: Yan Zhao, kvm, intel-gfx, linux-kernel, Ben Gardon, intel-gvt-dev

From: Yan Zhao <yan.y.zhao@intel.com>

Switch from the poorly named and flawed ->track_flush_slot() to the newly
introduced ->track_remove_region().  From KVMGT's perspective, the two
hooks are functionally equivalent, the only difference being that
->track_remove_region() is called only when KVM is 100% certain the
memory region will be removed, i.e. is invoked slightly later in KVM's
memslot modification flow.

Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
[sean: handle name change, massage changelog, rebase]
Tested-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 drivers/gpu/drm/i915/gvt/kvmgt.c | 21 +++++++++------------
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index c309a77f495f..2e65901270ca 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -108,9 +108,8 @@ struct gvt_dma {
 
 static void kvmgt_page_track_write(gpa_t gpa, const u8 *val, int len,
 				   struct kvm_page_track_notifier_node *node);
-static void kvmgt_page_track_flush_slot(struct kvm *kvm,
-		struct kvm_memory_slot *slot,
-		struct kvm_page_track_notifier_node *node);
+static void kvmgt_page_track_remove_region(gfn_t gfn, unsigned long nr_pages,
+					   struct kvm_page_track_notifier_node *node);
 
 static ssize_t intel_vgpu_show_description(struct mdev_type *mtype, char *buf)
 {
@@ -665,7 +664,7 @@ static int intel_vgpu_open_device(struct vfio_device *vfio_dev)
 		return -EEXIST;
 
 	vgpu->track_node.track_write = kvmgt_page_track_write;
-	vgpu->track_node.track_flush_slot = kvmgt_page_track_flush_slot;
+	vgpu->track_node.track_remove_region = kvmgt_page_track_remove_region;
 	kvm_get_kvm(vgpu->vfio_device.kvm);
 	kvm_page_track_register_notifier(vgpu->vfio_device.kvm,
 					 &vgpu->track_node);
@@ -1616,22 +1615,20 @@ static void kvmgt_page_track_write(gpa_t gpa, const u8 *val, int len,
 	mutex_unlock(&info->vgpu_lock);
 }
 
-static void kvmgt_page_track_flush_slot(struct kvm *kvm,
-		struct kvm_memory_slot *slot,
-		struct kvm_page_track_notifier_node *node)
+static void kvmgt_page_track_remove_region(gfn_t gfn, unsigned long nr_pages,
+					   struct kvm_page_track_notifier_node *node)
 {
 	unsigned long i;
-	gfn_t gfn;
 	struct intel_vgpu *info =
 		container_of(node, struct intel_vgpu, track_node);
 
 	mutex_lock(&info->vgpu_lock);
 
-	for (i = 0; i < slot->npages; i++) {
-		gfn = slot->base_gfn + i;
-		if (kvmgt_gfn_is_write_protected(info, gfn))
-			kvmgt_protect_table_del(info, gfn);
+	for (i = 0; i < nr_pages; i++) {
+		if (kvmgt_gfn_is_write_protected(info, gfn + i))
+			kvmgt_protect_table_del(info, gfn + i);
 	}
+
 	mutex_unlock(&info->vgpu_lock);
 }
 
-- 
2.40.1.606.ga4b1b128d6-goog


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v3 18/28] drm/i915/gvt: switch from ->track_flush_slot() to ->track_remove_region()
@ 2023-05-13  0:35   ` Sean Christopherson
  0 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2023-05-13  0:35 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

From: Yan Zhao <yan.y.zhao@intel.com>

Switch from the poorly named and flawed ->track_flush_slot() to the newly
introduced ->track_remove_region().  From KVMGT's perspective, the two
hooks are functionally equivalent, the only difference being that
->track_remove_region() is called only when KVM is 100% certain the
memory region will be removed, i.e. is invoked slightly later in KVM's
memslot modification flow.

Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
[sean: handle name change, massage changelog, rebase]
Tested-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 drivers/gpu/drm/i915/gvt/kvmgt.c | 21 +++++++++------------
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index c309a77f495f..2e65901270ca 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -108,9 +108,8 @@ struct gvt_dma {
 
 static void kvmgt_page_track_write(gpa_t gpa, const u8 *val, int len,
 				   struct kvm_page_track_notifier_node *node);
-static void kvmgt_page_track_flush_slot(struct kvm *kvm,
-		struct kvm_memory_slot *slot,
-		struct kvm_page_track_notifier_node *node);
+static void kvmgt_page_track_remove_region(gfn_t gfn, unsigned long nr_pages,
+					   struct kvm_page_track_notifier_node *node);
 
 static ssize_t intel_vgpu_show_description(struct mdev_type *mtype, char *buf)
 {
@@ -665,7 +664,7 @@ static int intel_vgpu_open_device(struct vfio_device *vfio_dev)
 		return -EEXIST;
 
 	vgpu->track_node.track_write = kvmgt_page_track_write;
-	vgpu->track_node.track_flush_slot = kvmgt_page_track_flush_slot;
+	vgpu->track_node.track_remove_region = kvmgt_page_track_remove_region;
 	kvm_get_kvm(vgpu->vfio_device.kvm);
 	kvm_page_track_register_notifier(vgpu->vfio_device.kvm,
 					 &vgpu->track_node);
@@ -1616,22 +1615,20 @@ static void kvmgt_page_track_write(gpa_t gpa, const u8 *val, int len,
 	mutex_unlock(&info->vgpu_lock);
 }
 
-static void kvmgt_page_track_flush_slot(struct kvm *kvm,
-		struct kvm_memory_slot *slot,
-		struct kvm_page_track_notifier_node *node)
+static void kvmgt_page_track_remove_region(gfn_t gfn, unsigned long nr_pages,
+					   struct kvm_page_track_notifier_node *node)
 {
 	unsigned long i;
-	gfn_t gfn;
 	struct intel_vgpu *info =
 		container_of(node, struct intel_vgpu, track_node);
 
 	mutex_lock(&info->vgpu_lock);
 
-	for (i = 0; i < slot->npages; i++) {
-		gfn = slot->base_gfn + i;
-		if (kvmgt_gfn_is_write_protected(info, gfn))
-			kvmgt_protect_table_del(info, gfn);
+	for (i = 0; i < nr_pages; i++) {
+		if (kvmgt_gfn_is_write_protected(info, gfn + i))
+			kvmgt_protect_table_del(info, gfn + i);
 	}
+
 	mutex_unlock(&info->vgpu_lock);
 }
 
-- 
2.40.1.606.ga4b1b128d6-goog


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [Intel-gfx] [PATCH v3 19/28] KVM: x86: Remove the unused page-track hook track_flush_slot()
  2023-05-13  0:35 ` [Intel-gfx] " Sean Christopherson
@ 2023-05-13  0:35   ` Sean Christopherson
  -1 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2023-05-13  0:35 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: Yan Zhao, kvm, intel-gfx, linux-kernel, Ben Gardon, intel-gvt-dev

From: Yan Zhao <yan.y.zhao@intel.com>

Remove ->track_remove_slot(), there are no longer any users and it's
unlikely a "flush" hook will ever be the correct API to provide to an
external page-track user.

Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_page_track.h | 11 -----------
 arch/x86/kvm/mmu/mmu.c                |  2 --
 arch/x86/kvm/mmu/page_track.c         | 26 --------------------------
 3 files changed, 39 deletions(-)

diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index cfd36c22b467..5c348ffdc194 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -33,16 +33,6 @@ struct kvm_page_track_notifier_node {
 	 */
 	void (*track_write)(gpa_t gpa, const u8 *new, int bytes,
 			    struct kvm_page_track_notifier_node *node);
-	/*
-	 * It is called when memory slot is being moved or removed
-	 * users can drop write-protection for the pages in that memory slot
-	 *
-	 * @kvm: the kvm where memory slot being moved or removed
-	 * @slot: the memory slot being moved or removed
-	 * @node: this node
-	 */
-	void (*track_flush_slot)(struct kvm *kvm, struct kvm_memory_slot *slot,
-			    struct kvm_page_track_notifier_node *node);
 
 	/*
 	 * Invoked when a memory region is removed from the guest.  Or in KVM
@@ -85,7 +75,6 @@ kvm_page_track_unregister_notifier(struct kvm *kvm,
 				   struct kvm_page_track_notifier_node *n);
 void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
 			  int bytes);
-void kvm_page_track_flush_slot(struct kvm *kvm, struct kvm_memory_slot *slot);
 void kvm_page_track_delete_slot(struct kvm *kvm, struct kvm_memory_slot *slot);
 
 bool kvm_page_track_has_external_user(struct kvm *kvm);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index af3e562d3106..3f9030650c3d 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -6734,8 +6734,6 @@ void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
 				   struct kvm_memory_slot *slot)
 {
 	kvm_mmu_zap_all_fast(kvm);
-
-	kvm_page_track_flush_slot(kvm, slot);
 }
 
 void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen)
diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index d971c28be99d..2a6ab7c455c0 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -278,32 +278,6 @@ void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
 	kvm_mmu_track_write(vcpu, gpa, new, bytes);
 }
 
-/*
- * Notify the node that memory slot is being removed or moved so that it can
- * drop write-protection for the pages in the memory slot.
- *
- * The node should figure out it has any write-protected pages in this slot
- * by itself.
- */
-void kvm_page_track_flush_slot(struct kvm *kvm, struct kvm_memory_slot *slot)
-{
-	struct kvm_page_track_notifier_head *head;
-	struct kvm_page_track_notifier_node *n;
-	int idx;
-
-	head = &kvm->arch.track_notifier_head;
-
-	if (hlist_empty(&head->track_notifier_list))
-		return;
-
-	idx = srcu_read_lock(&head->track_srcu);
-	hlist_for_each_entry_srcu(n, &head->track_notifier_list, node,
-				  srcu_read_lock_held(&head->track_srcu))
-		if (n->track_flush_slot)
-			n->track_flush_slot(kvm, slot, n);
-	srcu_read_unlock(&head->track_srcu, idx);
-}
-
 /*
  * Notify external page track nodes that a memory region is being removed from
  * the VM, e.g. so that users can free any associated metadata.
-- 
2.40.1.606.ga4b1b128d6-goog


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v3 19/28] KVM: x86: Remove the unused page-track hook track_flush_slot()
@ 2023-05-13  0:35   ` Sean Christopherson
  0 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2023-05-13  0:35 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

From: Yan Zhao <yan.y.zhao@intel.com>

Remove ->track_remove_slot(), there are no longer any users and it's
unlikely a "flush" hook will ever be the correct API to provide to an
external page-track user.

Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_page_track.h | 11 -----------
 arch/x86/kvm/mmu/mmu.c                |  2 --
 arch/x86/kvm/mmu/page_track.c         | 26 --------------------------
 3 files changed, 39 deletions(-)

diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index cfd36c22b467..5c348ffdc194 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -33,16 +33,6 @@ struct kvm_page_track_notifier_node {
 	 */
 	void (*track_write)(gpa_t gpa, const u8 *new, int bytes,
 			    struct kvm_page_track_notifier_node *node);
-	/*
-	 * It is called when memory slot is being moved or removed
-	 * users can drop write-protection for the pages in that memory slot
-	 *
-	 * @kvm: the kvm where memory slot being moved or removed
-	 * @slot: the memory slot being moved or removed
-	 * @node: this node
-	 */
-	void (*track_flush_slot)(struct kvm *kvm, struct kvm_memory_slot *slot,
-			    struct kvm_page_track_notifier_node *node);
 
 	/*
 	 * Invoked when a memory region is removed from the guest.  Or in KVM
@@ -85,7 +75,6 @@ kvm_page_track_unregister_notifier(struct kvm *kvm,
 				   struct kvm_page_track_notifier_node *n);
 void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
 			  int bytes);
-void kvm_page_track_flush_slot(struct kvm *kvm, struct kvm_memory_slot *slot);
 void kvm_page_track_delete_slot(struct kvm *kvm, struct kvm_memory_slot *slot);
 
 bool kvm_page_track_has_external_user(struct kvm *kvm);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index af3e562d3106..3f9030650c3d 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -6734,8 +6734,6 @@ void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
 				   struct kvm_memory_slot *slot)
 {
 	kvm_mmu_zap_all_fast(kvm);
-
-	kvm_page_track_flush_slot(kvm, slot);
 }
 
 void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen)
diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index d971c28be99d..2a6ab7c455c0 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -278,32 +278,6 @@ void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
 	kvm_mmu_track_write(vcpu, gpa, new, bytes);
 }
 
-/*
- * Notify the node that memory slot is being removed or moved so that it can
- * drop write-protection for the pages in the memory slot.
- *
- * The node should figure out it has any write-protected pages in this slot
- * by itself.
- */
-void kvm_page_track_flush_slot(struct kvm *kvm, struct kvm_memory_slot *slot)
-{
-	struct kvm_page_track_notifier_head *head;
-	struct kvm_page_track_notifier_node *n;
-	int idx;
-
-	head = &kvm->arch.track_notifier_head;
-
-	if (hlist_empty(&head->track_notifier_list))
-		return;
-
-	idx = srcu_read_lock(&head->track_srcu);
-	hlist_for_each_entry_srcu(n, &head->track_notifier_list, node,
-				  srcu_read_lock_held(&head->track_srcu))
-		if (n->track_flush_slot)
-			n->track_flush_slot(kvm, slot, n);
-	srcu_read_unlock(&head->track_srcu, idx);
-}
-
 /*
  * Notify external page track nodes that a memory region is being removed from
  * the VM, e.g. so that users can free any associated metadata.
-- 
2.40.1.606.ga4b1b128d6-goog


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [Intel-gfx] [PATCH v3 20/28] KVM: x86/mmu: Move KVM-only page-track declarations to internal header
  2023-05-13  0:35 ` [Intel-gfx] " Sean Christopherson
@ 2023-05-13  0:35   ` Sean Christopherson
  -1 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2023-05-13  0:35 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: Yan Zhao, kvm, intel-gfx, linux-kernel, Ben Gardon, intel-gvt-dev

Bury the declaration of the page-track helpers that are intended only for
internal KVM use in a "private" header.  In addition to guarding against
unwanted usage of the internal-only helpers, dropping their definitions
avoids exposing other structures that should be KVM-internal, e.g. for
memslots.  This is a baby step toward making kvm_host.h a KVM-internal
header in the very distant future.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_page_track.h | 21 ++---------------
 arch/x86/kvm/mmu/mmu.c                |  3 ++-
 arch/x86/kvm/mmu/page_track.c         |  8 +------
 arch/x86/kvm/mmu/page_track.h         | 33 +++++++++++++++++++++++++++
 arch/x86/kvm/x86.c                    |  1 +
 5 files changed, 39 insertions(+), 27 deletions(-)
 create mode 100644 arch/x86/kvm/mmu/page_track.h

diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index 5c348ffdc194..76c0070dfe2a 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -2,6 +2,8 @@
 #ifndef _ASM_X86_KVM_PAGE_TRACK_H
 #define _ASM_X86_KVM_PAGE_TRACK_H
 
+#include <linux/kvm_types.h>
+
 enum kvm_page_track_mode {
 	KVM_PAGE_TRACK_WRITE,
 	KVM_PAGE_TRACK_MAX,
@@ -46,26 +48,12 @@ struct kvm_page_track_notifier_node {
 				    struct kvm_page_track_notifier_node *node);
 };
 
-int kvm_page_track_init(struct kvm *kvm);
-void kvm_page_track_cleanup(struct kvm *kvm);
-
-bool kvm_page_track_write_tracking_enabled(struct kvm *kvm);
-int kvm_page_track_write_tracking_alloc(struct kvm_memory_slot *slot);
-
-void kvm_page_track_free_memslot(struct kvm_memory_slot *slot);
-int kvm_page_track_create_memslot(struct kvm *kvm,
-				  struct kvm_memory_slot *slot,
-				  unsigned long npages);
-
 void kvm_slot_page_track_add_page(struct kvm *kvm,
 				  struct kvm_memory_slot *slot, gfn_t gfn,
 				  enum kvm_page_track_mode mode);
 void kvm_slot_page_track_remove_page(struct kvm *kvm,
 				     struct kvm_memory_slot *slot, gfn_t gfn,
 				     enum kvm_page_track_mode mode);
-bool kvm_slot_page_track_is_active(struct kvm *kvm,
-				   const struct kvm_memory_slot *slot,
-				   gfn_t gfn, enum kvm_page_track_mode mode);
 
 void
 kvm_page_track_register_notifier(struct kvm *kvm,
@@ -73,10 +61,5 @@ kvm_page_track_register_notifier(struct kvm *kvm,
 void
 kvm_page_track_unregister_notifier(struct kvm *kvm,
 				   struct kvm_page_track_notifier_node *n);
-void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
-			  int bytes);
-void kvm_page_track_delete_slot(struct kvm *kvm, struct kvm_memory_slot *slot);
-
-bool kvm_page_track_has_external_user(struct kvm *kvm);
 
 #endif
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 3f9030650c3d..0d9fe54ecb01 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -25,6 +25,7 @@
 #include "kvm_cache_regs.h"
 #include "smm.h"
 #include "kvm_emulate.h"
+#include "page_track.h"
 #include "cpuid.h"
 #include "spte.h"
 
@@ -53,7 +54,7 @@
 #include <asm/io.h>
 #include <asm/set_memory.h>
 #include <asm/vmx.h>
-#include <asm/kvm_page_track.h>
+
 #include "trace.h"
 
 extern bool itlb_multihit_kvm_mitigation;
diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index 2a6ab7c455c0..e15329d48f95 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -15,10 +15,9 @@
 #include <linux/kvm_host.h>
 #include <linux/rculist.h>
 
-#include <asm/kvm_page_track.h>
-
 #include "mmu.h"
 #include "mmu_internal.h"
+#include "page_track.h"
 
 bool kvm_page_track_write_tracking_enabled(struct kvm *kvm)
 {
@@ -300,8 +299,3 @@ void kvm_page_track_delete_slot(struct kvm *kvm, struct kvm_memory_slot *slot)
 			n->track_remove_region(slot->base_gfn, slot->npages, n);
 	srcu_read_unlock(&head->track_srcu, idx);
 }
-
-bool kvm_page_track_has_external_user(struct kvm *kvm)
-{
-	return hlist_empty(&kvm->arch.track_notifier_head.track_notifier_list);
-}
diff --git a/arch/x86/kvm/mmu/page_track.h b/arch/x86/kvm/mmu/page_track.h
new file mode 100644
index 000000000000..89712f123ad3
--- /dev/null
+++ b/arch/x86/kvm/mmu/page_track.h
@@ -0,0 +1,33 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __KVM_X86_PAGE_TRACK_H
+#define __KVM_X86_PAGE_TRACK_H
+
+#include <linux/kvm_host.h>
+
+#include <asm/kvm_page_track.h>
+
+int kvm_page_track_init(struct kvm *kvm);
+void kvm_page_track_cleanup(struct kvm *kvm);
+
+bool kvm_page_track_write_tracking_enabled(struct kvm *kvm);
+int kvm_page_track_write_tracking_alloc(struct kvm_memory_slot *slot);
+
+void kvm_page_track_free_memslot(struct kvm_memory_slot *slot);
+int kvm_page_track_create_memslot(struct kvm *kvm,
+				  struct kvm_memory_slot *slot,
+				  unsigned long npages);
+
+bool kvm_slot_page_track_is_active(struct kvm *kvm,
+				   const struct kvm_memory_slot *slot,
+				   gfn_t gfn, enum kvm_page_track_mode mode);
+
+void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
+			  int bytes);
+void kvm_page_track_delete_slot(struct kvm *kvm, struct kvm_memory_slot *slot);
+
+static inline bool kvm_page_track_has_external_user(struct kvm *kvm)
+{
+	return hlist_empty(&kvm->arch.track_notifier_head.track_notifier_list);
+}
+
+#endif /* __KVM_X86_PAGE_TRACK_H */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6ae2b612f8c6..5b689b144516 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -25,6 +25,7 @@
 #include "tss.h"
 #include "kvm_cache_regs.h"
 #include "kvm_emulate.h"
+#include "mmu/page_track.h"
 #include "x86.h"
 #include "cpuid.h"
 #include "pmu.h"
-- 
2.40.1.606.ga4b1b128d6-goog


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v3 20/28] KVM: x86/mmu: Move KVM-only page-track declarations to internal header
@ 2023-05-13  0:35   ` Sean Christopherson
  0 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2023-05-13  0:35 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

Bury the declaration of the page-track helpers that are intended only for
internal KVM use in a "private" header.  In addition to guarding against
unwanted usage of the internal-only helpers, dropping their definitions
avoids exposing other structures that should be KVM-internal, e.g. for
memslots.  This is a baby step toward making kvm_host.h a KVM-internal
header in the very distant future.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_page_track.h | 21 ++---------------
 arch/x86/kvm/mmu/mmu.c                |  3 ++-
 arch/x86/kvm/mmu/page_track.c         |  8 +------
 arch/x86/kvm/mmu/page_track.h         | 33 +++++++++++++++++++++++++++
 arch/x86/kvm/x86.c                    |  1 +
 5 files changed, 39 insertions(+), 27 deletions(-)
 create mode 100644 arch/x86/kvm/mmu/page_track.h

diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index 5c348ffdc194..76c0070dfe2a 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -2,6 +2,8 @@
 #ifndef _ASM_X86_KVM_PAGE_TRACK_H
 #define _ASM_X86_KVM_PAGE_TRACK_H
 
+#include <linux/kvm_types.h>
+
 enum kvm_page_track_mode {
 	KVM_PAGE_TRACK_WRITE,
 	KVM_PAGE_TRACK_MAX,
@@ -46,26 +48,12 @@ struct kvm_page_track_notifier_node {
 				    struct kvm_page_track_notifier_node *node);
 };
 
-int kvm_page_track_init(struct kvm *kvm);
-void kvm_page_track_cleanup(struct kvm *kvm);
-
-bool kvm_page_track_write_tracking_enabled(struct kvm *kvm);
-int kvm_page_track_write_tracking_alloc(struct kvm_memory_slot *slot);
-
-void kvm_page_track_free_memslot(struct kvm_memory_slot *slot);
-int kvm_page_track_create_memslot(struct kvm *kvm,
-				  struct kvm_memory_slot *slot,
-				  unsigned long npages);
-
 void kvm_slot_page_track_add_page(struct kvm *kvm,
 				  struct kvm_memory_slot *slot, gfn_t gfn,
 				  enum kvm_page_track_mode mode);
 void kvm_slot_page_track_remove_page(struct kvm *kvm,
 				     struct kvm_memory_slot *slot, gfn_t gfn,
 				     enum kvm_page_track_mode mode);
-bool kvm_slot_page_track_is_active(struct kvm *kvm,
-				   const struct kvm_memory_slot *slot,
-				   gfn_t gfn, enum kvm_page_track_mode mode);
 
 void
 kvm_page_track_register_notifier(struct kvm *kvm,
@@ -73,10 +61,5 @@ kvm_page_track_register_notifier(struct kvm *kvm,
 void
 kvm_page_track_unregister_notifier(struct kvm *kvm,
 				   struct kvm_page_track_notifier_node *n);
-void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
-			  int bytes);
-void kvm_page_track_delete_slot(struct kvm *kvm, struct kvm_memory_slot *slot);
-
-bool kvm_page_track_has_external_user(struct kvm *kvm);
 
 #endif
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 3f9030650c3d..0d9fe54ecb01 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -25,6 +25,7 @@
 #include "kvm_cache_regs.h"
 #include "smm.h"
 #include "kvm_emulate.h"
+#include "page_track.h"
 #include "cpuid.h"
 #include "spte.h"
 
@@ -53,7 +54,7 @@
 #include <asm/io.h>
 #include <asm/set_memory.h>
 #include <asm/vmx.h>
-#include <asm/kvm_page_track.h>
+
 #include "trace.h"
 
 extern bool itlb_multihit_kvm_mitigation;
diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index 2a6ab7c455c0..e15329d48f95 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -15,10 +15,9 @@
 #include <linux/kvm_host.h>
 #include <linux/rculist.h>
 
-#include <asm/kvm_page_track.h>
-
 #include "mmu.h"
 #include "mmu_internal.h"
+#include "page_track.h"
 
 bool kvm_page_track_write_tracking_enabled(struct kvm *kvm)
 {
@@ -300,8 +299,3 @@ void kvm_page_track_delete_slot(struct kvm *kvm, struct kvm_memory_slot *slot)
 			n->track_remove_region(slot->base_gfn, slot->npages, n);
 	srcu_read_unlock(&head->track_srcu, idx);
 }
-
-bool kvm_page_track_has_external_user(struct kvm *kvm)
-{
-	return hlist_empty(&kvm->arch.track_notifier_head.track_notifier_list);
-}
diff --git a/arch/x86/kvm/mmu/page_track.h b/arch/x86/kvm/mmu/page_track.h
new file mode 100644
index 000000000000..89712f123ad3
--- /dev/null
+++ b/arch/x86/kvm/mmu/page_track.h
@@ -0,0 +1,33 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __KVM_X86_PAGE_TRACK_H
+#define __KVM_X86_PAGE_TRACK_H
+
+#include <linux/kvm_host.h>
+
+#include <asm/kvm_page_track.h>
+
+int kvm_page_track_init(struct kvm *kvm);
+void kvm_page_track_cleanup(struct kvm *kvm);
+
+bool kvm_page_track_write_tracking_enabled(struct kvm *kvm);
+int kvm_page_track_write_tracking_alloc(struct kvm_memory_slot *slot);
+
+void kvm_page_track_free_memslot(struct kvm_memory_slot *slot);
+int kvm_page_track_create_memslot(struct kvm *kvm,
+				  struct kvm_memory_slot *slot,
+				  unsigned long npages);
+
+bool kvm_slot_page_track_is_active(struct kvm *kvm,
+				   const struct kvm_memory_slot *slot,
+				   gfn_t gfn, enum kvm_page_track_mode mode);
+
+void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
+			  int bytes);
+void kvm_page_track_delete_slot(struct kvm *kvm, struct kvm_memory_slot *slot);
+
+static inline bool kvm_page_track_has_external_user(struct kvm *kvm)
+{
+	return hlist_empty(&kvm->arch.track_notifier_head.track_notifier_list);
+}
+
+#endif /* __KVM_X86_PAGE_TRACK_H */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6ae2b612f8c6..5b689b144516 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -25,6 +25,7 @@
 #include "tss.h"
 #include "kvm_cache_regs.h"
 #include "kvm_emulate.h"
+#include "mmu/page_track.h"
 #include "x86.h"
 #include "cpuid.h"
 #include "pmu.h"
-- 
2.40.1.606.ga4b1b128d6-goog


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [Intel-gfx] [PATCH v3 21/28] KVM: x86/mmu: Use page-track notifiers iff there are external users
  2023-05-13  0:35 ` [Intel-gfx] " Sean Christopherson
@ 2023-05-13  0:35   ` Sean Christopherson
  -1 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2023-05-13  0:35 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: Yan Zhao, kvm, intel-gfx, linux-kernel, Ben Gardon, intel-gvt-dev

Disable the page-track notifier code at compile time if there are no
external users, i.e. if CONFIG_KVM_EXTERNAL_WRITE_TRACKING=n.  KVM itself
now hooks emulated writes directly instead of relying on the page-track
mechanism.

Provide a stub for "struct kvm_page_track_notifier_node" so that including
headers directly from the command line, e.g. for testing include guards,
doesn't fail due to a struct having an incomplete type.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_host.h       |  2 ++
 arch/x86/include/asm/kvm_page_track.h | 22 +++++++++++++-------
 arch/x86/kvm/mmu/page_track.c         | 10 ++++-----
 arch/x86/kvm/mmu/page_track.h         | 29 +++++++++++++++++++++++----
 4 files changed, 47 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 113598d3e886..5ce06a75d3de 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1247,7 +1247,9 @@ struct kvm_arch {
 	 * create an NX huge page (without hanging the guest).
 	 */
 	struct list_head possible_nx_huge_pages;
+#ifdef CONFIG_KVM_EXTERNAL_WRITE_TRACKING
 	struct kvm_page_track_notifier_head track_notifier_head;
+#endif
 	/*
 	 * Protects marking pages unsync during page faults, as TDP MMU page
 	 * faults only take mmu_lock for read.  For simplicity, the unsync
diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index 76c0070dfe2a..61adb07b5927 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -9,6 +9,14 @@ enum kvm_page_track_mode {
 	KVM_PAGE_TRACK_MAX,
 };
 
+void kvm_slot_page_track_add_page(struct kvm *kvm,
+				  struct kvm_memory_slot *slot, gfn_t gfn,
+				  enum kvm_page_track_mode mode);
+void kvm_slot_page_track_remove_page(struct kvm *kvm,
+				     struct kvm_memory_slot *slot, gfn_t gfn,
+				     enum kvm_page_track_mode mode);
+
+#ifdef CONFIG_KVM_EXTERNAL_WRITE_TRACKING
 /*
  * The notifier represented by @kvm_page_track_notifier_node is linked into
  * the head which will be notified when guest is triggering the track event.
@@ -48,18 +56,18 @@ struct kvm_page_track_notifier_node {
 				    struct kvm_page_track_notifier_node *node);
 };
 
-void kvm_slot_page_track_add_page(struct kvm *kvm,
-				  struct kvm_memory_slot *slot, gfn_t gfn,
-				  enum kvm_page_track_mode mode);
-void kvm_slot_page_track_remove_page(struct kvm *kvm,
-				     struct kvm_memory_slot *slot, gfn_t gfn,
-				     enum kvm_page_track_mode mode);
-
 void
 kvm_page_track_register_notifier(struct kvm *kvm,
 				 struct kvm_page_track_notifier_node *n);
 void
 kvm_page_track_unregister_notifier(struct kvm *kvm,
 				   struct kvm_page_track_notifier_node *n);
+#else
+/*
+ * Allow defining a node in a structure even if page tracking is disabled, e.g.
+ * to play nice with testing headers via direct inclusion from the command line.
+ */
+struct kvm_page_track_notifier_node {};
+#endif /* CONFIG_KVM_EXTERNAL_WRITE_TRACKING */
 
 #endif
diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index e15329d48f95..b20aad7ac3fe 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -194,6 +194,7 @@ bool kvm_slot_page_track_is_active(struct kvm *kvm,
 	return !!READ_ONCE(slot->arch.gfn_track[mode][index]);
 }
 
+#ifdef CONFIG_KVM_EXTERNAL_WRITE_TRACKING
 void kvm_page_track_cleanup(struct kvm *kvm)
 {
 	struct kvm_page_track_notifier_head *head;
@@ -255,14 +256,13 @@ EXPORT_SYMBOL_GPL(kvm_page_track_unregister_notifier);
  * The node should figure out if the written page is the one that node is
  * interested in by itself.
  */
-void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
-			  int bytes)
+void __kvm_page_track_write(struct kvm *kvm, gpa_t gpa, const u8 *new, int bytes)
 {
 	struct kvm_page_track_notifier_head *head;
 	struct kvm_page_track_notifier_node *n;
 	int idx;
 
-	head = &vcpu->kvm->arch.track_notifier_head;
+	head = &kvm->arch.track_notifier_head;
 
 	if (hlist_empty(&head->track_notifier_list))
 		return;
@@ -273,8 +273,6 @@ void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
 		if (n->track_write)
 			n->track_write(gpa, new, bytes, n);
 	srcu_read_unlock(&head->track_srcu, idx);
-
-	kvm_mmu_track_write(vcpu, gpa, new, bytes);
 }
 
 /*
@@ -299,3 +297,5 @@ void kvm_page_track_delete_slot(struct kvm *kvm, struct kvm_memory_slot *slot)
 			n->track_remove_region(slot->base_gfn, slot->npages, n);
 	srcu_read_unlock(&head->track_srcu, idx);
 }
+
+#endif
diff --git a/arch/x86/kvm/mmu/page_track.h b/arch/x86/kvm/mmu/page_track.h
index 89712f123ad3..931b26b8fc8f 100644
--- a/arch/x86/kvm/mmu/page_track.h
+++ b/arch/x86/kvm/mmu/page_track.h
@@ -6,8 +6,6 @@
 
 #include <asm/kvm_page_track.h>
 
-int kvm_page_track_init(struct kvm *kvm);
-void kvm_page_track_cleanup(struct kvm *kvm);
 
 bool kvm_page_track_write_tracking_enabled(struct kvm *kvm);
 int kvm_page_track_write_tracking_alloc(struct kvm_memory_slot *slot);
@@ -21,13 +19,36 @@ bool kvm_slot_page_track_is_active(struct kvm *kvm,
 				   const struct kvm_memory_slot *slot,
 				   gfn_t gfn, enum kvm_page_track_mode mode);
 
-void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
-			  int bytes);
+#ifdef CONFIG_KVM_EXTERNAL_WRITE_TRACKING
+int kvm_page_track_init(struct kvm *kvm);
+void kvm_page_track_cleanup(struct kvm *kvm);
+
+void __kvm_page_track_write(struct kvm *kvm, gpa_t gpa, const u8 *new, int bytes);
 void kvm_page_track_delete_slot(struct kvm *kvm, struct kvm_memory_slot *slot);
 
 static inline bool kvm_page_track_has_external_user(struct kvm *kvm)
 {
 	return hlist_empty(&kvm->arch.track_notifier_head.track_notifier_list);
 }
+#else
+static inline int kvm_page_track_init(struct kvm *kvm) { return 0; }
+static inline void kvm_page_track_cleanup(struct kvm *kvm) { }
+
+static inline void __kvm_page_track_write(struct kvm *kvm, gpa_t gpa,
+					  const u8 *new, int bytes) { }
+static inline void kvm_page_track_delete_slot(struct kvm *kvm,
+					      struct kvm_memory_slot *slot) { }
+
+static inline bool kvm_page_track_has_external_user(struct kvm *kvm) { return false; }
+
+#endif /* CONFIG_KVM_EXTERNAL_WRITE_TRACKING */
+
+static inline void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa,
+					const u8 *new, int bytes)
+{
+	__kvm_page_track_write(vcpu->kvm, gpa, new, bytes);
+
+	kvm_mmu_track_write(vcpu, gpa, new, bytes);
+}
 
 #endif /* __KVM_X86_PAGE_TRACK_H */
-- 
2.40.1.606.ga4b1b128d6-goog


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v3 21/28] KVM: x86/mmu: Use page-track notifiers iff there are external users
@ 2023-05-13  0:35   ` Sean Christopherson
  0 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2023-05-13  0:35 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

Disable the page-track notifier code at compile time if there are no
external users, i.e. if CONFIG_KVM_EXTERNAL_WRITE_TRACKING=n.  KVM itself
now hooks emulated writes directly instead of relying on the page-track
mechanism.

Provide a stub for "struct kvm_page_track_notifier_node" so that including
headers directly from the command line, e.g. for testing include guards,
doesn't fail due to a struct having an incomplete type.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_host.h       |  2 ++
 arch/x86/include/asm/kvm_page_track.h | 22 +++++++++++++-------
 arch/x86/kvm/mmu/page_track.c         | 10 ++++-----
 arch/x86/kvm/mmu/page_track.h         | 29 +++++++++++++++++++++++----
 4 files changed, 47 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 113598d3e886..5ce06a75d3de 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1247,7 +1247,9 @@ struct kvm_arch {
 	 * create an NX huge page (without hanging the guest).
 	 */
 	struct list_head possible_nx_huge_pages;
+#ifdef CONFIG_KVM_EXTERNAL_WRITE_TRACKING
 	struct kvm_page_track_notifier_head track_notifier_head;
+#endif
 	/*
 	 * Protects marking pages unsync during page faults, as TDP MMU page
 	 * faults only take mmu_lock for read.  For simplicity, the unsync
diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index 76c0070dfe2a..61adb07b5927 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -9,6 +9,14 @@ enum kvm_page_track_mode {
 	KVM_PAGE_TRACK_MAX,
 };
 
+void kvm_slot_page_track_add_page(struct kvm *kvm,
+				  struct kvm_memory_slot *slot, gfn_t gfn,
+				  enum kvm_page_track_mode mode);
+void kvm_slot_page_track_remove_page(struct kvm *kvm,
+				     struct kvm_memory_slot *slot, gfn_t gfn,
+				     enum kvm_page_track_mode mode);
+
+#ifdef CONFIG_KVM_EXTERNAL_WRITE_TRACKING
 /*
  * The notifier represented by @kvm_page_track_notifier_node is linked into
  * the head which will be notified when guest is triggering the track event.
@@ -48,18 +56,18 @@ struct kvm_page_track_notifier_node {
 				    struct kvm_page_track_notifier_node *node);
 };
 
-void kvm_slot_page_track_add_page(struct kvm *kvm,
-				  struct kvm_memory_slot *slot, gfn_t gfn,
-				  enum kvm_page_track_mode mode);
-void kvm_slot_page_track_remove_page(struct kvm *kvm,
-				     struct kvm_memory_slot *slot, gfn_t gfn,
-				     enum kvm_page_track_mode mode);
-
 void
 kvm_page_track_register_notifier(struct kvm *kvm,
 				 struct kvm_page_track_notifier_node *n);
 void
 kvm_page_track_unregister_notifier(struct kvm *kvm,
 				   struct kvm_page_track_notifier_node *n);
+#else
+/*
+ * Allow defining a node in a structure even if page tracking is disabled, e.g.
+ * to play nice with testing headers via direct inclusion from the command line.
+ */
+struct kvm_page_track_notifier_node {};
+#endif /* CONFIG_KVM_EXTERNAL_WRITE_TRACKING */
 
 #endif
diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index e15329d48f95..b20aad7ac3fe 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -194,6 +194,7 @@ bool kvm_slot_page_track_is_active(struct kvm *kvm,
 	return !!READ_ONCE(slot->arch.gfn_track[mode][index]);
 }
 
+#ifdef CONFIG_KVM_EXTERNAL_WRITE_TRACKING
 void kvm_page_track_cleanup(struct kvm *kvm)
 {
 	struct kvm_page_track_notifier_head *head;
@@ -255,14 +256,13 @@ EXPORT_SYMBOL_GPL(kvm_page_track_unregister_notifier);
  * The node should figure out if the written page is the one that node is
  * interested in by itself.
  */
-void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
-			  int bytes)
+void __kvm_page_track_write(struct kvm *kvm, gpa_t gpa, const u8 *new, int bytes)
 {
 	struct kvm_page_track_notifier_head *head;
 	struct kvm_page_track_notifier_node *n;
 	int idx;
 
-	head = &vcpu->kvm->arch.track_notifier_head;
+	head = &kvm->arch.track_notifier_head;
 
 	if (hlist_empty(&head->track_notifier_list))
 		return;
@@ -273,8 +273,6 @@ void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
 		if (n->track_write)
 			n->track_write(gpa, new, bytes, n);
 	srcu_read_unlock(&head->track_srcu, idx);
-
-	kvm_mmu_track_write(vcpu, gpa, new, bytes);
 }
 
 /*
@@ -299,3 +297,5 @@ void kvm_page_track_delete_slot(struct kvm *kvm, struct kvm_memory_slot *slot)
 			n->track_remove_region(slot->base_gfn, slot->npages, n);
 	srcu_read_unlock(&head->track_srcu, idx);
 }
+
+#endif
diff --git a/arch/x86/kvm/mmu/page_track.h b/arch/x86/kvm/mmu/page_track.h
index 89712f123ad3..931b26b8fc8f 100644
--- a/arch/x86/kvm/mmu/page_track.h
+++ b/arch/x86/kvm/mmu/page_track.h
@@ -6,8 +6,6 @@
 
 #include <asm/kvm_page_track.h>
 
-int kvm_page_track_init(struct kvm *kvm);
-void kvm_page_track_cleanup(struct kvm *kvm);
 
 bool kvm_page_track_write_tracking_enabled(struct kvm *kvm);
 int kvm_page_track_write_tracking_alloc(struct kvm_memory_slot *slot);
@@ -21,13 +19,36 @@ bool kvm_slot_page_track_is_active(struct kvm *kvm,
 				   const struct kvm_memory_slot *slot,
 				   gfn_t gfn, enum kvm_page_track_mode mode);
 
-void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
-			  int bytes);
+#ifdef CONFIG_KVM_EXTERNAL_WRITE_TRACKING
+int kvm_page_track_init(struct kvm *kvm);
+void kvm_page_track_cleanup(struct kvm *kvm);
+
+void __kvm_page_track_write(struct kvm *kvm, gpa_t gpa, const u8 *new, int bytes);
 void kvm_page_track_delete_slot(struct kvm *kvm, struct kvm_memory_slot *slot);
 
 static inline bool kvm_page_track_has_external_user(struct kvm *kvm)
 {
 	return hlist_empty(&kvm->arch.track_notifier_head.track_notifier_list);
 }
+#else
+static inline int kvm_page_track_init(struct kvm *kvm) { return 0; }
+static inline void kvm_page_track_cleanup(struct kvm *kvm) { }
+
+static inline void __kvm_page_track_write(struct kvm *kvm, gpa_t gpa,
+					  const u8 *new, int bytes) { }
+static inline void kvm_page_track_delete_slot(struct kvm *kvm,
+					      struct kvm_memory_slot *slot) { }
+
+static inline bool kvm_page_track_has_external_user(struct kvm *kvm) { return false; }
+
+#endif /* CONFIG_KVM_EXTERNAL_WRITE_TRACKING */
+
+static inline void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa,
+					const u8 *new, int bytes)
+{
+	__kvm_page_track_write(vcpu->kvm, gpa, new, bytes);
+
+	kvm_mmu_track_write(vcpu, gpa, new, bytes);
+}
 
 #endif /* __KVM_X86_PAGE_TRACK_H */
-- 
2.40.1.606.ga4b1b128d6-goog


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [Intel-gfx] [PATCH v3 22/28] KVM: x86/mmu: Drop infrastructure for multiple page-track modes
  2023-05-13  0:35 ` [Intel-gfx] " Sean Christopherson
@ 2023-05-13  0:35   ` Sean Christopherson
  -1 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2023-05-13  0:35 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: Yan Zhao, kvm, intel-gfx, linux-kernel, Ben Gardon, intel-gvt-dev

Drop "support" for multiple page-track modes, as there is no evidence
that array-based and refcounted metadata is the optimal solution for
other modes, nor is there any evidence that other use cases, e.g. for
access-tracking, will be a good fit for the page-track machinery in
general.

E.g. one potential use case of access-tracking would be to prevent guest
access to poisoned memory (from the guest's perspective).  In that case,
the number of poisoned pages is likely to be a very small percentage of
the guest memory, and there is no need to reference count the number of
access-tracking users, i.e. expanding gfn_track[] for a new mode would be
grossly inefficient.  And for poisoned memory, host userspace would also
likely want to trap accesses, e.g. to inject #MC into the guest, and that
isn't currently supported by the page-track framework.

A better alternative for that poisoned page use case is likely a
variation of the proposed per-gfn attributes overlay (linked), which
would allow efficiently tracking the sparse set of poisoned pages, and by
default would exit to userspace on access.

Link: https://lore.kernel.org/all/Y2WB48kD0J4VGynX@google.com
Cc: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_host.h       |  12 +--
 arch/x86/include/asm/kvm_page_track.h |  11 +--
 arch/x86/kvm/mmu/mmu.c                |  14 ++--
 arch/x86/kvm/mmu/page_track.c         | 111 ++++++++------------------
 arch/x86/kvm/mmu/page_track.h         |   3 +-
 drivers/gpu/drm/i915/gvt/kvmgt.c      |   4 +-
 6 files changed, 51 insertions(+), 104 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 5ce06a75d3de..3dde3a11113a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -288,13 +288,13 @@ struct kvm_kernel_irq_routing_entry;
  * kvm_mmu_page_role tracks the properties of a shadow page (where shadow page
  * also includes TDP pages) to determine whether or not a page can be used in
  * the given MMU context.  This is a subset of the overall kvm_cpu_role to
- * minimize the size of kvm_memory_slot.arch.gfn_track, i.e. allows allocating
- * 2 bytes per gfn instead of 4 bytes per gfn.
+ * minimize the size of kvm_memory_slot.arch.gfn_write_track, i.e. allows
+ * allocating 2 bytes per gfn instead of 4 bytes per gfn.
  *
  * Upper-level shadow pages having gptes are tracked for write-protection via
- * gfn_track.  As above, gfn_track is a 16 bit counter, so KVM must not create
- * more than 2^16-1 upper-level shadow pages at a single gfn, otherwise
- * gfn_track will overflow and explosions will ensure.
+ * gfn_write_track.  As above, gfn_write_track is a 16 bit counter, so KVM must
+ * not create more than 2^16-1 upper-level shadow pages at a single gfn,
+ * otherwise gfn_write_track will overflow and explosions will ensue.
  *
  * A unique shadow page (SP) for a gfn is created if and only if an existing SP
  * cannot be reused.  The ability to reuse a SP is tracked by its role, which
@@ -1005,7 +1005,7 @@ struct kvm_lpage_info {
 struct kvm_arch_memory_slot {
 	struct kvm_rmap_head *rmap[KVM_NR_PAGE_SIZES];
 	struct kvm_lpage_info *lpage_info[KVM_NR_PAGE_SIZES - 1];
-	unsigned short *gfn_track[KVM_PAGE_TRACK_MAX];
+	unsigned short *gfn_write_track;
 };
 
 /*
diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index 61adb07b5927..9e4ee26d1779 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -4,17 +4,10 @@
 
 #include <linux/kvm_types.h>
 
-enum kvm_page_track_mode {
-	KVM_PAGE_TRACK_WRITE,
-	KVM_PAGE_TRACK_MAX,
-};
-
 void kvm_slot_page_track_add_page(struct kvm *kvm,
-				  struct kvm_memory_slot *slot, gfn_t gfn,
-				  enum kvm_page_track_mode mode);
+				  struct kvm_memory_slot *slot, gfn_t gfn);
 void kvm_slot_page_track_remove_page(struct kvm *kvm,
-				     struct kvm_memory_slot *slot, gfn_t gfn,
-				     enum kvm_page_track_mode mode);
+				     struct kvm_memory_slot *slot, gfn_t gfn);
 
 #ifdef CONFIG_KVM_EXTERNAL_WRITE_TRACKING
 /*
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 0d9fe54ecb01..8041f5747704 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -837,8 +837,7 @@ static void account_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
 
 	/* the non-leaf shadow pages are keeping readonly. */
 	if (sp->role.level > PG_LEVEL_4K)
-		return kvm_slot_page_track_add_page(kvm, slot, gfn,
-						    KVM_PAGE_TRACK_WRITE);
+		return kvm_slot_page_track_add_page(kvm, slot, gfn);
 
 	kvm_mmu_gfn_disallow_lpage(slot, gfn);
 
@@ -884,8 +883,7 @@ static void unaccount_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
 	slots = kvm_memslots_for_spte_role(kvm, sp->role);
 	slot = __gfn_to_memslot(slots, gfn);
 	if (sp->role.level > PG_LEVEL_4K)
-		return kvm_slot_page_track_remove_page(kvm, slot, gfn,
-						       KVM_PAGE_TRACK_WRITE);
+		return kvm_slot_page_track_remove_page(kvm, slot, gfn);
 
 	kvm_mmu_gfn_allow_lpage(slot, gfn);
 }
@@ -2825,7 +2823,7 @@ int mmu_try_to_unsync_pages(struct kvm *kvm, const struct kvm_memory_slot *slot,
 	 * track machinery is used to write-protect upper-level shadow pages,
 	 * i.e. this guards the role.level == 4K assertion below!
 	 */
-	if (kvm_slot_page_track_is_active(kvm, slot, gfn, KVM_PAGE_TRACK_WRITE))
+	if (kvm_slot_page_track_is_active(kvm, slot, gfn))
 		return -EPERM;
 
 	/*
@@ -4226,7 +4224,7 @@ static bool page_fault_handle_page_track(struct kvm_vcpu *vcpu,
 	 * guest is writing the page which is write tracked which can
 	 * not be fixed by page fault handler.
 	 */
-	if (kvm_slot_page_track_is_active(vcpu->kvm, fault->slot, fault->gfn, KVM_PAGE_TRACK_WRITE))
+	if (kvm_slot_page_track_is_active(vcpu->kvm, fault->slot, fault->gfn))
 		return true;
 
 	return false;
@@ -5461,8 +5459,8 @@ void kvm_mmu_after_set_cpuid(struct kvm_vcpu *vcpu)
 	 * physical address properties) in a single VM would require tracking
 	 * all relevant CPUID information in kvm_mmu_page_role. That is very
 	 * undesirable as it would increase the memory requirements for
-	 * gfn_track (see struct kvm_mmu_page_role comments).  For now that
-	 * problem is swept under the rug; KVM's CPUID API is horrific and
+	 * gfn_write_track (see struct kvm_mmu_page_role comments).  For now
+	 * that problem is swept under the rug; KVM's CPUID API is horrific and
 	 * it's all but impossible to solve it without introducing a new API.
 	 */
 	vcpu->arch.root_mmu.root_role.word = 0;
diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index b20aad7ac3fe..cdc6069b8caf 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -27,76 +27,50 @@ bool kvm_page_track_write_tracking_enabled(struct kvm *kvm)
 
 void kvm_page_track_free_memslot(struct kvm_memory_slot *slot)
 {
-	int i;
+	kvfree(slot->arch.gfn_write_track);
+	slot->arch.gfn_write_track = NULL;
+}
 
-	for (i = 0; i < KVM_PAGE_TRACK_MAX; i++) {
-		kvfree(slot->arch.gfn_track[i]);
-		slot->arch.gfn_track[i] = NULL;
-	}
+static int __kvm_page_track_write_tracking_alloc(struct kvm_memory_slot *slot,
+						 unsigned long npages)
+{
+	const size_t size = sizeof(*slot->arch.gfn_write_track);
+
+	if (!slot->arch.gfn_write_track)
+		slot->arch.gfn_write_track = __vcalloc(npages, size,
+						       GFP_KERNEL_ACCOUNT);
+
+	return slot->arch.gfn_write_track ? 0 : -ENOMEM;
 }
 
 int kvm_page_track_create_memslot(struct kvm *kvm,
 				  struct kvm_memory_slot *slot,
 				  unsigned long npages)
 {
-	int i;
-
-	for (i = 0; i < KVM_PAGE_TRACK_MAX; i++) {
-		if (i == KVM_PAGE_TRACK_WRITE &&
-		    !kvm_page_track_write_tracking_enabled(kvm))
-			continue;
-
-		slot->arch.gfn_track[i] =
-			__vcalloc(npages, sizeof(*slot->arch.gfn_track[i]),
-				  GFP_KERNEL_ACCOUNT);
-		if (!slot->arch.gfn_track[i])
-			goto track_free;
-	}
-
-	return 0;
-
-track_free:
-	kvm_page_track_free_memslot(slot);
-	return -ENOMEM;
-}
-
-static inline bool page_track_mode_is_valid(enum kvm_page_track_mode mode)
-{
-	if (mode < 0 || mode >= KVM_PAGE_TRACK_MAX)
-		return false;
-
-	return true;
-}
-
-int kvm_page_track_write_tracking_alloc(struct kvm_memory_slot *slot)
-{
-	unsigned short *gfn_track;
-
-	if (slot->arch.gfn_track[KVM_PAGE_TRACK_WRITE])
+	if (!kvm_page_track_write_tracking_enabled(kvm))
 		return 0;
 
-	gfn_track = __vcalloc(slot->npages, sizeof(*gfn_track),
-			      GFP_KERNEL_ACCOUNT);
-	if (gfn_track == NULL)
-		return -ENOMEM;
+	return __kvm_page_track_write_tracking_alloc(slot, npages);
+}
 
-	slot->arch.gfn_track[KVM_PAGE_TRACK_WRITE] = gfn_track;
-	return 0;
+int kvm_page_track_write_tracking_alloc(struct kvm_memory_slot *slot)
+{
+	return __kvm_page_track_write_tracking_alloc(slot, slot->npages);
 }
 
-static void update_gfn_track(struct kvm_memory_slot *slot, gfn_t gfn,
-			     enum kvm_page_track_mode mode, short count)
+static void update_gfn_write_track(struct kvm_memory_slot *slot, gfn_t gfn,
+				   short count)
 {
 	int index, val;
 
 	index = gfn_to_index(gfn, slot->base_gfn, PG_LEVEL_4K);
 
-	val = slot->arch.gfn_track[mode][index];
+	val = slot->arch.gfn_write_track[index];
 
 	if (WARN_ON(val + count < 0 || val + count > USHRT_MAX))
 		return;
 
-	slot->arch.gfn_track[mode][index] += count;
+	slot->arch.gfn_write_track[index] += count;
 }
 
 /*
@@ -109,21 +83,15 @@ static void update_gfn_track(struct kvm_memory_slot *slot, gfn_t gfn,
  * @kvm: the guest instance we are interested in.
  * @slot: the @gfn belongs to.
  * @gfn: the guest page.
- * @mode: tracking mode, currently only write track is supported.
  */
 void kvm_slot_page_track_add_page(struct kvm *kvm,
-				  struct kvm_memory_slot *slot, gfn_t gfn,
-				  enum kvm_page_track_mode mode)
+				  struct kvm_memory_slot *slot, gfn_t gfn)
 {
 
-	if (WARN_ON(!page_track_mode_is_valid(mode)))
+	if (WARN_ON(!kvm_page_track_write_tracking_enabled(kvm)))
 		return;
 
-	if (WARN_ON(mode == KVM_PAGE_TRACK_WRITE &&
-		    !kvm_page_track_write_tracking_enabled(kvm)))
-		return;
-
-	update_gfn_track(slot, gfn, mode, 1);
+	update_gfn_write_track(slot, gfn, 1);
 
 	/*
 	 * new track stops large page mapping for the
@@ -131,9 +99,8 @@ void kvm_slot_page_track_add_page(struct kvm *kvm,
 	 */
 	kvm_mmu_gfn_disallow_lpage(slot, gfn);
 
-	if (mode == KVM_PAGE_TRACK_WRITE)
-		if (kvm_mmu_slot_gfn_write_protect(kvm, slot, gfn, PG_LEVEL_4K))
-			kvm_flush_remote_tlbs(kvm);
+	if (kvm_mmu_slot_gfn_write_protect(kvm, slot, gfn, PG_LEVEL_4K))
+		kvm_flush_remote_tlbs(kvm);
 }
 EXPORT_SYMBOL_GPL(kvm_slot_page_track_add_page);
 
@@ -148,20 +115,14 @@ EXPORT_SYMBOL_GPL(kvm_slot_page_track_add_page);
  * @kvm: the guest instance we are interested in.
  * @slot: the @gfn belongs to.
  * @gfn: the guest page.
- * @mode: tracking mode, currently only write track is supported.
  */
 void kvm_slot_page_track_remove_page(struct kvm *kvm,
-				     struct kvm_memory_slot *slot, gfn_t gfn,
-				     enum kvm_page_track_mode mode)
+				     struct kvm_memory_slot *slot, gfn_t gfn)
 {
-	if (WARN_ON(!page_track_mode_is_valid(mode)))
+	if (WARN_ON(!kvm_page_track_write_tracking_enabled(kvm)))
 		return;
 
-	if (WARN_ON(mode == KVM_PAGE_TRACK_WRITE &&
-		    !kvm_page_track_write_tracking_enabled(kvm)))
-		return;
-
-	update_gfn_track(slot, gfn, mode, -1);
+	update_gfn_write_track(slot, gfn, -1);
 
 	/*
 	 * allow large page mapping for the tracked page
@@ -176,22 +137,18 @@ EXPORT_SYMBOL_GPL(kvm_slot_page_track_remove_page);
  */
 bool kvm_slot_page_track_is_active(struct kvm *kvm,
 				   const struct kvm_memory_slot *slot,
-				   gfn_t gfn, enum kvm_page_track_mode mode)
+				   gfn_t gfn)
 {
 	int index;
 
-	if (WARN_ON(!page_track_mode_is_valid(mode)))
-		return false;
-
 	if (!slot)
 		return false;
 
-	if (mode == KVM_PAGE_TRACK_WRITE &&
-	    !kvm_page_track_write_tracking_enabled(kvm))
+	if (!kvm_page_track_write_tracking_enabled(kvm))
 		return false;
 
 	index = gfn_to_index(gfn, slot->base_gfn, PG_LEVEL_4K);
-	return !!READ_ONCE(slot->arch.gfn_track[mode][index]);
+	return !!READ_ONCE(slot->arch.gfn_write_track[index]);
 }
 
 #ifdef CONFIG_KVM_EXTERNAL_WRITE_TRACKING
diff --git a/arch/x86/kvm/mmu/page_track.h b/arch/x86/kvm/mmu/page_track.h
index 931b26b8fc8f..789d0c479519 100644
--- a/arch/x86/kvm/mmu/page_track.h
+++ b/arch/x86/kvm/mmu/page_track.h
@@ -16,8 +16,7 @@ int kvm_page_track_create_memslot(struct kvm *kvm,
 				  unsigned long npages);
 
 bool kvm_slot_page_track_is_active(struct kvm *kvm,
-				   const struct kvm_memory_slot *slot,
-				   gfn_t gfn, enum kvm_page_track_mode mode);
+				   const struct kvm_memory_slot *slot, gfn_t gfn);
 
 #ifdef CONFIG_KVM_EXTERNAL_WRITE_TRACKING
 int kvm_page_track_init(struct kvm *kvm);
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 2e65901270ca..25226e4e3417 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -1563,7 +1563,7 @@ int intel_gvt_page_track_add(struct intel_vgpu *info, u64 gfn)
 	}
 
 	write_lock(&kvm->mmu_lock);
-	kvm_slot_page_track_add_page(kvm, slot, gfn, KVM_PAGE_TRACK_WRITE);
+	kvm_slot_page_track_add_page(kvm, slot, gfn);
 	write_unlock(&kvm->mmu_lock);
 
 	srcu_read_unlock(&kvm->srcu, idx);
@@ -1592,7 +1592,7 @@ int intel_gvt_page_track_remove(struct intel_vgpu *info, u64 gfn)
 	}
 
 	write_lock(&kvm->mmu_lock);
-	kvm_slot_page_track_remove_page(kvm, slot, gfn, KVM_PAGE_TRACK_WRITE);
+	kvm_slot_page_track_remove_page(kvm, slot, gfn);
 	write_unlock(&kvm->mmu_lock);
 	srcu_read_unlock(&kvm->srcu, idx);
 
-- 
2.40.1.606.ga4b1b128d6-goog


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v3 22/28] KVM: x86/mmu: Drop infrastructure for multiple page-track modes
@ 2023-05-13  0:35   ` Sean Christopherson
  0 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2023-05-13  0:35 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

Drop "support" for multiple page-track modes, as there is no evidence
that array-based and refcounted metadata is the optimal solution for
other modes, nor is there any evidence that other use cases, e.g. for
access-tracking, will be a good fit for the page-track machinery in
general.

E.g. one potential use case of access-tracking would be to prevent guest
access to poisoned memory (from the guest's perspective).  In that case,
the number of poisoned pages is likely to be a very small percentage of
the guest memory, and there is no need to reference count the number of
access-tracking users, i.e. expanding gfn_track[] for a new mode would be
grossly inefficient.  And for poisoned memory, host userspace would also
likely want to trap accesses, e.g. to inject #MC into the guest, and that
isn't currently supported by the page-track framework.

A better alternative for that poisoned page use case is likely a
variation of the proposed per-gfn attributes overlay (linked), which
would allow efficiently tracking the sparse set of poisoned pages, and by
default would exit to userspace on access.

Link: https://lore.kernel.org/all/Y2WB48kD0J4VGynX@google.com
Cc: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_host.h       |  12 +--
 arch/x86/include/asm/kvm_page_track.h |  11 +--
 arch/x86/kvm/mmu/mmu.c                |  14 ++--
 arch/x86/kvm/mmu/page_track.c         | 111 ++++++++------------------
 arch/x86/kvm/mmu/page_track.h         |   3 +-
 drivers/gpu/drm/i915/gvt/kvmgt.c      |   4 +-
 6 files changed, 51 insertions(+), 104 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 5ce06a75d3de..3dde3a11113a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -288,13 +288,13 @@ struct kvm_kernel_irq_routing_entry;
  * kvm_mmu_page_role tracks the properties of a shadow page (where shadow page
  * also includes TDP pages) to determine whether or not a page can be used in
  * the given MMU context.  This is a subset of the overall kvm_cpu_role to
- * minimize the size of kvm_memory_slot.arch.gfn_track, i.e. allows allocating
- * 2 bytes per gfn instead of 4 bytes per gfn.
+ * minimize the size of kvm_memory_slot.arch.gfn_write_track, i.e. allows
+ * allocating 2 bytes per gfn instead of 4 bytes per gfn.
  *
  * Upper-level shadow pages having gptes are tracked for write-protection via
- * gfn_track.  As above, gfn_track is a 16 bit counter, so KVM must not create
- * more than 2^16-1 upper-level shadow pages at a single gfn, otherwise
- * gfn_track will overflow and explosions will ensure.
+ * gfn_write_track.  As above, gfn_write_track is a 16 bit counter, so KVM must
+ * not create more than 2^16-1 upper-level shadow pages at a single gfn,
+ * otherwise gfn_write_track will overflow and explosions will ensue.
  *
  * A unique shadow page (SP) for a gfn is created if and only if an existing SP
  * cannot be reused.  The ability to reuse a SP is tracked by its role, which
@@ -1005,7 +1005,7 @@ struct kvm_lpage_info {
 struct kvm_arch_memory_slot {
 	struct kvm_rmap_head *rmap[KVM_NR_PAGE_SIZES];
 	struct kvm_lpage_info *lpage_info[KVM_NR_PAGE_SIZES - 1];
-	unsigned short *gfn_track[KVM_PAGE_TRACK_MAX];
+	unsigned short *gfn_write_track;
 };
 
 /*
diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index 61adb07b5927..9e4ee26d1779 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -4,17 +4,10 @@
 
 #include <linux/kvm_types.h>
 
-enum kvm_page_track_mode {
-	KVM_PAGE_TRACK_WRITE,
-	KVM_PAGE_TRACK_MAX,
-};
-
 void kvm_slot_page_track_add_page(struct kvm *kvm,
-				  struct kvm_memory_slot *slot, gfn_t gfn,
-				  enum kvm_page_track_mode mode);
+				  struct kvm_memory_slot *slot, gfn_t gfn);
 void kvm_slot_page_track_remove_page(struct kvm *kvm,
-				     struct kvm_memory_slot *slot, gfn_t gfn,
-				     enum kvm_page_track_mode mode);
+				     struct kvm_memory_slot *slot, gfn_t gfn);
 
 #ifdef CONFIG_KVM_EXTERNAL_WRITE_TRACKING
 /*
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 0d9fe54ecb01..8041f5747704 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -837,8 +837,7 @@ static void account_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
 
 	/* the non-leaf shadow pages are keeping readonly. */
 	if (sp->role.level > PG_LEVEL_4K)
-		return kvm_slot_page_track_add_page(kvm, slot, gfn,
-						    KVM_PAGE_TRACK_WRITE);
+		return kvm_slot_page_track_add_page(kvm, slot, gfn);
 
 	kvm_mmu_gfn_disallow_lpage(slot, gfn);
 
@@ -884,8 +883,7 @@ static void unaccount_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
 	slots = kvm_memslots_for_spte_role(kvm, sp->role);
 	slot = __gfn_to_memslot(slots, gfn);
 	if (sp->role.level > PG_LEVEL_4K)
-		return kvm_slot_page_track_remove_page(kvm, slot, gfn,
-						       KVM_PAGE_TRACK_WRITE);
+		return kvm_slot_page_track_remove_page(kvm, slot, gfn);
 
 	kvm_mmu_gfn_allow_lpage(slot, gfn);
 }
@@ -2825,7 +2823,7 @@ int mmu_try_to_unsync_pages(struct kvm *kvm, const struct kvm_memory_slot *slot,
 	 * track machinery is used to write-protect upper-level shadow pages,
 	 * i.e. this guards the role.level == 4K assertion below!
 	 */
-	if (kvm_slot_page_track_is_active(kvm, slot, gfn, KVM_PAGE_TRACK_WRITE))
+	if (kvm_slot_page_track_is_active(kvm, slot, gfn))
 		return -EPERM;
 
 	/*
@@ -4226,7 +4224,7 @@ static bool page_fault_handle_page_track(struct kvm_vcpu *vcpu,
 	 * guest is writing the page which is write tracked which can
 	 * not be fixed by page fault handler.
 	 */
-	if (kvm_slot_page_track_is_active(vcpu->kvm, fault->slot, fault->gfn, KVM_PAGE_TRACK_WRITE))
+	if (kvm_slot_page_track_is_active(vcpu->kvm, fault->slot, fault->gfn))
 		return true;
 
 	return false;
@@ -5461,8 +5459,8 @@ void kvm_mmu_after_set_cpuid(struct kvm_vcpu *vcpu)
 	 * physical address properties) in a single VM would require tracking
 	 * all relevant CPUID information in kvm_mmu_page_role. That is very
 	 * undesirable as it would increase the memory requirements for
-	 * gfn_track (see struct kvm_mmu_page_role comments).  For now that
-	 * problem is swept under the rug; KVM's CPUID API is horrific and
+	 * gfn_write_track (see struct kvm_mmu_page_role comments).  For now
+	 * that problem is swept under the rug; KVM's CPUID API is horrific and
 	 * it's all but impossible to solve it without introducing a new API.
 	 */
 	vcpu->arch.root_mmu.root_role.word = 0;
diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index b20aad7ac3fe..cdc6069b8caf 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -27,76 +27,50 @@ bool kvm_page_track_write_tracking_enabled(struct kvm *kvm)
 
 void kvm_page_track_free_memslot(struct kvm_memory_slot *slot)
 {
-	int i;
+	kvfree(slot->arch.gfn_write_track);
+	slot->arch.gfn_write_track = NULL;
+}
 
-	for (i = 0; i < KVM_PAGE_TRACK_MAX; i++) {
-		kvfree(slot->arch.gfn_track[i]);
-		slot->arch.gfn_track[i] = NULL;
-	}
+static int __kvm_page_track_write_tracking_alloc(struct kvm_memory_slot *slot,
+						 unsigned long npages)
+{
+	const size_t size = sizeof(*slot->arch.gfn_write_track);
+
+	if (!slot->arch.gfn_write_track)
+		slot->arch.gfn_write_track = __vcalloc(npages, size,
+						       GFP_KERNEL_ACCOUNT);
+
+	return slot->arch.gfn_write_track ? 0 : -ENOMEM;
 }
 
 int kvm_page_track_create_memslot(struct kvm *kvm,
 				  struct kvm_memory_slot *slot,
 				  unsigned long npages)
 {
-	int i;
-
-	for (i = 0; i < KVM_PAGE_TRACK_MAX; i++) {
-		if (i == KVM_PAGE_TRACK_WRITE &&
-		    !kvm_page_track_write_tracking_enabled(kvm))
-			continue;
-
-		slot->arch.gfn_track[i] =
-			__vcalloc(npages, sizeof(*slot->arch.gfn_track[i]),
-				  GFP_KERNEL_ACCOUNT);
-		if (!slot->arch.gfn_track[i])
-			goto track_free;
-	}
-
-	return 0;
-
-track_free:
-	kvm_page_track_free_memslot(slot);
-	return -ENOMEM;
-}
-
-static inline bool page_track_mode_is_valid(enum kvm_page_track_mode mode)
-{
-	if (mode < 0 || mode >= KVM_PAGE_TRACK_MAX)
-		return false;
-
-	return true;
-}
-
-int kvm_page_track_write_tracking_alloc(struct kvm_memory_slot *slot)
-{
-	unsigned short *gfn_track;
-
-	if (slot->arch.gfn_track[KVM_PAGE_TRACK_WRITE])
+	if (!kvm_page_track_write_tracking_enabled(kvm))
 		return 0;
 
-	gfn_track = __vcalloc(slot->npages, sizeof(*gfn_track),
-			      GFP_KERNEL_ACCOUNT);
-	if (gfn_track == NULL)
-		return -ENOMEM;
+	return __kvm_page_track_write_tracking_alloc(slot, npages);
+}
 
-	slot->arch.gfn_track[KVM_PAGE_TRACK_WRITE] = gfn_track;
-	return 0;
+int kvm_page_track_write_tracking_alloc(struct kvm_memory_slot *slot)
+{
+	return __kvm_page_track_write_tracking_alloc(slot, slot->npages);
 }
 
-static void update_gfn_track(struct kvm_memory_slot *slot, gfn_t gfn,
-			     enum kvm_page_track_mode mode, short count)
+static void update_gfn_write_track(struct kvm_memory_slot *slot, gfn_t gfn,
+				   short count)
 {
 	int index, val;
 
 	index = gfn_to_index(gfn, slot->base_gfn, PG_LEVEL_4K);
 
-	val = slot->arch.gfn_track[mode][index];
+	val = slot->arch.gfn_write_track[index];
 
 	if (WARN_ON(val + count < 0 || val + count > USHRT_MAX))
 		return;
 
-	slot->arch.gfn_track[mode][index] += count;
+	slot->arch.gfn_write_track[index] += count;
 }
 
 /*
@@ -109,21 +83,15 @@ static void update_gfn_track(struct kvm_memory_slot *slot, gfn_t gfn,
  * @kvm: the guest instance we are interested in.
  * @slot: the @gfn belongs to.
  * @gfn: the guest page.
- * @mode: tracking mode, currently only write track is supported.
  */
 void kvm_slot_page_track_add_page(struct kvm *kvm,
-				  struct kvm_memory_slot *slot, gfn_t gfn,
-				  enum kvm_page_track_mode mode)
+				  struct kvm_memory_slot *slot, gfn_t gfn)
 {
 
-	if (WARN_ON(!page_track_mode_is_valid(mode)))
+	if (WARN_ON(!kvm_page_track_write_tracking_enabled(kvm)))
 		return;
 
-	if (WARN_ON(mode == KVM_PAGE_TRACK_WRITE &&
-		    !kvm_page_track_write_tracking_enabled(kvm)))
-		return;
-
-	update_gfn_track(slot, gfn, mode, 1);
+	update_gfn_write_track(slot, gfn, 1);
 
 	/*
 	 * new track stops large page mapping for the
@@ -131,9 +99,8 @@ void kvm_slot_page_track_add_page(struct kvm *kvm,
 	 */
 	kvm_mmu_gfn_disallow_lpage(slot, gfn);
 
-	if (mode == KVM_PAGE_TRACK_WRITE)
-		if (kvm_mmu_slot_gfn_write_protect(kvm, slot, gfn, PG_LEVEL_4K))
-			kvm_flush_remote_tlbs(kvm);
+	if (kvm_mmu_slot_gfn_write_protect(kvm, slot, gfn, PG_LEVEL_4K))
+		kvm_flush_remote_tlbs(kvm);
 }
 EXPORT_SYMBOL_GPL(kvm_slot_page_track_add_page);
 
@@ -148,20 +115,14 @@ EXPORT_SYMBOL_GPL(kvm_slot_page_track_add_page);
  * @kvm: the guest instance we are interested in.
  * @slot: the @gfn belongs to.
  * @gfn: the guest page.
- * @mode: tracking mode, currently only write track is supported.
  */
 void kvm_slot_page_track_remove_page(struct kvm *kvm,
-				     struct kvm_memory_slot *slot, gfn_t gfn,
-				     enum kvm_page_track_mode mode)
+				     struct kvm_memory_slot *slot, gfn_t gfn)
 {
-	if (WARN_ON(!page_track_mode_is_valid(mode)))
+	if (WARN_ON(!kvm_page_track_write_tracking_enabled(kvm)))
 		return;
 
-	if (WARN_ON(mode == KVM_PAGE_TRACK_WRITE &&
-		    !kvm_page_track_write_tracking_enabled(kvm)))
-		return;
-
-	update_gfn_track(slot, gfn, mode, -1);
+	update_gfn_write_track(slot, gfn, -1);
 
 	/*
 	 * allow large page mapping for the tracked page
@@ -176,22 +137,18 @@ EXPORT_SYMBOL_GPL(kvm_slot_page_track_remove_page);
  */
 bool kvm_slot_page_track_is_active(struct kvm *kvm,
 				   const struct kvm_memory_slot *slot,
-				   gfn_t gfn, enum kvm_page_track_mode mode)
+				   gfn_t gfn)
 {
 	int index;
 
-	if (WARN_ON(!page_track_mode_is_valid(mode)))
-		return false;
-
 	if (!slot)
 		return false;
 
-	if (mode == KVM_PAGE_TRACK_WRITE &&
-	    !kvm_page_track_write_tracking_enabled(kvm))
+	if (!kvm_page_track_write_tracking_enabled(kvm))
 		return false;
 
 	index = gfn_to_index(gfn, slot->base_gfn, PG_LEVEL_4K);
-	return !!READ_ONCE(slot->arch.gfn_track[mode][index]);
+	return !!READ_ONCE(slot->arch.gfn_write_track[index]);
 }
 
 #ifdef CONFIG_KVM_EXTERNAL_WRITE_TRACKING
diff --git a/arch/x86/kvm/mmu/page_track.h b/arch/x86/kvm/mmu/page_track.h
index 931b26b8fc8f..789d0c479519 100644
--- a/arch/x86/kvm/mmu/page_track.h
+++ b/arch/x86/kvm/mmu/page_track.h
@@ -16,8 +16,7 @@ int kvm_page_track_create_memslot(struct kvm *kvm,
 				  unsigned long npages);
 
 bool kvm_slot_page_track_is_active(struct kvm *kvm,
-				   const struct kvm_memory_slot *slot,
-				   gfn_t gfn, enum kvm_page_track_mode mode);
+				   const struct kvm_memory_slot *slot, gfn_t gfn);
 
 #ifdef CONFIG_KVM_EXTERNAL_WRITE_TRACKING
 int kvm_page_track_init(struct kvm *kvm);
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 2e65901270ca..25226e4e3417 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -1563,7 +1563,7 @@ int intel_gvt_page_track_add(struct intel_vgpu *info, u64 gfn)
 	}
 
 	write_lock(&kvm->mmu_lock);
-	kvm_slot_page_track_add_page(kvm, slot, gfn, KVM_PAGE_TRACK_WRITE);
+	kvm_slot_page_track_add_page(kvm, slot, gfn);
 	write_unlock(&kvm->mmu_lock);
 
 	srcu_read_unlock(&kvm->srcu, idx);
@@ -1592,7 +1592,7 @@ int intel_gvt_page_track_remove(struct intel_vgpu *info, u64 gfn)
 	}
 
 	write_lock(&kvm->mmu_lock);
-	kvm_slot_page_track_remove_page(kvm, slot, gfn, KVM_PAGE_TRACK_WRITE);
+	kvm_slot_page_track_remove_page(kvm, slot, gfn);
 	write_unlock(&kvm->mmu_lock);
 	srcu_read_unlock(&kvm->srcu, idx);
 
-- 
2.40.1.606.ga4b1b128d6-goog


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [Intel-gfx] [PATCH v3 23/28] KVM: x86/mmu: Rename page-track APIs to reflect the new reality
  2023-05-13  0:35 ` [Intel-gfx] " Sean Christopherson
@ 2023-05-13  0:35   ` Sean Christopherson
  -1 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2023-05-13  0:35 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: Yan Zhao, kvm, intel-gfx, linux-kernel, Ben Gardon, intel-gvt-dev

Rename the page-track APIs to capture that they're all about tracking
writes, now that the facade of supporting multiple modes is gone.

Opportunstically replace "slot" with "gfn" in anticipation of removing
the @slot param from the external APIs.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_page_track.h |  8 ++++----
 arch/x86/kvm/mmu/mmu.c                |  8 ++++----
 arch/x86/kvm/mmu/page_track.c         | 21 +++++++++------------
 arch/x86/kvm/mmu/page_track.h         |  4 ++--
 drivers/gpu/drm/i915/gvt/kvmgt.c      |  4 ++--
 5 files changed, 21 insertions(+), 24 deletions(-)

diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index 9e4ee26d1779..f5c1db36cdb7 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -4,10 +4,10 @@
 
 #include <linux/kvm_types.h>
 
-void kvm_slot_page_track_add_page(struct kvm *kvm,
-				  struct kvm_memory_slot *slot, gfn_t gfn);
-void kvm_slot_page_track_remove_page(struct kvm *kvm,
-				     struct kvm_memory_slot *slot, gfn_t gfn);
+void kvm_write_track_add_gfn(struct kvm *kvm,
+			     struct kvm_memory_slot *slot, gfn_t gfn);
+void kvm_write_track_remove_gfn(struct kvm *kvm, struct kvm_memory_slot *slot,
+				gfn_t gfn);
 
 #ifdef CONFIG_KVM_EXTERNAL_WRITE_TRACKING
 /*
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 8041f5747704..1818c047891f 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -837,7 +837,7 @@ static void account_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
 
 	/* the non-leaf shadow pages are keeping readonly. */
 	if (sp->role.level > PG_LEVEL_4K)
-		return kvm_slot_page_track_add_page(kvm, slot, gfn);
+		return kvm_write_track_add_gfn(kvm, slot, gfn);
 
 	kvm_mmu_gfn_disallow_lpage(slot, gfn);
 
@@ -883,7 +883,7 @@ static void unaccount_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
 	slots = kvm_memslots_for_spte_role(kvm, sp->role);
 	slot = __gfn_to_memslot(slots, gfn);
 	if (sp->role.level > PG_LEVEL_4K)
-		return kvm_slot_page_track_remove_page(kvm, slot, gfn);
+		return kvm_write_track_remove_gfn(kvm, slot, gfn);
 
 	kvm_mmu_gfn_allow_lpage(slot, gfn);
 }
@@ -2823,7 +2823,7 @@ int mmu_try_to_unsync_pages(struct kvm *kvm, const struct kvm_memory_slot *slot,
 	 * track machinery is used to write-protect upper-level shadow pages,
 	 * i.e. this guards the role.level == 4K assertion below!
 	 */
-	if (kvm_slot_page_track_is_active(kvm, slot, gfn))
+	if (kvm_gfn_is_write_tracked(kvm, slot, gfn))
 		return -EPERM;
 
 	/*
@@ -4224,7 +4224,7 @@ static bool page_fault_handle_page_track(struct kvm_vcpu *vcpu,
 	 * guest is writing the page which is write tracked which can
 	 * not be fixed by page fault handler.
 	 */
-	if (kvm_slot_page_track_is_active(vcpu->kvm, fault->slot, fault->gfn))
+	if (kvm_gfn_is_write_tracked(vcpu->kvm, fault->slot, fault->gfn))
 		return true;
 
 	return false;
diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index cdc6069b8caf..b835ba7f325c 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -84,10 +84,9 @@ static void update_gfn_write_track(struct kvm_memory_slot *slot, gfn_t gfn,
  * @slot: the @gfn belongs to.
  * @gfn: the guest page.
  */
-void kvm_slot_page_track_add_page(struct kvm *kvm,
-				  struct kvm_memory_slot *slot, gfn_t gfn)
+void kvm_write_track_add_gfn(struct kvm *kvm, struct kvm_memory_slot *slot,
+			     gfn_t gfn)
 {
-
 	if (WARN_ON(!kvm_page_track_write_tracking_enabled(kvm)))
 		return;
 
@@ -102,12 +101,11 @@ void kvm_slot_page_track_add_page(struct kvm *kvm,
 	if (kvm_mmu_slot_gfn_write_protect(kvm, slot, gfn, PG_LEVEL_4K))
 		kvm_flush_remote_tlbs(kvm);
 }
-EXPORT_SYMBOL_GPL(kvm_slot_page_track_add_page);
+EXPORT_SYMBOL_GPL(kvm_write_track_add_gfn);
 
 /*
  * remove the guest page from the tracking pool which stops the interception
- * of corresponding access on that page. It is the opposed operation of
- * kvm_slot_page_track_add_page().
+ * of corresponding access on that page.
  *
  * It should be called under the protection both of mmu-lock and kvm->srcu
  * or kvm->slots_lock.
@@ -116,8 +114,8 @@ EXPORT_SYMBOL_GPL(kvm_slot_page_track_add_page);
  * @slot: the @gfn belongs to.
  * @gfn: the guest page.
  */
-void kvm_slot_page_track_remove_page(struct kvm *kvm,
-				     struct kvm_memory_slot *slot, gfn_t gfn)
+void kvm_write_track_remove_gfn(struct kvm *kvm,
+				struct kvm_memory_slot *slot, gfn_t gfn)
 {
 	if (WARN_ON(!kvm_page_track_write_tracking_enabled(kvm)))
 		return;
@@ -130,14 +128,13 @@ void kvm_slot_page_track_remove_page(struct kvm *kvm,
 	 */
 	kvm_mmu_gfn_allow_lpage(slot, gfn);
 }
-EXPORT_SYMBOL_GPL(kvm_slot_page_track_remove_page);
+EXPORT_SYMBOL_GPL(kvm_write_track_remove_gfn);
 
 /*
  * check if the corresponding access on the specified guest page is tracked.
  */
-bool kvm_slot_page_track_is_active(struct kvm *kvm,
-				   const struct kvm_memory_slot *slot,
-				   gfn_t gfn)
+bool kvm_gfn_is_write_tracked(struct kvm *kvm,
+			      const struct kvm_memory_slot *slot, gfn_t gfn)
 {
 	int index;
 
diff --git a/arch/x86/kvm/mmu/page_track.h b/arch/x86/kvm/mmu/page_track.h
index 789d0c479519..50d3278e8c69 100644
--- a/arch/x86/kvm/mmu/page_track.h
+++ b/arch/x86/kvm/mmu/page_track.h
@@ -15,8 +15,8 @@ int kvm_page_track_create_memslot(struct kvm *kvm,
 				  struct kvm_memory_slot *slot,
 				  unsigned long npages);
 
-bool kvm_slot_page_track_is_active(struct kvm *kvm,
-				   const struct kvm_memory_slot *slot, gfn_t gfn);
+bool kvm_gfn_is_write_tracked(struct kvm *kvm,
+			      const struct kvm_memory_slot *slot, gfn_t gfn);
 
 #ifdef CONFIG_KVM_EXTERNAL_WRITE_TRACKING
 int kvm_page_track_init(struct kvm *kvm);
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 25226e4e3417..18f04493e103 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -1563,7 +1563,7 @@ int intel_gvt_page_track_add(struct intel_vgpu *info, u64 gfn)
 	}
 
 	write_lock(&kvm->mmu_lock);
-	kvm_slot_page_track_add_page(kvm, slot, gfn);
+	kvm_write_track_add_gfn(kvm, slot, gfn);
 	write_unlock(&kvm->mmu_lock);
 
 	srcu_read_unlock(&kvm->srcu, idx);
@@ -1592,7 +1592,7 @@ int intel_gvt_page_track_remove(struct intel_vgpu *info, u64 gfn)
 	}
 
 	write_lock(&kvm->mmu_lock);
-	kvm_slot_page_track_remove_page(kvm, slot, gfn);
+	kvm_write_track_remove_gfn(kvm, slot, gfn);
 	write_unlock(&kvm->mmu_lock);
 	srcu_read_unlock(&kvm->srcu, idx);
 
-- 
2.40.1.606.ga4b1b128d6-goog


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v3 23/28] KVM: x86/mmu: Rename page-track APIs to reflect the new reality
@ 2023-05-13  0:35   ` Sean Christopherson
  0 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2023-05-13  0:35 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

Rename the page-track APIs to capture that they're all about tracking
writes, now that the facade of supporting multiple modes is gone.

Opportunstically replace "slot" with "gfn" in anticipation of removing
the @slot param from the external APIs.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_page_track.h |  8 ++++----
 arch/x86/kvm/mmu/mmu.c                |  8 ++++----
 arch/x86/kvm/mmu/page_track.c         | 21 +++++++++------------
 arch/x86/kvm/mmu/page_track.h         |  4 ++--
 drivers/gpu/drm/i915/gvt/kvmgt.c      |  4 ++--
 5 files changed, 21 insertions(+), 24 deletions(-)

diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index 9e4ee26d1779..f5c1db36cdb7 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -4,10 +4,10 @@
 
 #include <linux/kvm_types.h>
 
-void kvm_slot_page_track_add_page(struct kvm *kvm,
-				  struct kvm_memory_slot *slot, gfn_t gfn);
-void kvm_slot_page_track_remove_page(struct kvm *kvm,
-				     struct kvm_memory_slot *slot, gfn_t gfn);
+void kvm_write_track_add_gfn(struct kvm *kvm,
+			     struct kvm_memory_slot *slot, gfn_t gfn);
+void kvm_write_track_remove_gfn(struct kvm *kvm, struct kvm_memory_slot *slot,
+				gfn_t gfn);
 
 #ifdef CONFIG_KVM_EXTERNAL_WRITE_TRACKING
 /*
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 8041f5747704..1818c047891f 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -837,7 +837,7 @@ static void account_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
 
 	/* the non-leaf shadow pages are keeping readonly. */
 	if (sp->role.level > PG_LEVEL_4K)
-		return kvm_slot_page_track_add_page(kvm, slot, gfn);
+		return kvm_write_track_add_gfn(kvm, slot, gfn);
 
 	kvm_mmu_gfn_disallow_lpage(slot, gfn);
 
@@ -883,7 +883,7 @@ static void unaccount_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
 	slots = kvm_memslots_for_spte_role(kvm, sp->role);
 	slot = __gfn_to_memslot(slots, gfn);
 	if (sp->role.level > PG_LEVEL_4K)
-		return kvm_slot_page_track_remove_page(kvm, slot, gfn);
+		return kvm_write_track_remove_gfn(kvm, slot, gfn);
 
 	kvm_mmu_gfn_allow_lpage(slot, gfn);
 }
@@ -2823,7 +2823,7 @@ int mmu_try_to_unsync_pages(struct kvm *kvm, const struct kvm_memory_slot *slot,
 	 * track machinery is used to write-protect upper-level shadow pages,
 	 * i.e. this guards the role.level == 4K assertion below!
 	 */
-	if (kvm_slot_page_track_is_active(kvm, slot, gfn))
+	if (kvm_gfn_is_write_tracked(kvm, slot, gfn))
 		return -EPERM;
 
 	/*
@@ -4224,7 +4224,7 @@ static bool page_fault_handle_page_track(struct kvm_vcpu *vcpu,
 	 * guest is writing the page which is write tracked which can
 	 * not be fixed by page fault handler.
 	 */
-	if (kvm_slot_page_track_is_active(vcpu->kvm, fault->slot, fault->gfn))
+	if (kvm_gfn_is_write_tracked(vcpu->kvm, fault->slot, fault->gfn))
 		return true;
 
 	return false;
diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index cdc6069b8caf..b835ba7f325c 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -84,10 +84,9 @@ static void update_gfn_write_track(struct kvm_memory_slot *slot, gfn_t gfn,
  * @slot: the @gfn belongs to.
  * @gfn: the guest page.
  */
-void kvm_slot_page_track_add_page(struct kvm *kvm,
-				  struct kvm_memory_slot *slot, gfn_t gfn)
+void kvm_write_track_add_gfn(struct kvm *kvm, struct kvm_memory_slot *slot,
+			     gfn_t gfn)
 {
-
 	if (WARN_ON(!kvm_page_track_write_tracking_enabled(kvm)))
 		return;
 
@@ -102,12 +101,11 @@ void kvm_slot_page_track_add_page(struct kvm *kvm,
 	if (kvm_mmu_slot_gfn_write_protect(kvm, slot, gfn, PG_LEVEL_4K))
 		kvm_flush_remote_tlbs(kvm);
 }
-EXPORT_SYMBOL_GPL(kvm_slot_page_track_add_page);
+EXPORT_SYMBOL_GPL(kvm_write_track_add_gfn);
 
 /*
  * remove the guest page from the tracking pool which stops the interception
- * of corresponding access on that page. It is the opposed operation of
- * kvm_slot_page_track_add_page().
+ * of corresponding access on that page.
  *
  * It should be called under the protection both of mmu-lock and kvm->srcu
  * or kvm->slots_lock.
@@ -116,8 +114,8 @@ EXPORT_SYMBOL_GPL(kvm_slot_page_track_add_page);
  * @slot: the @gfn belongs to.
  * @gfn: the guest page.
  */
-void kvm_slot_page_track_remove_page(struct kvm *kvm,
-				     struct kvm_memory_slot *slot, gfn_t gfn)
+void kvm_write_track_remove_gfn(struct kvm *kvm,
+				struct kvm_memory_slot *slot, gfn_t gfn)
 {
 	if (WARN_ON(!kvm_page_track_write_tracking_enabled(kvm)))
 		return;
@@ -130,14 +128,13 @@ void kvm_slot_page_track_remove_page(struct kvm *kvm,
 	 */
 	kvm_mmu_gfn_allow_lpage(slot, gfn);
 }
-EXPORT_SYMBOL_GPL(kvm_slot_page_track_remove_page);
+EXPORT_SYMBOL_GPL(kvm_write_track_remove_gfn);
 
 /*
  * check if the corresponding access on the specified guest page is tracked.
  */
-bool kvm_slot_page_track_is_active(struct kvm *kvm,
-				   const struct kvm_memory_slot *slot,
-				   gfn_t gfn)
+bool kvm_gfn_is_write_tracked(struct kvm *kvm,
+			      const struct kvm_memory_slot *slot, gfn_t gfn)
 {
 	int index;
 
diff --git a/arch/x86/kvm/mmu/page_track.h b/arch/x86/kvm/mmu/page_track.h
index 789d0c479519..50d3278e8c69 100644
--- a/arch/x86/kvm/mmu/page_track.h
+++ b/arch/x86/kvm/mmu/page_track.h
@@ -15,8 +15,8 @@ int kvm_page_track_create_memslot(struct kvm *kvm,
 				  struct kvm_memory_slot *slot,
 				  unsigned long npages);
 
-bool kvm_slot_page_track_is_active(struct kvm *kvm,
-				   const struct kvm_memory_slot *slot, gfn_t gfn);
+bool kvm_gfn_is_write_tracked(struct kvm *kvm,
+			      const struct kvm_memory_slot *slot, gfn_t gfn);
 
 #ifdef CONFIG_KVM_EXTERNAL_WRITE_TRACKING
 int kvm_page_track_init(struct kvm *kvm);
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 25226e4e3417..18f04493e103 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -1563,7 +1563,7 @@ int intel_gvt_page_track_add(struct intel_vgpu *info, u64 gfn)
 	}
 
 	write_lock(&kvm->mmu_lock);
-	kvm_slot_page_track_add_page(kvm, slot, gfn);
+	kvm_write_track_add_gfn(kvm, slot, gfn);
 	write_unlock(&kvm->mmu_lock);
 
 	srcu_read_unlock(&kvm->srcu, idx);
@@ -1592,7 +1592,7 @@ int intel_gvt_page_track_remove(struct intel_vgpu *info, u64 gfn)
 	}
 
 	write_lock(&kvm->mmu_lock);
-	kvm_slot_page_track_remove_page(kvm, slot, gfn);
+	kvm_write_track_remove_gfn(kvm, slot, gfn);
 	write_unlock(&kvm->mmu_lock);
 	srcu_read_unlock(&kvm->srcu, idx);
 
-- 
2.40.1.606.ga4b1b128d6-goog


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [Intel-gfx] [PATCH v3 24/28] KVM: x86/mmu: Assert that correct locks are held for page write-tracking
  2023-05-13  0:35 ` [Intel-gfx] " Sean Christopherson
@ 2023-05-13  0:35   ` Sean Christopherson
  -1 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2023-05-13  0:35 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: Yan Zhao, kvm, intel-gfx, linux-kernel, Ben Gardon, intel-gvt-dev

When adding/removing gfns to/from write-tracking, assert that mmu_lock
is held for write, and that either slots_lock or kvm->srcu is held.
mmu_lock must be held for write to protect gfn_write_track's refcount,
and SRCU or slots_lock must be held to protect the memslot itself.

Tested-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/page_track.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index b835ba7f325c..29ae61f1e303 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -12,6 +12,7 @@
  */
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
+#include <linux/lockdep.h>
 #include <linux/kvm_host.h>
 #include <linux/rculist.h>
 
@@ -77,9 +78,6 @@ static void update_gfn_write_track(struct kvm_memory_slot *slot, gfn_t gfn,
  * add guest page to the tracking pool so that corresponding access on that
  * page will be intercepted.
  *
- * It should be called under the protection both of mmu-lock and kvm->srcu
- * or kvm->slots_lock.
- *
  * @kvm: the guest instance we are interested in.
  * @slot: the @gfn belongs to.
  * @gfn: the guest page.
@@ -87,6 +85,11 @@ static void update_gfn_write_track(struct kvm_memory_slot *slot, gfn_t gfn,
 void kvm_write_track_add_gfn(struct kvm *kvm, struct kvm_memory_slot *slot,
 			     gfn_t gfn)
 {
+	lockdep_assert_held_write(&kvm->mmu_lock);
+
+	lockdep_assert_once(lockdep_is_held(&kvm->slots_lock) ||
+			    srcu_read_lock_held(&kvm->srcu));
+
 	if (WARN_ON(!kvm_page_track_write_tracking_enabled(kvm)))
 		return;
 
@@ -107,9 +110,6 @@ EXPORT_SYMBOL_GPL(kvm_write_track_add_gfn);
  * remove the guest page from the tracking pool which stops the interception
  * of corresponding access on that page.
  *
- * It should be called under the protection both of mmu-lock and kvm->srcu
- * or kvm->slots_lock.
- *
  * @kvm: the guest instance we are interested in.
  * @slot: the @gfn belongs to.
  * @gfn: the guest page.
@@ -117,6 +117,11 @@ EXPORT_SYMBOL_GPL(kvm_write_track_add_gfn);
 void kvm_write_track_remove_gfn(struct kvm *kvm,
 				struct kvm_memory_slot *slot, gfn_t gfn)
 {
+	lockdep_assert_held_write(&kvm->mmu_lock);
+
+	lockdep_assert_once(lockdep_is_held(&kvm->slots_lock) ||
+			    srcu_read_lock_held(&kvm->srcu));
+
 	if (WARN_ON(!kvm_page_track_write_tracking_enabled(kvm)))
 		return;
 
-- 
2.40.1.606.ga4b1b128d6-goog


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v3 24/28] KVM: x86/mmu: Assert that correct locks are held for page write-tracking
@ 2023-05-13  0:35   ` Sean Christopherson
  0 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2023-05-13  0:35 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

When adding/removing gfns to/from write-tracking, assert that mmu_lock
is held for write, and that either slots_lock or kvm->srcu is held.
mmu_lock must be held for write to protect gfn_write_track's refcount,
and SRCU or slots_lock must be held to protect the memslot itself.

Tested-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/page_track.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index b835ba7f325c..29ae61f1e303 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -12,6 +12,7 @@
  */
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
+#include <linux/lockdep.h>
 #include <linux/kvm_host.h>
 #include <linux/rculist.h>
 
@@ -77,9 +78,6 @@ static void update_gfn_write_track(struct kvm_memory_slot *slot, gfn_t gfn,
  * add guest page to the tracking pool so that corresponding access on that
  * page will be intercepted.
  *
- * It should be called under the protection both of mmu-lock and kvm->srcu
- * or kvm->slots_lock.
- *
  * @kvm: the guest instance we are interested in.
  * @slot: the @gfn belongs to.
  * @gfn: the guest page.
@@ -87,6 +85,11 @@ static void update_gfn_write_track(struct kvm_memory_slot *slot, gfn_t gfn,
 void kvm_write_track_add_gfn(struct kvm *kvm, struct kvm_memory_slot *slot,
 			     gfn_t gfn)
 {
+	lockdep_assert_held_write(&kvm->mmu_lock);
+
+	lockdep_assert_once(lockdep_is_held(&kvm->slots_lock) ||
+			    srcu_read_lock_held(&kvm->srcu));
+
 	if (WARN_ON(!kvm_page_track_write_tracking_enabled(kvm)))
 		return;
 
@@ -107,9 +110,6 @@ EXPORT_SYMBOL_GPL(kvm_write_track_add_gfn);
  * remove the guest page from the tracking pool which stops the interception
  * of corresponding access on that page.
  *
- * It should be called under the protection both of mmu-lock and kvm->srcu
- * or kvm->slots_lock.
- *
  * @kvm: the guest instance we are interested in.
  * @slot: the @gfn belongs to.
  * @gfn: the guest page.
@@ -117,6 +117,11 @@ EXPORT_SYMBOL_GPL(kvm_write_track_add_gfn);
 void kvm_write_track_remove_gfn(struct kvm *kvm,
 				struct kvm_memory_slot *slot, gfn_t gfn)
 {
+	lockdep_assert_held_write(&kvm->mmu_lock);
+
+	lockdep_assert_once(lockdep_is_held(&kvm->slots_lock) ||
+			    srcu_read_lock_held(&kvm->srcu));
+
 	if (WARN_ON(!kvm_page_track_write_tracking_enabled(kvm)))
 		return;
 
-- 
2.40.1.606.ga4b1b128d6-goog


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [Intel-gfx] [PATCH v3 25/28] KVM: x86/mmu: Bug the VM if write-tracking is used but not enabled
  2023-05-13  0:35 ` [Intel-gfx] " Sean Christopherson
@ 2023-05-13  0:35   ` Sean Christopherson
  -1 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2023-05-13  0:35 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: Yan Zhao, kvm, intel-gfx, linux-kernel, Ben Gardon, intel-gvt-dev

Bug the VM if something attempts to write-track a gfn, but write-tracking
isn't enabled.  The VM is doomed (and KVM has an egregious bug) if KVM or
KVMGT wants to shadow guest page tables but can't because write-tracking
isn't enabled.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/page_track.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index 29ae61f1e303..eedb5889d73e 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -90,7 +90,7 @@ void kvm_write_track_add_gfn(struct kvm *kvm, struct kvm_memory_slot *slot,
 	lockdep_assert_once(lockdep_is_held(&kvm->slots_lock) ||
 			    srcu_read_lock_held(&kvm->srcu));
 
-	if (WARN_ON(!kvm_page_track_write_tracking_enabled(kvm)))
+	if (KVM_BUG_ON(!kvm_page_track_write_tracking_enabled(kvm), kvm))
 		return;
 
 	update_gfn_write_track(slot, gfn, 1);
@@ -122,7 +122,7 @@ void kvm_write_track_remove_gfn(struct kvm *kvm,
 	lockdep_assert_once(lockdep_is_held(&kvm->slots_lock) ||
 			    srcu_read_lock_held(&kvm->srcu));
 
-	if (WARN_ON(!kvm_page_track_write_tracking_enabled(kvm)))
+	if (KVM_BUG_ON(!kvm_page_track_write_tracking_enabled(kvm), kvm))
 		return;
 
 	update_gfn_write_track(slot, gfn, -1);
-- 
2.40.1.606.ga4b1b128d6-goog


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v3 25/28] KVM: x86/mmu: Bug the VM if write-tracking is used but not enabled
@ 2023-05-13  0:35   ` Sean Christopherson
  0 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2023-05-13  0:35 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

Bug the VM if something attempts to write-track a gfn, but write-tracking
isn't enabled.  The VM is doomed (and KVM has an egregious bug) if KVM or
KVMGT wants to shadow guest page tables but can't because write-tracking
isn't enabled.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/page_track.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index 29ae61f1e303..eedb5889d73e 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -90,7 +90,7 @@ void kvm_write_track_add_gfn(struct kvm *kvm, struct kvm_memory_slot *slot,
 	lockdep_assert_once(lockdep_is_held(&kvm->slots_lock) ||
 			    srcu_read_lock_held(&kvm->srcu));
 
-	if (WARN_ON(!kvm_page_track_write_tracking_enabled(kvm)))
+	if (KVM_BUG_ON(!kvm_page_track_write_tracking_enabled(kvm), kvm))
 		return;
 
 	update_gfn_write_track(slot, gfn, 1);
@@ -122,7 +122,7 @@ void kvm_write_track_remove_gfn(struct kvm *kvm,
 	lockdep_assert_once(lockdep_is_held(&kvm->slots_lock) ||
 			    srcu_read_lock_held(&kvm->srcu));
 
-	if (WARN_ON(!kvm_page_track_write_tracking_enabled(kvm)))
+	if (KVM_BUG_ON(!kvm_page_track_write_tracking_enabled(kvm), kvm))
 		return;
 
 	update_gfn_write_track(slot, gfn, -1);
-- 
2.40.1.606.ga4b1b128d6-goog


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [Intel-gfx] [PATCH v3 26/28] KVM: x86/mmu: Drop @slot param from exported/external page-track APIs
  2023-05-13  0:35 ` [Intel-gfx] " Sean Christopherson
@ 2023-05-13  0:35   ` Sean Christopherson
  -1 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2023-05-13  0:35 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: Yan Zhao, kvm, intel-gfx, linux-kernel, Ben Gardon, intel-gvt-dev

Refactor KVM's exported/external page-track, a.k.a. write-track, APIs
to take only the gfn and do the required memslot lookup in KVM proper.
Forcing users of the APIs to get the memslot unnecessarily bleeds
KVM internals into KVMGT and complicates usage of the APIs.

No functional change intended.

Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_page_track.h |  7 +--
 arch/x86/kvm/mmu/mmu.c                |  4 +-
 arch/x86/kvm/mmu/page_track.c         | 85 ++++++++++++++++++++-------
 arch/x86/kvm/mmu/page_track.h         |  5 ++
 drivers/gpu/drm/i915/gvt/kvmgt.c      | 37 +++---------
 5 files changed, 80 insertions(+), 58 deletions(-)

diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index f5c1db36cdb7..4afab697e21c 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -4,11 +4,6 @@
 
 #include <linux/kvm_types.h>
 
-void kvm_write_track_add_gfn(struct kvm *kvm,
-			     struct kvm_memory_slot *slot, gfn_t gfn);
-void kvm_write_track_remove_gfn(struct kvm *kvm, struct kvm_memory_slot *slot,
-				gfn_t gfn);
-
 #ifdef CONFIG_KVM_EXTERNAL_WRITE_TRACKING
 /*
  * The notifier represented by @kvm_page_track_notifier_node is linked into
@@ -55,6 +50,8 @@ kvm_page_track_register_notifier(struct kvm *kvm,
 void
 kvm_page_track_unregister_notifier(struct kvm *kvm,
 				   struct kvm_page_track_notifier_node *n);
+int kvm_write_track_add_gfn(struct kvm *kvm, gfn_t gfn);
+int kvm_write_track_remove_gfn(struct kvm *kvm, gfn_t gfn);
 #else
 /*
  * Allow defining a node in a structure even if page tracking is disabled, e.g.
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 1818c047891f..22f13963c320 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -837,7 +837,7 @@ static void account_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
 
 	/* the non-leaf shadow pages are keeping readonly. */
 	if (sp->role.level > PG_LEVEL_4K)
-		return kvm_write_track_add_gfn(kvm, slot, gfn);
+		return __kvm_write_track_add_gfn(kvm, slot, gfn);
 
 	kvm_mmu_gfn_disallow_lpage(slot, gfn);
 
@@ -883,7 +883,7 @@ static void unaccount_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
 	slots = kvm_memslots_for_spte_role(kvm, sp->role);
 	slot = __gfn_to_memslot(slots, gfn);
 	if (sp->role.level > PG_LEVEL_4K)
-		return kvm_write_track_remove_gfn(kvm, slot, gfn);
+		return __kvm_write_track_remove_gfn(kvm, slot, gfn);
 
 	kvm_mmu_gfn_allow_lpage(slot, gfn);
 }
diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index eedb5889d73e..2a64df38ccab 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -74,16 +74,8 @@ static void update_gfn_write_track(struct kvm_memory_slot *slot, gfn_t gfn,
 	slot->arch.gfn_write_track[index] += count;
 }
 
-/*
- * add guest page to the tracking pool so that corresponding access on that
- * page will be intercepted.
- *
- * @kvm: the guest instance we are interested in.
- * @slot: the @gfn belongs to.
- * @gfn: the guest page.
- */
-void kvm_write_track_add_gfn(struct kvm *kvm, struct kvm_memory_slot *slot,
-			     gfn_t gfn)
+void __kvm_write_track_add_gfn(struct kvm *kvm, struct kvm_memory_slot *slot,
+			       gfn_t gfn)
 {
 	lockdep_assert_held_write(&kvm->mmu_lock);
 
@@ -104,18 +96,9 @@ void kvm_write_track_add_gfn(struct kvm *kvm, struct kvm_memory_slot *slot,
 	if (kvm_mmu_slot_gfn_write_protect(kvm, slot, gfn, PG_LEVEL_4K))
 		kvm_flush_remote_tlbs(kvm);
 }
-EXPORT_SYMBOL_GPL(kvm_write_track_add_gfn);
 
-/*
- * remove the guest page from the tracking pool which stops the interception
- * of corresponding access on that page.
- *
- * @kvm: the guest instance we are interested in.
- * @slot: the @gfn belongs to.
- * @gfn: the guest page.
- */
-void kvm_write_track_remove_gfn(struct kvm *kvm,
-				struct kvm_memory_slot *slot, gfn_t gfn)
+void __kvm_write_track_remove_gfn(struct kvm *kvm,
+				  struct kvm_memory_slot *slot, gfn_t gfn)
 {
 	lockdep_assert_held_write(&kvm->mmu_lock);
 
@@ -133,7 +116,6 @@ void kvm_write_track_remove_gfn(struct kvm *kvm,
 	 */
 	kvm_mmu_gfn_allow_lpage(slot, gfn);
 }
-EXPORT_SYMBOL_GPL(kvm_write_track_remove_gfn);
 
 /*
  * check if the corresponding access on the specified guest page is tracked.
@@ -257,4 +239,63 @@ void kvm_page_track_delete_slot(struct kvm *kvm, struct kvm_memory_slot *slot)
 	srcu_read_unlock(&head->track_srcu, idx);
 }
 
+/*
+ * add guest page to the tracking pool so that corresponding access on that
+ * page will be intercepted.
+ *
+ * @kvm: the guest instance we are interested in.
+ * @gfn: the guest page.
+ */
+int kvm_write_track_add_gfn(struct kvm *kvm, gfn_t gfn)
+{
+	struct kvm_memory_slot *slot;
+	int idx;
+
+	idx = srcu_read_lock(&kvm->srcu);
+
+	slot = gfn_to_memslot(kvm, gfn);
+	if (!slot) {
+		srcu_read_unlock(&kvm->srcu, idx);
+		return -EINVAL;
+	}
+
+	write_lock(&kvm->mmu_lock);
+	__kvm_write_track_add_gfn(kvm, slot, gfn);
+	write_unlock(&kvm->mmu_lock);
+
+	srcu_read_unlock(&kvm->srcu, idx);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(kvm_write_track_add_gfn);
+
+/*
+ * remove the guest page from the tracking pool which stops the interception
+ * of corresponding access on that page.
+ *
+ * @kvm: the guest instance we are interested in.
+ * @gfn: the guest page.
+ */
+int kvm_write_track_remove_gfn(struct kvm *kvm, gfn_t gfn)
+{
+	struct kvm_memory_slot *slot;
+	int idx;
+
+	idx = srcu_read_lock(&kvm->srcu);
+
+	slot = gfn_to_memslot(kvm, gfn);
+	if (!slot) {
+		srcu_read_unlock(&kvm->srcu, idx);
+		return -EINVAL;
+	}
+
+	write_lock(&kvm->mmu_lock);
+	__kvm_write_track_remove_gfn(kvm, slot, gfn);
+	write_unlock(&kvm->mmu_lock);
+
+	srcu_read_unlock(&kvm->srcu, idx);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(kvm_write_track_remove_gfn);
 #endif
diff --git a/arch/x86/kvm/mmu/page_track.h b/arch/x86/kvm/mmu/page_track.h
index 50d3278e8c69..62f98c6c5af3 100644
--- a/arch/x86/kvm/mmu/page_track.h
+++ b/arch/x86/kvm/mmu/page_track.h
@@ -15,6 +15,11 @@ int kvm_page_track_create_memslot(struct kvm *kvm,
 				  struct kvm_memory_slot *slot,
 				  unsigned long npages);
 
+void __kvm_write_track_add_gfn(struct kvm *kvm, struct kvm_memory_slot *slot,
+			       gfn_t gfn);
+void __kvm_write_track_remove_gfn(struct kvm *kvm,
+				  struct kvm_memory_slot *slot, gfn_t gfn);
+
 bool kvm_gfn_is_write_tracked(struct kvm *kvm,
 			      const struct kvm_memory_slot *slot, gfn_t gfn);
 
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 18f04493e103..b995d75a19c3 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -1545,9 +1545,7 @@ static struct mdev_driver intel_vgpu_mdev_driver = {
 
 int intel_gvt_page_track_add(struct intel_vgpu *info, u64 gfn)
 {
-	struct kvm *kvm = info->vfio_device.kvm;
-	struct kvm_memory_slot *slot;
-	int idx;
+	int r;
 
 	if (!test_bit(INTEL_VGPU_STATUS_ATTACHED, info->status))
 		return -ESRCH;
@@ -1555,18 +1553,9 @@ int intel_gvt_page_track_add(struct intel_vgpu *info, u64 gfn)
 	if (kvmgt_gfn_is_write_protected(info, gfn))
 		return 0;
 
-	idx = srcu_read_lock(&kvm->srcu);
-	slot = gfn_to_memslot(kvm, gfn);
-	if (!slot) {
-		srcu_read_unlock(&kvm->srcu, idx);
-		return -EINVAL;
-	}
-
-	write_lock(&kvm->mmu_lock);
-	kvm_write_track_add_gfn(kvm, slot, gfn);
-	write_unlock(&kvm->mmu_lock);
-
-	srcu_read_unlock(&kvm->srcu, idx);
+	r = kvm_write_track_add_gfn(info->vfio_device.kvm, gfn);
+	if (r)
+		return r;
 
 	kvmgt_protect_table_add(info, gfn);
 	return 0;
@@ -1574,9 +1563,7 @@ int intel_gvt_page_track_add(struct intel_vgpu *info, u64 gfn)
 
 int intel_gvt_page_track_remove(struct intel_vgpu *info, u64 gfn)
 {
-	struct kvm *kvm = info->vfio_device.kvm;
-	struct kvm_memory_slot *slot;
-	int idx;
+	int r;
 
 	if (!test_bit(INTEL_VGPU_STATUS_ATTACHED, info->status))
 		return -ESRCH;
@@ -1584,17 +1571,9 @@ int intel_gvt_page_track_remove(struct intel_vgpu *info, u64 gfn)
 	if (!kvmgt_gfn_is_write_protected(info, gfn))
 		return 0;
 
-	idx = srcu_read_lock(&kvm->srcu);
-	slot = gfn_to_memslot(kvm, gfn);
-	if (!slot) {
-		srcu_read_unlock(&kvm->srcu, idx);
-		return -EINVAL;
-	}
-
-	write_lock(&kvm->mmu_lock);
-	kvm_write_track_remove_gfn(kvm, slot, gfn);
-	write_unlock(&kvm->mmu_lock);
-	srcu_read_unlock(&kvm->srcu, idx);
+	r = kvm_write_track_remove_gfn(info->vfio_device.kvm, gfn);
+	if (r)
+		return r;
 
 	kvmgt_protect_table_del(info, gfn);
 	return 0;
-- 
2.40.1.606.ga4b1b128d6-goog


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v3 26/28] KVM: x86/mmu: Drop @slot param from exported/external page-track APIs
@ 2023-05-13  0:35   ` Sean Christopherson
  0 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2023-05-13  0:35 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

Refactor KVM's exported/external page-track, a.k.a. write-track, APIs
to take only the gfn and do the required memslot lookup in KVM proper.
Forcing users of the APIs to get the memslot unnecessarily bleeds
KVM internals into KVMGT and complicates usage of the APIs.

No functional change intended.

Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_page_track.h |  7 +--
 arch/x86/kvm/mmu/mmu.c                |  4 +-
 arch/x86/kvm/mmu/page_track.c         | 85 ++++++++++++++++++++-------
 arch/x86/kvm/mmu/page_track.h         |  5 ++
 drivers/gpu/drm/i915/gvt/kvmgt.c      | 37 +++---------
 5 files changed, 80 insertions(+), 58 deletions(-)

diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index f5c1db36cdb7..4afab697e21c 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -4,11 +4,6 @@
 
 #include <linux/kvm_types.h>
 
-void kvm_write_track_add_gfn(struct kvm *kvm,
-			     struct kvm_memory_slot *slot, gfn_t gfn);
-void kvm_write_track_remove_gfn(struct kvm *kvm, struct kvm_memory_slot *slot,
-				gfn_t gfn);
-
 #ifdef CONFIG_KVM_EXTERNAL_WRITE_TRACKING
 /*
  * The notifier represented by @kvm_page_track_notifier_node is linked into
@@ -55,6 +50,8 @@ kvm_page_track_register_notifier(struct kvm *kvm,
 void
 kvm_page_track_unregister_notifier(struct kvm *kvm,
 				   struct kvm_page_track_notifier_node *n);
+int kvm_write_track_add_gfn(struct kvm *kvm, gfn_t gfn);
+int kvm_write_track_remove_gfn(struct kvm *kvm, gfn_t gfn);
 #else
 /*
  * Allow defining a node in a structure even if page tracking is disabled, e.g.
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 1818c047891f..22f13963c320 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -837,7 +837,7 @@ static void account_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
 
 	/* the non-leaf shadow pages are keeping readonly. */
 	if (sp->role.level > PG_LEVEL_4K)
-		return kvm_write_track_add_gfn(kvm, slot, gfn);
+		return __kvm_write_track_add_gfn(kvm, slot, gfn);
 
 	kvm_mmu_gfn_disallow_lpage(slot, gfn);
 
@@ -883,7 +883,7 @@ static void unaccount_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
 	slots = kvm_memslots_for_spte_role(kvm, sp->role);
 	slot = __gfn_to_memslot(slots, gfn);
 	if (sp->role.level > PG_LEVEL_4K)
-		return kvm_write_track_remove_gfn(kvm, slot, gfn);
+		return __kvm_write_track_remove_gfn(kvm, slot, gfn);
 
 	kvm_mmu_gfn_allow_lpage(slot, gfn);
 }
diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index eedb5889d73e..2a64df38ccab 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -74,16 +74,8 @@ static void update_gfn_write_track(struct kvm_memory_slot *slot, gfn_t gfn,
 	slot->arch.gfn_write_track[index] += count;
 }
 
-/*
- * add guest page to the tracking pool so that corresponding access on that
- * page will be intercepted.
- *
- * @kvm: the guest instance we are interested in.
- * @slot: the @gfn belongs to.
- * @gfn: the guest page.
- */
-void kvm_write_track_add_gfn(struct kvm *kvm, struct kvm_memory_slot *slot,
-			     gfn_t gfn)
+void __kvm_write_track_add_gfn(struct kvm *kvm, struct kvm_memory_slot *slot,
+			       gfn_t gfn)
 {
 	lockdep_assert_held_write(&kvm->mmu_lock);
 
@@ -104,18 +96,9 @@ void kvm_write_track_add_gfn(struct kvm *kvm, struct kvm_memory_slot *slot,
 	if (kvm_mmu_slot_gfn_write_protect(kvm, slot, gfn, PG_LEVEL_4K))
 		kvm_flush_remote_tlbs(kvm);
 }
-EXPORT_SYMBOL_GPL(kvm_write_track_add_gfn);
 
-/*
- * remove the guest page from the tracking pool which stops the interception
- * of corresponding access on that page.
- *
- * @kvm: the guest instance we are interested in.
- * @slot: the @gfn belongs to.
- * @gfn: the guest page.
- */
-void kvm_write_track_remove_gfn(struct kvm *kvm,
-				struct kvm_memory_slot *slot, gfn_t gfn)
+void __kvm_write_track_remove_gfn(struct kvm *kvm,
+				  struct kvm_memory_slot *slot, gfn_t gfn)
 {
 	lockdep_assert_held_write(&kvm->mmu_lock);
 
@@ -133,7 +116,6 @@ void kvm_write_track_remove_gfn(struct kvm *kvm,
 	 */
 	kvm_mmu_gfn_allow_lpage(slot, gfn);
 }
-EXPORT_SYMBOL_GPL(kvm_write_track_remove_gfn);
 
 /*
  * check if the corresponding access on the specified guest page is tracked.
@@ -257,4 +239,63 @@ void kvm_page_track_delete_slot(struct kvm *kvm, struct kvm_memory_slot *slot)
 	srcu_read_unlock(&head->track_srcu, idx);
 }
 
+/*
+ * add guest page to the tracking pool so that corresponding access on that
+ * page will be intercepted.
+ *
+ * @kvm: the guest instance we are interested in.
+ * @gfn: the guest page.
+ */
+int kvm_write_track_add_gfn(struct kvm *kvm, gfn_t gfn)
+{
+	struct kvm_memory_slot *slot;
+	int idx;
+
+	idx = srcu_read_lock(&kvm->srcu);
+
+	slot = gfn_to_memslot(kvm, gfn);
+	if (!slot) {
+		srcu_read_unlock(&kvm->srcu, idx);
+		return -EINVAL;
+	}
+
+	write_lock(&kvm->mmu_lock);
+	__kvm_write_track_add_gfn(kvm, slot, gfn);
+	write_unlock(&kvm->mmu_lock);
+
+	srcu_read_unlock(&kvm->srcu, idx);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(kvm_write_track_add_gfn);
+
+/*
+ * remove the guest page from the tracking pool which stops the interception
+ * of corresponding access on that page.
+ *
+ * @kvm: the guest instance we are interested in.
+ * @gfn: the guest page.
+ */
+int kvm_write_track_remove_gfn(struct kvm *kvm, gfn_t gfn)
+{
+	struct kvm_memory_slot *slot;
+	int idx;
+
+	idx = srcu_read_lock(&kvm->srcu);
+
+	slot = gfn_to_memslot(kvm, gfn);
+	if (!slot) {
+		srcu_read_unlock(&kvm->srcu, idx);
+		return -EINVAL;
+	}
+
+	write_lock(&kvm->mmu_lock);
+	__kvm_write_track_remove_gfn(kvm, slot, gfn);
+	write_unlock(&kvm->mmu_lock);
+
+	srcu_read_unlock(&kvm->srcu, idx);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(kvm_write_track_remove_gfn);
 #endif
diff --git a/arch/x86/kvm/mmu/page_track.h b/arch/x86/kvm/mmu/page_track.h
index 50d3278e8c69..62f98c6c5af3 100644
--- a/arch/x86/kvm/mmu/page_track.h
+++ b/arch/x86/kvm/mmu/page_track.h
@@ -15,6 +15,11 @@ int kvm_page_track_create_memslot(struct kvm *kvm,
 				  struct kvm_memory_slot *slot,
 				  unsigned long npages);
 
+void __kvm_write_track_add_gfn(struct kvm *kvm, struct kvm_memory_slot *slot,
+			       gfn_t gfn);
+void __kvm_write_track_remove_gfn(struct kvm *kvm,
+				  struct kvm_memory_slot *slot, gfn_t gfn);
+
 bool kvm_gfn_is_write_tracked(struct kvm *kvm,
 			      const struct kvm_memory_slot *slot, gfn_t gfn);
 
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 18f04493e103..b995d75a19c3 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -1545,9 +1545,7 @@ static struct mdev_driver intel_vgpu_mdev_driver = {
 
 int intel_gvt_page_track_add(struct intel_vgpu *info, u64 gfn)
 {
-	struct kvm *kvm = info->vfio_device.kvm;
-	struct kvm_memory_slot *slot;
-	int idx;
+	int r;
 
 	if (!test_bit(INTEL_VGPU_STATUS_ATTACHED, info->status))
 		return -ESRCH;
@@ -1555,18 +1553,9 @@ int intel_gvt_page_track_add(struct intel_vgpu *info, u64 gfn)
 	if (kvmgt_gfn_is_write_protected(info, gfn))
 		return 0;
 
-	idx = srcu_read_lock(&kvm->srcu);
-	slot = gfn_to_memslot(kvm, gfn);
-	if (!slot) {
-		srcu_read_unlock(&kvm->srcu, idx);
-		return -EINVAL;
-	}
-
-	write_lock(&kvm->mmu_lock);
-	kvm_write_track_add_gfn(kvm, slot, gfn);
-	write_unlock(&kvm->mmu_lock);
-
-	srcu_read_unlock(&kvm->srcu, idx);
+	r = kvm_write_track_add_gfn(info->vfio_device.kvm, gfn);
+	if (r)
+		return r;
 
 	kvmgt_protect_table_add(info, gfn);
 	return 0;
@@ -1574,9 +1563,7 @@ int intel_gvt_page_track_add(struct intel_vgpu *info, u64 gfn)
 
 int intel_gvt_page_track_remove(struct intel_vgpu *info, u64 gfn)
 {
-	struct kvm *kvm = info->vfio_device.kvm;
-	struct kvm_memory_slot *slot;
-	int idx;
+	int r;
 
 	if (!test_bit(INTEL_VGPU_STATUS_ATTACHED, info->status))
 		return -ESRCH;
@@ -1584,17 +1571,9 @@ int intel_gvt_page_track_remove(struct intel_vgpu *info, u64 gfn)
 	if (!kvmgt_gfn_is_write_protected(info, gfn))
 		return 0;
 
-	idx = srcu_read_lock(&kvm->srcu);
-	slot = gfn_to_memslot(kvm, gfn);
-	if (!slot) {
-		srcu_read_unlock(&kvm->srcu, idx);
-		return -EINVAL;
-	}
-
-	write_lock(&kvm->mmu_lock);
-	kvm_write_track_remove_gfn(kvm, slot, gfn);
-	write_unlock(&kvm->mmu_lock);
-	srcu_read_unlock(&kvm->srcu, idx);
+	r = kvm_write_track_remove_gfn(info->vfio_device.kvm, gfn);
+	if (r)
+		return r;
 
 	kvmgt_protect_table_del(info, gfn);
 	return 0;
-- 
2.40.1.606.ga4b1b128d6-goog


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [Intel-gfx] [PATCH v3 27/28] KVM: x86/mmu: Handle KVM bookkeeping in page-track APIs, not callers
  2023-05-13  0:35 ` [Intel-gfx] " Sean Christopherson
@ 2023-05-13  0:35   ` Sean Christopherson
  -1 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2023-05-13  0:35 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: Yan Zhao, kvm, intel-gfx, linux-kernel, Ben Gardon, intel-gvt-dev

Get/put references to KVM when a page-track notifier is (un)registered
instead of relying on the caller to do so.  Forcing the caller to do the
bookkeeping is unnecessary and adds one more thing for users to get
wrong, e.g. see commit 9ed1fdee9ee3 ("drm/i915/gvt: Get reference to KVM
iff attachment to VM is successful").

Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_page_track.h | 11 +++++------
 arch/x86/kvm/mmu/page_track.c         | 18 ++++++++++++------
 drivers/gpu/drm/i915/gvt/kvmgt.c      | 17 +++++++----------
 3 files changed, 24 insertions(+), 22 deletions(-)

diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index 4afab697e21c..3d040741044b 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -44,12 +44,11 @@ struct kvm_page_track_notifier_node {
 				    struct kvm_page_track_notifier_node *node);
 };
 
-void
-kvm_page_track_register_notifier(struct kvm *kvm,
-				 struct kvm_page_track_notifier_node *n);
-void
-kvm_page_track_unregister_notifier(struct kvm *kvm,
-				   struct kvm_page_track_notifier_node *n);
+int kvm_page_track_register_notifier(struct kvm *kvm,
+				     struct kvm_page_track_notifier_node *n);
+void kvm_page_track_unregister_notifier(struct kvm *kvm,
+					struct kvm_page_track_notifier_node *n);
+
 int kvm_write_track_add_gfn(struct kvm *kvm, gfn_t gfn);
 int kvm_write_track_remove_gfn(struct kvm *kvm, gfn_t gfn);
 #else
diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index 2a64df38ccab..fd04e618ad2d 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -157,17 +157,22 @@ int kvm_page_track_init(struct kvm *kvm)
  * register the notifier so that event interception for the tracked guest
  * pages can be received.
  */
-void
-kvm_page_track_register_notifier(struct kvm *kvm,
-				 struct kvm_page_track_notifier_node *n)
+int kvm_page_track_register_notifier(struct kvm *kvm,
+				     struct kvm_page_track_notifier_node *n)
 {
 	struct kvm_page_track_notifier_head *head;
 
+	if (!kvm || kvm->mm != current->mm)
+		return -ESRCH;
+
+	kvm_get_kvm(kvm);
+
 	head = &kvm->arch.track_notifier_head;
 
 	write_lock(&kvm->mmu_lock);
 	hlist_add_head_rcu(&n->node, &head->track_notifier_list);
 	write_unlock(&kvm->mmu_lock);
+	return 0;
 }
 EXPORT_SYMBOL_GPL(kvm_page_track_register_notifier);
 
@@ -175,9 +180,8 @@ EXPORT_SYMBOL_GPL(kvm_page_track_register_notifier);
  * stop receiving the event interception. It is the opposed operation of
  * kvm_page_track_register_notifier().
  */
-void
-kvm_page_track_unregister_notifier(struct kvm *kvm,
-				   struct kvm_page_track_notifier_node *n)
+void kvm_page_track_unregister_notifier(struct kvm *kvm,
+					struct kvm_page_track_notifier_node *n)
 {
 	struct kvm_page_track_notifier_head *head;
 
@@ -187,6 +191,8 @@ kvm_page_track_unregister_notifier(struct kvm *kvm,
 	hlist_del_rcu(&n->node);
 	write_unlock(&kvm->mmu_lock);
 	synchronize_srcu(&head->track_srcu);
+
+	kvm_put_kvm(kvm);
 }
 EXPORT_SYMBOL_GPL(kvm_page_track_unregister_notifier);
 
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index b995d75a19c3..597ffc9d12fd 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -653,21 +653,19 @@ static bool __kvmgt_vgpu_exist(struct intel_vgpu *vgpu)
 static int intel_vgpu_open_device(struct vfio_device *vfio_dev)
 {
 	struct intel_vgpu *vgpu = vfio_dev_to_vgpu(vfio_dev);
-
-	if (!vgpu->vfio_device.kvm ||
-	    vgpu->vfio_device.kvm->mm != current->mm) {
-		gvt_vgpu_err("KVM is required to use Intel vGPU\n");
-		return -ESRCH;
-	}
+	int ret;
 
 	if (__kvmgt_vgpu_exist(vgpu))
 		return -EEXIST;
 
 	vgpu->track_node.track_write = kvmgt_page_track_write;
 	vgpu->track_node.track_remove_region = kvmgt_page_track_remove_region;
-	kvm_get_kvm(vgpu->vfio_device.kvm);
-	kvm_page_track_register_notifier(vgpu->vfio_device.kvm,
-					 &vgpu->track_node);
+	ret = kvm_page_track_register_notifier(vgpu->vfio_device.kvm,
+					       &vgpu->track_node);
+	if (ret) {
+		gvt_vgpu_err("KVM is required to use Intel vGPU\n");
+		return ret;
+	}
 
 	set_bit(INTEL_VGPU_STATUS_ATTACHED, vgpu->status);
 
@@ -702,7 +700,6 @@ static void intel_vgpu_close_device(struct vfio_device *vfio_dev)
 
 	kvm_page_track_unregister_notifier(vgpu->vfio_device.kvm,
 					   &vgpu->track_node);
-	kvm_put_kvm(vgpu->vfio_device.kvm);
 
 	kvmgt_protect_table_destroy(vgpu);
 	gvt_cache_destroy(vgpu);
-- 
2.40.1.606.ga4b1b128d6-goog


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v3 27/28] KVM: x86/mmu: Handle KVM bookkeeping in page-track APIs, not callers
@ 2023-05-13  0:35   ` Sean Christopherson
  0 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2023-05-13  0:35 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

Get/put references to KVM when a page-track notifier is (un)registered
instead of relying on the caller to do so.  Forcing the caller to do the
bookkeeping is unnecessary and adds one more thing for users to get
wrong, e.g. see commit 9ed1fdee9ee3 ("drm/i915/gvt: Get reference to KVM
iff attachment to VM is successful").

Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_page_track.h | 11 +++++------
 arch/x86/kvm/mmu/page_track.c         | 18 ++++++++++++------
 drivers/gpu/drm/i915/gvt/kvmgt.c      | 17 +++++++----------
 3 files changed, 24 insertions(+), 22 deletions(-)

diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index 4afab697e21c..3d040741044b 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -44,12 +44,11 @@ struct kvm_page_track_notifier_node {
 				    struct kvm_page_track_notifier_node *node);
 };
 
-void
-kvm_page_track_register_notifier(struct kvm *kvm,
-				 struct kvm_page_track_notifier_node *n);
-void
-kvm_page_track_unregister_notifier(struct kvm *kvm,
-				   struct kvm_page_track_notifier_node *n);
+int kvm_page_track_register_notifier(struct kvm *kvm,
+				     struct kvm_page_track_notifier_node *n);
+void kvm_page_track_unregister_notifier(struct kvm *kvm,
+					struct kvm_page_track_notifier_node *n);
+
 int kvm_write_track_add_gfn(struct kvm *kvm, gfn_t gfn);
 int kvm_write_track_remove_gfn(struct kvm *kvm, gfn_t gfn);
 #else
diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index 2a64df38ccab..fd04e618ad2d 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -157,17 +157,22 @@ int kvm_page_track_init(struct kvm *kvm)
  * register the notifier so that event interception for the tracked guest
  * pages can be received.
  */
-void
-kvm_page_track_register_notifier(struct kvm *kvm,
-				 struct kvm_page_track_notifier_node *n)
+int kvm_page_track_register_notifier(struct kvm *kvm,
+				     struct kvm_page_track_notifier_node *n)
 {
 	struct kvm_page_track_notifier_head *head;
 
+	if (!kvm || kvm->mm != current->mm)
+		return -ESRCH;
+
+	kvm_get_kvm(kvm);
+
 	head = &kvm->arch.track_notifier_head;
 
 	write_lock(&kvm->mmu_lock);
 	hlist_add_head_rcu(&n->node, &head->track_notifier_list);
 	write_unlock(&kvm->mmu_lock);
+	return 0;
 }
 EXPORT_SYMBOL_GPL(kvm_page_track_register_notifier);
 
@@ -175,9 +180,8 @@ EXPORT_SYMBOL_GPL(kvm_page_track_register_notifier);
  * stop receiving the event interception. It is the opposed operation of
  * kvm_page_track_register_notifier().
  */
-void
-kvm_page_track_unregister_notifier(struct kvm *kvm,
-				   struct kvm_page_track_notifier_node *n)
+void kvm_page_track_unregister_notifier(struct kvm *kvm,
+					struct kvm_page_track_notifier_node *n)
 {
 	struct kvm_page_track_notifier_head *head;
 
@@ -187,6 +191,8 @@ kvm_page_track_unregister_notifier(struct kvm *kvm,
 	hlist_del_rcu(&n->node);
 	write_unlock(&kvm->mmu_lock);
 	synchronize_srcu(&head->track_srcu);
+
+	kvm_put_kvm(kvm);
 }
 EXPORT_SYMBOL_GPL(kvm_page_track_unregister_notifier);
 
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index b995d75a19c3..597ffc9d12fd 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -653,21 +653,19 @@ static bool __kvmgt_vgpu_exist(struct intel_vgpu *vgpu)
 static int intel_vgpu_open_device(struct vfio_device *vfio_dev)
 {
 	struct intel_vgpu *vgpu = vfio_dev_to_vgpu(vfio_dev);
-
-	if (!vgpu->vfio_device.kvm ||
-	    vgpu->vfio_device.kvm->mm != current->mm) {
-		gvt_vgpu_err("KVM is required to use Intel vGPU\n");
-		return -ESRCH;
-	}
+	int ret;
 
 	if (__kvmgt_vgpu_exist(vgpu))
 		return -EEXIST;
 
 	vgpu->track_node.track_write = kvmgt_page_track_write;
 	vgpu->track_node.track_remove_region = kvmgt_page_track_remove_region;
-	kvm_get_kvm(vgpu->vfio_device.kvm);
-	kvm_page_track_register_notifier(vgpu->vfio_device.kvm,
-					 &vgpu->track_node);
+	ret = kvm_page_track_register_notifier(vgpu->vfio_device.kvm,
+					       &vgpu->track_node);
+	if (ret) {
+		gvt_vgpu_err("KVM is required to use Intel vGPU\n");
+		return ret;
+	}
 
 	set_bit(INTEL_VGPU_STATUS_ATTACHED, vgpu->status);
 
@@ -702,7 +700,6 @@ static void intel_vgpu_close_device(struct vfio_device *vfio_dev)
 
 	kvm_page_track_unregister_notifier(vgpu->vfio_device.kvm,
 					   &vgpu->track_node);
-	kvm_put_kvm(vgpu->vfio_device.kvm);
 
 	kvmgt_protect_table_destroy(vgpu);
 	gvt_cache_destroy(vgpu);
-- 
2.40.1.606.ga4b1b128d6-goog


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [Intel-gfx] [PATCH v3 28/28] drm/i915/gvt: Drop final dependencies on KVM internal details
  2023-05-13  0:35 ` [Intel-gfx] " Sean Christopherson
@ 2023-05-13  0:36   ` Sean Christopherson
  -1 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2023-05-13  0:36 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: Yan Zhao, kvm, intel-gfx, linux-kernel, Ben Gardon, intel-gvt-dev

Open code gpa_to_gfn() in kvmgt_page_track_write() and drop KVMGT's
dependency on kvm_host.h, i.e. include only kvm_page_track.h.  KVMGT
assumes "gfn == gpa >> PAGE_SHIFT" all over the place, including a few
lines below in the same function with the same gpa, i.e. there's no
reason to use KVM's helper for this one case.

No functional change intended.

Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 drivers/gpu/drm/i915/gvt/gvt.h   | 3 ++-
 drivers/gpu/drm/i915/gvt/kvmgt.c | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gvt/gvt.h b/drivers/gpu/drm/i915/gvt/gvt.h
index 2d65800d8e93..53a0a42a50db 100644
--- a/drivers/gpu/drm/i915/gvt/gvt.h
+++ b/drivers/gpu/drm/i915/gvt/gvt.h
@@ -34,10 +34,11 @@
 #define _GVT_H_
 
 #include <uapi/linux/pci_regs.h>
-#include <linux/kvm_host.h>
 #include <linux/vfio.h>
 #include <linux/mdev.h>
 
+#include <asm/kvm_page_track.h>
+
 #include "i915_drv.h"
 #include "intel_gvt.h"
 
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 597ffc9d12fd..191b4484cc7e 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -1584,7 +1584,7 @@ static void kvmgt_page_track_write(gpa_t gpa, const u8 *val, int len,
 
 	mutex_lock(&info->vgpu_lock);
 
-	if (kvmgt_gfn_is_write_protected(info, gpa_to_gfn(gpa)))
+	if (kvmgt_gfn_is_write_protected(info, gpa >> PAGE_SHIFT))
 		intel_vgpu_page_track_handler(info, gpa,
 						     (void *)val, len);
 
-- 
2.40.1.606.ga4b1b128d6-goog


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v3 28/28] drm/i915/gvt: Drop final dependencies on KVM internal details
@ 2023-05-13  0:36   ` Sean Christopherson
  0 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2023-05-13  0:36 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

Open code gpa_to_gfn() in kvmgt_page_track_write() and drop KVMGT's
dependency on kvm_host.h, i.e. include only kvm_page_track.h.  KVMGT
assumes "gfn == gpa >> PAGE_SHIFT" all over the place, including a few
lines below in the same function with the same gpa, i.e. there's no
reason to use KVM's helper for this one case.

No functional change intended.

Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 drivers/gpu/drm/i915/gvt/gvt.h   | 3 ++-
 drivers/gpu/drm/i915/gvt/kvmgt.c | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gvt/gvt.h b/drivers/gpu/drm/i915/gvt/gvt.h
index 2d65800d8e93..53a0a42a50db 100644
--- a/drivers/gpu/drm/i915/gvt/gvt.h
+++ b/drivers/gpu/drm/i915/gvt/gvt.h
@@ -34,10 +34,11 @@
 #define _GVT_H_
 
 #include <uapi/linux/pci_regs.h>
-#include <linux/kvm_host.h>
 #include <linux/vfio.h>
 #include <linux/mdev.h>
 
+#include <asm/kvm_page_track.h>
+
 #include "i915_drv.h"
 #include "intel_gvt.h"
 
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 597ffc9d12fd..191b4484cc7e 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -1584,7 +1584,7 @@ static void kvmgt_page_track_write(gpa_t gpa, const u8 *val, int len,
 
 	mutex_lock(&info->vgpu_lock);
 
-	if (kvmgt_gfn_is_write_protected(info, gpa_to_gfn(gpa)))
+	if (kvmgt_gfn_is_write_protected(info, gpa >> PAGE_SHIFT))
 		intel_vgpu_page_track_handler(info, gpa,
 						     (void *)val, len);
 
-- 
2.40.1.606.ga4b1b128d6-goog


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups (rev8)
  2023-05-13  0:35 ` [Intel-gfx] " Sean Christopherson
                   ` (28 preceding siblings ...)
  (?)
@ 2023-05-13  1:46 ` Patchwork
  -1 siblings, 0 replies; 86+ messages in thread
From: Patchwork @ 2023-05-13  1:46 UTC (permalink / raw)
  To: Yan Zhao; +Cc: intel-gfx

== Series Details ==

Series: drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups (rev8)
URL   : https://patchwork.freedesktop.org/series/112196/
State : warning

== Summary ==

Error: dim checkpatch failed
7ddfeb73c37a drm/i915/gvt: Verify pfn is "valid" before dereferencing "struct page"
6e46a75c5c15 drm/i915/gvt: remove interface intel_gvt_is_valid_gfn
60cd1c1b954f drm/i915/gvt: Verify hugepages are contiguous in physical address space
82759b84f7fc drm/i915/gvt: Put the page reference obtained by KVM's gfn_to_pfn()
a85e87c63800 drm/i915/gvt: Explicitly check that vGPU is attached before shadowing
2678442df896 drm/i915/gvt: Error out on an attempt to shadowing an unknown GTT entry type
425220f2cc46 drm/i915/gvt: Don't rely on KVM's gfn_to_pfn() to query possible 2M GTT
66caadcfb561 drm/i915/gvt: Use an "unsigned long" to iterate over memslot gfns
cc12e36d8b5e drm/i915/gvt: Drop unused helper intel_vgpu_reset_gtt()
dfc1236b7e6e drm/i915/gvt: Protect gfn hash table with vgpu_lock
722d8f7ef123 KVM: x86/mmu: Move kvm_arch_flush_shadow_{all, memslot}() to mmu.c
b2fdeb2bb4b6 KVM: x86/mmu: Don't rely on page-track mechanism to flush on memslot change
58d52301be4c KVM: x86/mmu: Don't bounce through page-track mechanism for guest PTEs
84403a9c8920 KVM: drm/i915/gvt: Drop @vcpu from KVM's ->track_write() hook
c5834b025c35 KVM: x86: Reject memslot MOVE operations if KVMGT is attached
0e65c8e6f583 drm/i915/gvt: Don't bother removing write-protection on to-be-deleted slot
ad683a55df9c KVM: x86: Add a new page-track hook to handle memslot deletion
287f9c98484d drm/i915/gvt: switch from ->track_flush_slot() to ->track_remove_region()
7630b3b1d29e KVM: x86: Remove the unused page-track hook track_flush_slot()
556343eb5969 KVM: x86/mmu: Move KVM-only page-track declarations to internal header
Traceback (most recent call last):
  File "scripts/spdxcheck.py", line 6, in <module>
    from ply import lex, yacc
ModuleNotFoundError: No module named 'ply'
-:114: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#114: 
new file mode 100644

total: 0 errors, 1 warnings, 0 checks, 113 lines checked
5eeebf29aa14 KVM: x86/mmu: Use page-track notifiers iff there are external users
ec3a18744340 KVM: x86/mmu: Drop infrastructure for multiple page-track modes
1464bd80f0c0 KVM: x86/mmu: Rename page-track APIs to reflect the new reality
ccd34fefb1d2 KVM: x86/mmu: Assert that correct locks are held for page write-tracking
3601016aae9f KVM: x86/mmu: Bug the VM if write-tracking is used but not enabled
730b76683f02 KVM: x86/mmu: Drop @slot param from exported/external page-track APIs
7fcd298749ba KVM: x86/mmu: Handle KVM bookkeeping in page-track APIs, not callers
c1cebe248055 drm/i915/gvt: Drop final dependencies on KVM internal details



^ permalink raw reply	[flat|nested] 86+ messages in thread

* [Intel-gfx] ✗ Fi.CI.BAT: failure for drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups (rev8)
  2023-05-13  0:35 ` [Intel-gfx] " Sean Christopherson
                   ` (29 preceding siblings ...)
  (?)
@ 2023-05-13  2:00 ` Patchwork
  -1 siblings, 0 replies; 86+ messages in thread
From: Patchwork @ 2023-05-13  2:00 UTC (permalink / raw)
  To: Yan Zhao; +Cc: intel-gfx

[-- Attachment #1: Type: text/plain, Size: 8421 bytes --]

== Series Details ==

Series: drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups (rev8)
URL   : https://patchwork.freedesktop.org/series/112196/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_13143 -> Patchwork_112196v8
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_112196v8 absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_112196v8, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_112196v8/index.html

Participating hosts (38 -> 36)
------------------------------

  Missing    (2): bat-rpls-2 fi-snb-2520m 

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_112196v8:

### IGT changes ###

#### Possible regressions ####

  * igt@i915_selftest@live@dmabuf:
    - bat-adls-5:         [PASS][1] -> [DMESG-WARN][2] +19 similar issues
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13143/bat-adls-5/igt@i915_selftest@live@dmabuf.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_112196v8/bat-adls-5/igt@i915_selftest@live@dmabuf.html

  
Known issues
------------

  Here are the changes found in Patchwork_112196v8 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@i915_pm_backlight@basic-brightness@edp-1:
    - bat-rplp-1:         NOTRUN -> [ABORT][3] ([i915#7077])
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_112196v8/bat-rplp-1/igt@i915_pm_backlight@basic-brightness@edp-1.html

  * igt@i915_selftest@live@migrate:
    - bat-dg2-11:         [PASS][4] -> [DMESG-WARN][5] ([i915#7699] / [i915#7953])
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13143/bat-dg2-11/igt@i915_selftest@live@migrate.html
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_112196v8/bat-dg2-11/igt@i915_selftest@live@migrate.html

  * igt@i915_suspend@basic-s3-without-i915:
    - bat-rpls-1:         NOTRUN -> [ABORT][6] ([i915#6687] / [i915#7953] / [i915#7978])
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_112196v8/bat-rpls-1/igt@i915_suspend@basic-s3-without-i915.html

  * igt@kms_pipe_crc_basic@nonblocking-crc-frame-sequence:
    - bat-dg2-11:         NOTRUN -> [SKIP][7] ([i915#1845] / [i915#5354]) +2 similar issues
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_112196v8/bat-dg2-11/igt@kms_pipe_crc_basic@nonblocking-crc-frame-sequence.html

  
#### Possible fixes ####

  * igt@i915_selftest@live@gt_heartbeat:
    - fi-apl-guc:         [DMESG-FAIL][8] ([i915#5334]) -> [PASS][9]
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13143/fi-apl-guc/igt@i915_selftest@live@gt_heartbeat.html
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_112196v8/fi-apl-guc/igt@i915_selftest@live@gt_heartbeat.html

  * igt@i915_selftest@live@requests:
    - {bat-mtlp-8}:       [ABORT][10] ([i915#4983] / [i915#7920] / [i915#7953]) -> [PASS][11]
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13143/bat-mtlp-8/igt@i915_selftest@live@requests.html
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_112196v8/bat-mtlp-8/igt@i915_selftest@live@requests.html

  * igt@i915_selftest@live@reset:
    - bat-rpls-1:         [ABORT][12] ([i915#4983] / [i915#7461] / [i915#7953] / [i915#8347] / [i915#8384]) -> [PASS][13]
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13143/bat-rpls-1/igt@i915_selftest@live@reset.html
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_112196v8/bat-rpls-1/igt@i915_selftest@live@reset.html

  
#### Warnings ####

  * igt@kms_setmode@basic-clone-single-crtc:
    - bat-rplp-1:         [ABORT][14] ([i915#4579] / [i915#8260]) -> [SKIP][15] ([i915#3555] / [i915#4579])
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13143/bat-rplp-1/igt@kms_setmode@basic-clone-single-crtc.html
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_112196v8/bat-rplp-1/igt@kms_setmode@basic-clone-single-crtc.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [i915#1845]: https://gitlab.freedesktop.org/drm/intel/issues/1845
  [i915#3555]: https://gitlab.freedesktop.org/drm/intel/issues/3555
  [i915#4579]: https://gitlab.freedesktop.org/drm/intel/issues/4579
  [i915#4983]: https://gitlab.freedesktop.org/drm/intel/issues/4983
  [i915#5334]: https://gitlab.freedesktop.org/drm/intel/issues/5334
  [i915#5354]: https://gitlab.freedesktop.org/drm/intel/issues/5354
  [i915#6367]: https://gitlab.freedesktop.org/drm/intel/issues/6367
  [i915#6645]: https://gitlab.freedesktop.org/drm/intel/issues/6645
  [i915#6687]: https://gitlab.freedesktop.org/drm/intel/issues/6687
  [i915#7077]: https://gitlab.freedesktop.org/drm/intel/issues/7077
  [i915#7461]: https://gitlab.freedesktop.org/drm/intel/issues/7461
  [i915#7699]: https://gitlab.freedesktop.org/drm/intel/issues/7699
  [i915#7828]: https://gitlab.freedesktop.org/drm/intel/issues/7828
  [i915#7920]: https://gitlab.freedesktop.org/drm/intel/issues/7920
  [i915#7953]: https://gitlab.freedesktop.org/drm/intel/issues/7953
  [i915#7978]: https://gitlab.freedesktop.org/drm/intel/issues/7978
  [i915#8260]: https://gitlab.freedesktop.org/drm/intel/issues/8260
  [i915#8347]: https://gitlab.freedesktop.org/drm/intel/issues/8347
  [i915#8384]: https://gitlab.freedesktop.org/drm/intel/issues/8384


Build changes
-------------

  * Linux: CI_DRM_13143 -> Patchwork_112196v8

  CI-20190529: 20190529
  CI_DRM_13143: 222ff19f23b0bd6aca0b52001d69699f78f5a206 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_7286: a482779488f11c432d7ddcb1980691ab1603f356 @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  Patchwork_112196v8: 222ff19f23b0bd6aca0b52001d69699f78f5a206 @ git://anongit.freedesktop.org/gfx-ci/linux


### Linux commits

6dd35b812a60 drm/i915/gvt: Drop final dependencies on KVM internal details
ce182f5a4e8d KVM: x86/mmu: Handle KVM bookkeeping in page-track APIs, not callers
6ef2fe25e1e7 KVM: x86/mmu: Drop @slot param from exported/external page-track APIs
8687b24b2d31 KVM: x86/mmu: Bug the VM if write-tracking is used but not enabled
3bc5606a1530 KVM: x86/mmu: Assert that correct locks are held for page write-tracking
95cf5a24c761 KVM: x86/mmu: Rename page-track APIs to reflect the new reality
c59bc5c4faec KVM: x86/mmu: Drop infrastructure for multiple page-track modes
1ab839205738 KVM: x86/mmu: Use page-track notifiers iff there are external users
8144255d2ef9 KVM: x86/mmu: Move KVM-only page-track declarations to internal header
5f1e23aa0b37 KVM: x86: Remove the unused page-track hook track_flush_slot()
c06e42752a11 drm/i915/gvt: switch from ->track_flush_slot() to ->track_remove_region()
f85fad4f4ebd KVM: x86: Add a new page-track hook to handle memslot deletion
17d1708c953d drm/i915/gvt: Don't bother removing write-protection on to-be-deleted slot
2af557d1bb28 KVM: x86: Reject memslot MOVE operations if KVMGT is attached
f0accfb5d4ba KVM: drm/i915/gvt: Drop @vcpu from KVM's ->track_write() hook
7400cc3d0cde KVM: x86/mmu: Don't bounce through page-track mechanism for guest PTEs
d8e835b07b1a KVM: x86/mmu: Don't rely on page-track mechanism to flush on memslot change
768dccb1b7d5 KVM: x86/mmu: Move kvm_arch_flush_shadow_{all, memslot}() to mmu.c
e964eda4d28f drm/i915/gvt: Protect gfn hash table with vgpu_lock
163c49208b55 drm/i915/gvt: Drop unused helper intel_vgpu_reset_gtt()
8a4136d492ae drm/i915/gvt: Use an "unsigned long" to iterate over memslot gfns
285da8c76036 drm/i915/gvt: Don't rely on KVM's gfn_to_pfn() to query possible 2M GTT
4baba3d68f21 drm/i915/gvt: Error out on an attempt to shadowing an unknown GTT entry type
ab99fe9fdb06 drm/i915/gvt: Explicitly check that vGPU is attached before shadowing
26ed9603ea21 drm/i915/gvt: Put the page reference obtained by KVM's gfn_to_pfn()
d710d9fee0d4 drm/i915/gvt: Verify hugepages are contiguous in physical address space
d02eb2471acd drm/i915/gvt: remove interface intel_gvt_is_valid_gfn
dc1e8a276471 drm/i915/gvt: Verify pfn is "valid" before dereferencing "struct page"

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_112196v8/index.html

[-- Attachment #2: Type: text/html, Size: 9535 bytes --]

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 02/28] drm/i915/gvt: remove interface intel_gvt_is_valid_gfn
  2023-05-13  0:35   ` [Intel-gfx] " Sean Christopherson
@ 2023-05-15 11:07     ` Wang, Zhi A
  -1 siblings, 0 replies; 86+ messages in thread
From: Wang, Zhi A @ 2023-05-15 11:07 UTC (permalink / raw)
  To: Christopherson,, Sean, Paolo Bonzini, Zhenyu Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Zhao, Yan Y, Ben Gardon

On 5/13/2023 8:35 AM, Sean Christopherson wrote:
> From: Yan Zhao <yan.y.zhao@intel.com>
> 

Acked-by: Zhi Wang <zhi.a.wang@intel.com>

This was previously to avoid stepping down to the lower level ASAP
when a guest page is not used but still stays in the GPU page table.
(mostly in windows VM, as this is a behavior of WDDM GPU MM). But
it doesn't totally solve the the trapping flood in practice. I am
totally fine that it is removed. Post shadow in the trapping
path might be extended to handle this problem.

> Currently intel_gvt_is_valid_gfn() is called in two places:
> (1) shadowing guest GGTT entry
> (2) shadowing guest PPGTT leaf entry,
> which was introduced in commit cc753fbe1ac4
> ("drm/i915/gvt: validate gfn before set shadow page entry").
> 
> However, now it's not necessary to call this interface any more, because
> a. GGTT partial write issue has been fixed by
>     commit bc0686ff5fad
>     ("drm/i915/gvt: support inconsecutive partial gtt entry write")
>     commit 510fe10b6180
>     ("drm/i915/gvt: fix a bug of partially write ggtt enties")
> b. PPGTT resides in normal guest RAM and we only treat 8-byte writes
>     as valid page table writes. Any invalid GPA found is regarded as
>     an error, either due to guest misbehavior/attack or bug in host
>     shadow code.
>     So,rather than do GFN pre-checking and replace invalid GFNs with
>     scratch GFN and continue silently, just remove the pre-checking and
>     abort PPGTT shadowing on error detected.
> c. GFN validity check is still performed in
>     intel_gvt_dma_map_guest_page() --> gvt_pin_guest_page().
>     It's more desirable to call VFIO interface to do both validity check
>     and mapping.
>     Calling intel_gvt_is_valid_gfn() to do GFN validity check from KVM side
>     while later mapping the GFN through VFIO interface is unnecessarily
>     fragile and confusing for unaware readers.
> 
> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> [sean: remove now-unused local variables]
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>   drivers/gpu/drm/i915/gvt/gtt.c | 36 +---------------------------------
>   1 file changed, 1 insertion(+), 35 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c
> index 58b9b316ae46..f30922c55a0c 100644
> --- a/drivers/gpu/drm/i915/gvt/gtt.c
> +++ b/drivers/gpu/drm/i915/gvt/gtt.c
> @@ -49,22 +49,6 @@
>   static bool enable_out_of_sync = false;
>   static int preallocated_oos_pages = 8192;
>   
> -static bool intel_gvt_is_valid_gfn(struct intel_vgpu *vgpu, unsigned long gfn)
> -{
> -	struct kvm *kvm = vgpu->vfio_device.kvm;
> -	int idx;
> -	bool ret;
> -
> -	if (!test_bit(INTEL_VGPU_STATUS_ATTACHED, vgpu->status))
> -		return false;
> -
> -	idx = srcu_read_lock(&kvm->srcu);
> -	ret = kvm_is_visible_gfn(kvm, gfn);
> -	srcu_read_unlock(&kvm->srcu, idx);
> -
> -	return ret;
> -}
> -
>   /*
>    * validate a gm address and related range size,
>    * translate it to host gm address
> @@ -1333,11 +1317,9 @@ static int ppgtt_populate_shadow_entry(struct intel_vgpu *vgpu,
>   static int ppgtt_populate_spt(struct intel_vgpu_ppgtt_spt *spt)
>   {
>   	struct intel_vgpu *vgpu = spt->vgpu;
> -	struct intel_gvt *gvt = vgpu->gvt;
> -	const struct intel_gvt_gtt_pte_ops *ops = gvt->gtt.pte_ops;
>   	struct intel_vgpu_ppgtt_spt *s;
>   	struct intel_gvt_gtt_entry se, ge;
> -	unsigned long gfn, i;
> +	unsigned long i;
>   	int ret;
>   
>   	trace_spt_change(spt->vgpu->id, "born", spt,
> @@ -1354,13 +1336,6 @@ static int ppgtt_populate_spt(struct intel_vgpu_ppgtt_spt *spt)
>   			ppgtt_generate_shadow_entry(&se, s, &ge);
>   			ppgtt_set_shadow_entry(spt, &se, i);
>   		} else {
> -			gfn = ops->get_pfn(&ge);
> -			if (!intel_gvt_is_valid_gfn(vgpu, gfn)) {
> -				ops->set_pfn(&se, gvt->gtt.scratch_mfn);
> -				ppgtt_set_shadow_entry(spt, &se, i);
> -				continue;
> -			}
> -
>   			ret = ppgtt_populate_shadow_entry(vgpu, spt, i, &ge);
>   			if (ret)
>   				goto fail;
> @@ -2335,14 +2310,6 @@ static int emulate_ggtt_mmio_write(struct intel_vgpu *vgpu, unsigned int off,
>   		m.val64 = e.val64;
>   		m.type = e.type;
>   
> -		/* one PTE update may be issued in multiple writes and the
> -		 * first write may not construct a valid gfn
> -		 */
> -		if (!intel_gvt_is_valid_gfn(vgpu, gfn)) {
> -			ops->set_pfn(&m, gvt->gtt.scratch_mfn);
> -			goto out;
> -		}
> -
>   		ret = intel_gvt_dma_map_guest_page(vgpu, gfn, PAGE_SIZE,
>   						   &dma_addr);
>   		if (ret) {
> @@ -2359,7 +2326,6 @@ static int emulate_ggtt_mmio_write(struct intel_vgpu *vgpu, unsigned int off,
>   		ops->clear_present(&m);
>   	}
>   
> -out:
>   	ggtt_set_guest_entry(ggtt_mm, &e, g_gtt_index);
>   
>   	ggtt_get_host_entry(ggtt_mm, &e, g_gtt_index);


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [Intel-gfx] [PATCH v3 02/28] drm/i915/gvt: remove interface intel_gvt_is_valid_gfn
@ 2023-05-15 11:07     ` Wang, Zhi A
  0 siblings, 0 replies; 86+ messages in thread
From: Wang, Zhi A @ 2023-05-15 11:07 UTC (permalink / raw)
  To: Christopherson,, Sean, Paolo Bonzini, Zhenyu Wang
  Cc: Zhao, Yan Y, kvm, intel-gfx, linux-kernel, Ben Gardon, intel-gvt-dev

On 5/13/2023 8:35 AM, Sean Christopherson wrote:
> From: Yan Zhao <yan.y.zhao@intel.com>
> 

Acked-by: Zhi Wang <zhi.a.wang@intel.com>

This was previously to avoid stepping down to the lower level ASAP
when a guest page is not used but still stays in the GPU page table.
(mostly in windows VM, as this is a behavior of WDDM GPU MM). But
it doesn't totally solve the the trapping flood in practice. I am
totally fine that it is removed. Post shadow in the trapping
path might be extended to handle this problem.

> Currently intel_gvt_is_valid_gfn() is called in two places:
> (1) shadowing guest GGTT entry
> (2) shadowing guest PPGTT leaf entry,
> which was introduced in commit cc753fbe1ac4
> ("drm/i915/gvt: validate gfn before set shadow page entry").
> 
> However, now it's not necessary to call this interface any more, because
> a. GGTT partial write issue has been fixed by
>     commit bc0686ff5fad
>     ("drm/i915/gvt: support inconsecutive partial gtt entry write")
>     commit 510fe10b6180
>     ("drm/i915/gvt: fix a bug of partially write ggtt enties")
> b. PPGTT resides in normal guest RAM and we only treat 8-byte writes
>     as valid page table writes. Any invalid GPA found is regarded as
>     an error, either due to guest misbehavior/attack or bug in host
>     shadow code.
>     So,rather than do GFN pre-checking and replace invalid GFNs with
>     scratch GFN and continue silently, just remove the pre-checking and
>     abort PPGTT shadowing on error detected.
> c. GFN validity check is still performed in
>     intel_gvt_dma_map_guest_page() --> gvt_pin_guest_page().
>     It's more desirable to call VFIO interface to do both validity check
>     and mapping.
>     Calling intel_gvt_is_valid_gfn() to do GFN validity check from KVM side
>     while later mapping the GFN through VFIO interface is unnecessarily
>     fragile and confusing for unaware readers.
> 
> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> [sean: remove now-unused local variables]
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>   drivers/gpu/drm/i915/gvt/gtt.c | 36 +---------------------------------
>   1 file changed, 1 insertion(+), 35 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c
> index 58b9b316ae46..f30922c55a0c 100644
> --- a/drivers/gpu/drm/i915/gvt/gtt.c
> +++ b/drivers/gpu/drm/i915/gvt/gtt.c
> @@ -49,22 +49,6 @@
>   static bool enable_out_of_sync = false;
>   static int preallocated_oos_pages = 8192;
>   
> -static bool intel_gvt_is_valid_gfn(struct intel_vgpu *vgpu, unsigned long gfn)
> -{
> -	struct kvm *kvm = vgpu->vfio_device.kvm;
> -	int idx;
> -	bool ret;
> -
> -	if (!test_bit(INTEL_VGPU_STATUS_ATTACHED, vgpu->status))
> -		return false;
> -
> -	idx = srcu_read_lock(&kvm->srcu);
> -	ret = kvm_is_visible_gfn(kvm, gfn);
> -	srcu_read_unlock(&kvm->srcu, idx);
> -
> -	return ret;
> -}
> -
>   /*
>    * validate a gm address and related range size,
>    * translate it to host gm address
> @@ -1333,11 +1317,9 @@ static int ppgtt_populate_shadow_entry(struct intel_vgpu *vgpu,
>   static int ppgtt_populate_spt(struct intel_vgpu_ppgtt_spt *spt)
>   {
>   	struct intel_vgpu *vgpu = spt->vgpu;
> -	struct intel_gvt *gvt = vgpu->gvt;
> -	const struct intel_gvt_gtt_pte_ops *ops = gvt->gtt.pte_ops;
>   	struct intel_vgpu_ppgtt_spt *s;
>   	struct intel_gvt_gtt_entry se, ge;
> -	unsigned long gfn, i;
> +	unsigned long i;
>   	int ret;
>   
>   	trace_spt_change(spt->vgpu->id, "born", spt,
> @@ -1354,13 +1336,6 @@ static int ppgtt_populate_spt(struct intel_vgpu_ppgtt_spt *spt)
>   			ppgtt_generate_shadow_entry(&se, s, &ge);
>   			ppgtt_set_shadow_entry(spt, &se, i);
>   		} else {
> -			gfn = ops->get_pfn(&ge);
> -			if (!intel_gvt_is_valid_gfn(vgpu, gfn)) {
> -				ops->set_pfn(&se, gvt->gtt.scratch_mfn);
> -				ppgtt_set_shadow_entry(spt, &se, i);
> -				continue;
> -			}
> -
>   			ret = ppgtt_populate_shadow_entry(vgpu, spt, i, &ge);
>   			if (ret)
>   				goto fail;
> @@ -2335,14 +2310,6 @@ static int emulate_ggtt_mmio_write(struct intel_vgpu *vgpu, unsigned int off,
>   		m.val64 = e.val64;
>   		m.type = e.type;
>   
> -		/* one PTE update may be issued in multiple writes and the
> -		 * first write may not construct a valid gfn
> -		 */
> -		if (!intel_gvt_is_valid_gfn(vgpu, gfn)) {
> -			ops->set_pfn(&m, gvt->gtt.scratch_mfn);
> -			goto out;
> -		}
> -
>   		ret = intel_gvt_dma_map_guest_page(vgpu, gfn, PAGE_SIZE,
>   						   &dma_addr);
>   		if (ret) {
> @@ -2359,7 +2326,6 @@ static int emulate_ggtt_mmio_write(struct intel_vgpu *vgpu, unsigned int off,
>   		ops->clear_present(&m);
>   	}
>   
> -out:
>   	ggtt_set_guest_entry(ggtt_mm, &e, g_gtt_index);
>   
>   	ggtt_get_host_entry(ggtt_mm, &e, g_gtt_index);


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 05/28] drm/i915/gvt: Explicitly check that vGPU is attached before shadowing
  2023-05-13  0:35   ` Sean Christopherson
@ 2023-05-15 11:24     ` Wang, Zhi A
  -1 siblings, 0 replies; 86+ messages in thread
From: Wang, Zhi A @ 2023-05-15 11:24 UTC (permalink / raw)
  To: Christopherson,, Sean, Paolo Bonzini, Zhenyu Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Zhao, Yan Y, Ben Gardon

On 5/13/2023 8:35 AM, Sean Christopherson wrote:
> Move the check that a vGPU is attacked from is_2MB_gtt_possible() to its
> sole caller, ppgtt_populate_shadow_entry().  All of the paths in
> ppgtt_populate_shadow_entry() eventually check for attachment by way of
> intel_gvt_dma_map_guest_page(), but explicitly checking can avoid
> unnecessary work and will make it more obvious that a future cleanup of
> is_2MB_gtt_possible() isn't introducing a bug.
> 

It might be better move this check to shadow_ppgtt_mm() which is used
in both shadow page table creation and pinning path so that the path
can bail out even earlier when creating a shadow page table but a vGPU
has not been attached to KVM yet.

> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>   drivers/gpu/drm/i915/gvt/gtt.c | 5 +++--
>   1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c
> index 5426a27c1b71..2aed31b497c9 100644
> --- a/drivers/gpu/drm/i915/gvt/gtt.c
> +++ b/drivers/gpu/drm/i915/gvt/gtt.c
> @@ -1163,8 +1163,6 @@ static int is_2MB_gtt_possible(struct intel_vgpu *vgpu,
>   	if (!HAS_PAGE_SIZES(vgpu->gvt->gt->i915, I915_GTT_PAGE_SIZE_2M))
>   		return 0;
>   
> -	if (!test_bit(INTEL_VGPU_STATUS_ATTACHED, vgpu->status))
> -		return -EINVAL;
>   	pfn = gfn_to_pfn(vgpu->vfio_device.kvm, ops->get_pfn(entry));
>   	if (is_error_noslot_pfn(pfn))
>   		return -EINVAL;
> @@ -1277,6 +1275,9 @@ static int ppgtt_populate_shadow_entry(struct intel_vgpu *vgpu,
>   	if (!pte_ops->test_present(ge))
>   		return 0;
>   
> +	if (!test_bit(INTEL_VGPU_STATUS_ATTACHED, vgpu->status))
> +		return -EINVAL;
> +
>   	gfn = pte_ops->get_pfn(ge);
>   
>   	switch (ge->type) {


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [Intel-gfx] [PATCH v3 05/28] drm/i915/gvt: Explicitly check that vGPU is attached before shadowing
@ 2023-05-15 11:24     ` Wang, Zhi A
  0 siblings, 0 replies; 86+ messages in thread
From: Wang, Zhi A @ 2023-05-15 11:24 UTC (permalink / raw)
  To: Christopherson,, Sean, Paolo Bonzini, Zhenyu Wang
  Cc: Zhao, Yan Y, kvm, intel-gfx, linux-kernel, Ben Gardon, intel-gvt-dev

On 5/13/2023 8:35 AM, Sean Christopherson wrote:
> Move the check that a vGPU is attacked from is_2MB_gtt_possible() to its
> sole caller, ppgtt_populate_shadow_entry().  All of the paths in
> ppgtt_populate_shadow_entry() eventually check for attachment by way of
> intel_gvt_dma_map_guest_page(), but explicitly checking can avoid
> unnecessary work and will make it more obvious that a future cleanup of
> is_2MB_gtt_possible() isn't introducing a bug.
> 

It might be better move this check to shadow_ppgtt_mm() which is used
in both shadow page table creation and pinning path so that the path
can bail out even earlier when creating a shadow page table but a vGPU
has not been attached to KVM yet.

> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>   drivers/gpu/drm/i915/gvt/gtt.c | 5 +++--
>   1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c
> index 5426a27c1b71..2aed31b497c9 100644
> --- a/drivers/gpu/drm/i915/gvt/gtt.c
> +++ b/drivers/gpu/drm/i915/gvt/gtt.c
> @@ -1163,8 +1163,6 @@ static int is_2MB_gtt_possible(struct intel_vgpu *vgpu,
>   	if (!HAS_PAGE_SIZES(vgpu->gvt->gt->i915, I915_GTT_PAGE_SIZE_2M))
>   		return 0;
>   
> -	if (!test_bit(INTEL_VGPU_STATUS_ATTACHED, vgpu->status))
> -		return -EINVAL;
>   	pfn = gfn_to_pfn(vgpu->vfio_device.kvm, ops->get_pfn(entry));
>   	if (is_error_noslot_pfn(pfn))
>   		return -EINVAL;
> @@ -1277,6 +1275,9 @@ static int ppgtt_populate_shadow_entry(struct intel_vgpu *vgpu,
>   	if (!pte_ops->test_present(ge))
>   		return 0;
>   
> +	if (!test_bit(INTEL_VGPU_STATUS_ATTACHED, vgpu->status))
> +		return -EINVAL;
> +
>   	gfn = pte_ops->get_pfn(ge);
>   
>   	switch (ge->type) {


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 06/28] drm/i915/gvt: Error out on an attempt to shadowing an unknown GTT entry type
  2023-05-13  0:35   ` [Intel-gfx] " Sean Christopherson
@ 2023-05-15 11:28     ` Wang, Zhi A
  -1 siblings, 0 replies; 86+ messages in thread
From: Wang, Zhi A @ 2023-05-15 11:28 UTC (permalink / raw)
  To: Christopherson,, Sean, Paolo Bonzini, Zhenyu Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Zhao, Yan Y, Ben Gardon

On 5/13/2023 8:35 AM, Sean Christopherson wrote:

Reviewed-by: Zhi Wang <zhi.a.wang@intel.com>

> Bail from ppgtt_populate_shadow_entry() if an unexpected GTT entry type
> is encountered instead of subtly falling through to the common "direct
> shadow" path.  Eliminating the default/error path's reliance on the common
> handling will allow hoisting intel_gvt_dma_map_guest_page() into the case
> statements so that the 2MiB case can try intel_gvt_dma_map_guest_page()
> and fallback to splitting the entry on failure.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>   drivers/gpu/drm/i915/gvt/gtt.c | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c
> index 2aed31b497c9..61e38acee2d5 100644
> --- a/drivers/gpu/drm/i915/gvt/gtt.c
> +++ b/drivers/gpu/drm/i915/gvt/gtt.c
> @@ -1306,6 +1306,7 @@ static int ppgtt_populate_shadow_entry(struct intel_vgpu *vgpu,
>   		return -EINVAL;
>   	default:
>   		GEM_BUG_ON(1);
> +		return -EINVAL;
>   	}
>   
>   	/* direct shadow */


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [Intel-gfx] [PATCH v3 06/28] drm/i915/gvt: Error out on an attempt to shadowing an unknown GTT entry type
@ 2023-05-15 11:28     ` Wang, Zhi A
  0 siblings, 0 replies; 86+ messages in thread
From: Wang, Zhi A @ 2023-05-15 11:28 UTC (permalink / raw)
  To: Christopherson,, Sean, Paolo Bonzini, Zhenyu Wang
  Cc: Zhao, Yan Y, kvm, intel-gfx, linux-kernel, Ben Gardon, intel-gvt-dev

On 5/13/2023 8:35 AM, Sean Christopherson wrote:

Reviewed-by: Zhi Wang <zhi.a.wang@intel.com>

> Bail from ppgtt_populate_shadow_entry() if an unexpected GTT entry type
> is encountered instead of subtly falling through to the common "direct
> shadow" path.  Eliminating the default/error path's reliance on the common
> handling will allow hoisting intel_gvt_dma_map_guest_page() into the case
> statements so that the 2MiB case can try intel_gvt_dma_map_guest_page()
> and fallback to splitting the entry on failure.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>   drivers/gpu/drm/i915/gvt/gtt.c | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c
> index 2aed31b497c9..61e38acee2d5 100644
> --- a/drivers/gpu/drm/i915/gvt/gtt.c
> +++ b/drivers/gpu/drm/i915/gvt/gtt.c
> @@ -1306,6 +1306,7 @@ static int ppgtt_populate_shadow_entry(struct intel_vgpu *vgpu,
>   		return -EINVAL;
>   	default:
>   		GEM_BUG_ON(1);
> +		return -EINVAL;
>   	}
>   
>   	/* direct shadow */


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [Intel-gfx] [PATCH v3 05/28] drm/i915/gvt: Explicitly check that vGPU is attached before shadowing
  2023-05-15 11:24     ` [Intel-gfx] " Wang, Zhi A
@ 2023-05-15 17:57       ` Sean Christopherson
  -1 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2023-05-15 17:57 UTC (permalink / raw)
  To: Zhi A Wang
  Cc: Yan Y Zhao, kvm, intel-gfx, linux-kernel, Ben Gardon,
	Paolo Bonzini, intel-gvt-dev

On Mon, May 15, 2023, Zhi A Wang wrote:
> On 5/13/2023 8:35 AM, Sean Christopherson wrote:
> > Move the check that a vGPU is attacked from is_2MB_gtt_possible() to its
> > sole caller, ppgtt_populate_shadow_entry().  All of the paths in
> > ppgtt_populate_shadow_entry() eventually check for attachment by way of
> > intel_gvt_dma_map_guest_page(), but explicitly checking can avoid
> > unnecessary work and will make it more obvious that a future cleanup of
> > is_2MB_gtt_possible() isn't introducing a bug.
> > 
> 
> It might be better move this check to shadow_ppgtt_mm() which is used
> in both shadow page table creation and pinning path so that the path
> can bail out even earlier when creating a shadow page table but a vGPU
> has not been attached to KVM yet.

Ah, yes, that'll work.  I traced through all of the paths that lead to
ppgtt_populate_shadow_entry(), and shadow_ppgtt_mm() is the only caller that isn't
already gated by INTEL_VGPU_STATUS_ATTACHED or INTEL_VGPU_STATUS_ACTIVE (ACTIVE
is set iff ATTACHED is set).

I'll move the check up to shadow_ppgtt_mm() in the next version.

Thanks!


workload_thread() <= pick_next_workload() => INTEL_VGPU_STATUS_ACTIVE
|
-> dispatch_workload()
   |
   |-> prepare_workload()
       |
       -> intel_vgpu_sync_oos_pages()
       |  |
       |  |-> ppgtt_set_guest_page_sync()
       |      |
       |      |-> sync_oos_page()
       |          |
       |          |-> ppgtt_populate_shadow_entry()
       |
       |-> intel_vgpu_flush_post_shadow()
           |
1:         |-> ppgtt_handle_guest_write_page_table()
               |
               |-> ppgtt_handle_guest_entry_add()
                   |
2:                 | -> ppgtt_populate_spt_by_guest_entry()
                   |    |
                   |    |-> ppgtt_populate_spt()
                   |        |
                   |        |-> ppgtt_populate_shadow_entry()
                   |            |
                   |            |-> ppgtt_populate_spt_by_guest_entry() [see 2]
                   |
                   |-> ppgtt_populate_shadow_entry()


kvmgt_page_track_write()  <= KVM callback => INTEL_VGPU_STATUS_ATTACHED
|
|-> intel_vgpu_page_track_handler()
    |
    |-> ppgtt_write_protection_handler()
        |
        |-> ppgtt_handle_guest_write_page_table_bytes()
            |
            |-> ppgtt_handle_guest_write_page_table() [see 1]

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 05/28] drm/i915/gvt: Explicitly check that vGPU is attached before shadowing
@ 2023-05-15 17:57       ` Sean Christopherson
  0 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2023-05-15 17:57 UTC (permalink / raw)
  To: Zhi A Wang
  Cc: Paolo Bonzini, Zhenyu Wang, kvm, intel-gvt-dev, intel-gfx,
	linux-kernel, Yan Y Zhao, Ben Gardon

On Mon, May 15, 2023, Zhi A Wang wrote:
> On 5/13/2023 8:35 AM, Sean Christopherson wrote:
> > Move the check that a vGPU is attacked from is_2MB_gtt_possible() to its
> > sole caller, ppgtt_populate_shadow_entry().  All of the paths in
> > ppgtt_populate_shadow_entry() eventually check for attachment by way of
> > intel_gvt_dma_map_guest_page(), but explicitly checking can avoid
> > unnecessary work and will make it more obvious that a future cleanup of
> > is_2MB_gtt_possible() isn't introducing a bug.
> > 
> 
> It might be better move this check to shadow_ppgtt_mm() which is used
> in both shadow page table creation and pinning path so that the path
> can bail out even earlier when creating a shadow page table but a vGPU
> has not been attached to KVM yet.

Ah, yes, that'll work.  I traced through all of the paths that lead to
ppgtt_populate_shadow_entry(), and shadow_ppgtt_mm() is the only caller that isn't
already gated by INTEL_VGPU_STATUS_ATTACHED or INTEL_VGPU_STATUS_ACTIVE (ACTIVE
is set iff ATTACHED is set).

I'll move the check up to shadow_ppgtt_mm() in the next version.

Thanks!


workload_thread() <= pick_next_workload() => INTEL_VGPU_STATUS_ACTIVE
|
-> dispatch_workload()
   |
   |-> prepare_workload()
       |
       -> intel_vgpu_sync_oos_pages()
       |  |
       |  |-> ppgtt_set_guest_page_sync()
       |      |
       |      |-> sync_oos_page()
       |          |
       |          |-> ppgtt_populate_shadow_entry()
       |
       |-> intel_vgpu_flush_post_shadow()
           |
1:         |-> ppgtt_handle_guest_write_page_table()
               |
               |-> ppgtt_handle_guest_entry_add()
                   |
2:                 | -> ppgtt_populate_spt_by_guest_entry()
                   |    |
                   |    |-> ppgtt_populate_spt()
                   |        |
                   |        |-> ppgtt_populate_shadow_entry()
                   |            |
                   |            |-> ppgtt_populate_spt_by_guest_entry() [see 2]
                   |
                   |-> ppgtt_populate_shadow_entry()


kvmgt_page_track_write()  <= KVM callback => INTEL_VGPU_STATUS_ATTACHED
|
|-> intel_vgpu_page_track_handler()
    |
    |-> ppgtt_write_protection_handler()
        |
        |-> ppgtt_handle_guest_write_page_table_bytes()
            |
            |-> ppgtt_handle_guest_write_page_table() [see 1]

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 03/28] drm/i915/gvt: Verify hugepages are contiguous in physical address space
  2023-05-13  0:35   ` Sean Christopherson
@ 2023-05-16  9:37     ` Yan Zhao
  -1 siblings, 0 replies; 86+ messages in thread
From: Yan Zhao @ 2023-05-16  9:37 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Zhenyu Wang, Zhi Wang, kvm, intel-gvt-dev,
	intel-gfx, linux-kernel, Ben Gardon

hi Sean

Do you think it's necessary to double check that struct page pointers
are also contiguous?

And do you like to also include a fix as below, which is to remove the
warning in vfio_device_container_unpin_pages() when npage is 0?

@ -169,7 +173,8 @@ static int gvt_pin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn,
        *page = base_page;
        return 0;
 err:
-       gvt_unpin_guest_page(vgpu, gfn, npage * PAGE_SIZE);
+       if (npage)
+               gvt_unpin_guest_page(vgpu, gfn, npage * PAGE_SIZE);
        return ret;
 }

BTW, I've sent a separate fix for vfio_iommu_type1_pin_pages() to ensure
struct page is a valid address.
https://lore.kernel.org/lkml/20230516093007.15234-1-yan.y.zhao@intel.com/

On Fri, May 12, 2023 at 05:35:35PM -0700, Sean Christopherson wrote:
> When shadowing a GTT entry with a 2M page, verify that the pfns are
> contiguous, not just that the struct page pointers are contiguous.  The
> memory map is virtual contiguous if "CONFIG_FLATMEM=y ||
> CONFIG_SPARSEMEM_VMEMMAP=y", but not for "CONFIG_SPARSEMEM=y &&
> CONFIG_SPARSEMEM_VMEMMAP=n", so theoretically KVMGT could encounter struct
> pages that are virtually contiguous, but not physically contiguous.
> 
> In practice, this flaw is likely a non-issue as it would cause functional
> problems iff a section isn't 2M aligned _and_ is directly adjacent to
> another section with discontiguous pfns.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  drivers/gpu/drm/i915/gvt/kvmgt.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
> index de675d799c7d..429f0f993a13 100644
> --- a/drivers/gpu/drm/i915/gvt/kvmgt.c
> +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
> @@ -161,7 +161,7 @@ static int gvt_pin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn,
>  
>  		if (npage == 0)
>  			base_page = cur_page;
> -		else if (base_page + npage != cur_page) {
> +		else if (page_to_pfn(base_page) + npage != page_to_pfn(cur_page)) {
>  			gvt_vgpu_err("The pages are not continuous\n");
>  			ret = -EINVAL;
>  			npage++;
> -- 
> 2.40.1.606.ga4b1b128d6-goog
> 

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [Intel-gfx] [PATCH v3 03/28] drm/i915/gvt: Verify hugepages are contiguous in physical address space
@ 2023-05-16  9:37     ` Yan Zhao
  0 siblings, 0 replies; 86+ messages in thread
From: Yan Zhao @ 2023-05-16  9:37 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, intel-gfx, linux-kernel, Ben Gardon, Paolo Bonzini, intel-gvt-dev

hi Sean

Do you think it's necessary to double check that struct page pointers
are also contiguous?

And do you like to also include a fix as below, which is to remove the
warning in vfio_device_container_unpin_pages() when npage is 0?

@ -169,7 +173,8 @@ static int gvt_pin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn,
        *page = base_page;
        return 0;
 err:
-       gvt_unpin_guest_page(vgpu, gfn, npage * PAGE_SIZE);
+       if (npage)
+               gvt_unpin_guest_page(vgpu, gfn, npage * PAGE_SIZE);
        return ret;
 }

BTW, I've sent a separate fix for vfio_iommu_type1_pin_pages() to ensure
struct page is a valid address.
https://lore.kernel.org/lkml/20230516093007.15234-1-yan.y.zhao@intel.com/

On Fri, May 12, 2023 at 05:35:35PM -0700, Sean Christopherson wrote:
> When shadowing a GTT entry with a 2M page, verify that the pfns are
> contiguous, not just that the struct page pointers are contiguous.  The
> memory map is virtual contiguous if "CONFIG_FLATMEM=y ||
> CONFIG_SPARSEMEM_VMEMMAP=y", but not for "CONFIG_SPARSEMEM=y &&
> CONFIG_SPARSEMEM_VMEMMAP=n", so theoretically KVMGT could encounter struct
> pages that are virtually contiguous, but not physically contiguous.
> 
> In practice, this flaw is likely a non-issue as it would cause functional
> problems iff a section isn't 2M aligned _and_ is directly adjacent to
> another section with discontiguous pfns.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  drivers/gpu/drm/i915/gvt/kvmgt.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
> index de675d799c7d..429f0f993a13 100644
> --- a/drivers/gpu/drm/i915/gvt/kvmgt.c
> +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
> @@ -161,7 +161,7 @@ static int gvt_pin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn,
>  
>  		if (npage == 0)
>  			base_page = cur_page;
> -		else if (base_page + npage != cur_page) {
> +		else if (page_to_pfn(base_page) + npage != page_to_pfn(cur_page)) {
>  			gvt_vgpu_err("The pages are not continuous\n");
>  			ret = -EINVAL;
>  			npage++;
> -- 
> 2.40.1.606.ga4b1b128d6-goog
> 

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 07/28] drm/i915/gvt: Don't rely on KVM's gfn_to_pfn() to query possible 2M GTT
  2023-05-13  0:35   ` [Intel-gfx] " Sean Christopherson
@ 2023-05-17  0:57     ` Yan Zhao
  -1 siblings, 0 replies; 86+ messages in thread
From: Yan Zhao @ 2023-05-17  0:57 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Zhenyu Wang, Zhi Wang, kvm, intel-gfx,
	linux-kernel, Ben Gardon, intel-gvt-dev

Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Yan Zhao <yan.y.zhao@intel.com>

On Fri, May 12, 2023 at 05:35:39PM -0700, Sean Christopherson wrote:
> Now that gvt_pin_guest_page() explicitly verifies the pinned PFN is a
> transparent hugepage page, don't use KVM's gfn_to_pfn() to pre-check if a
> 2MiB GTT entry is possible and instead just try to map the GFN with a 2MiB
> entry.  Using KVM to query pfn that is ultimately managed through VFIO is
> odd, and KVM's gfn_to_pfn() is not intended for non-KVM consumption; it's
> exported only because of KVM vendor modules (x86 and PPC).
> 
> Open code the check on 2MiB support instead of keeping
> is_2MB_gtt_possible() around for a single line of code.
> 
> Move the call to intel_gvt_dma_map_guest_page() for a 4KiB entry into its
> case statement, i.e. fork the common path into the 4KiB and 2MiB "direct"
> shadow paths.  Keeping the call in the "common" path is arguably more in
> the spirit of "one change per patch", but retaining the local "page_size"
> variable is silly, i.e. the call site will be changed either way, and
> jumping around the no-longer-common code is more subtle and rather odd,
> i.e. would just need to be immediately cleaned up.
> 
> Drop the error message from gvt_pin_guest_page() when KVMGT attempts to
> shadow a 2MiB guest page that isn't backed by a compatible hugepage in the
> host.  Dropping the pre-check on a THP makes it much more likely that the
> "error" will be encountered in normal operation.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  drivers/gpu/drm/i915/gvt/gtt.c   | 49 ++++++--------------------------
>  drivers/gpu/drm/i915/gvt/kvmgt.c |  1 -
>  2 files changed, 8 insertions(+), 42 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c
> index 61e38acee2d5..f505be9e647a 100644
> --- a/drivers/gpu/drm/i915/gvt/gtt.c
> +++ b/drivers/gpu/drm/i915/gvt/gtt.c
> @@ -1145,36 +1145,6 @@ static inline void ppgtt_generate_shadow_entry(struct intel_gvt_gtt_entry *se,
>  	ops->set_pfn(se, s->shadow_page.mfn);
>  }
>  
> -/*
> - * Check if can do 2M page
> - * @vgpu: target vgpu
> - * @entry: target pfn's gtt entry
> - *
> - * Return 1 if 2MB huge gtt shadowing is possible, 0 if miscondition,
> - * negative if found err.
> - */
> -static int is_2MB_gtt_possible(struct intel_vgpu *vgpu,
> -	struct intel_gvt_gtt_entry *entry)
> -{
> -	const struct intel_gvt_gtt_pte_ops *ops = vgpu->gvt->gtt.pte_ops;
> -	kvm_pfn_t pfn;
> -	int ret;
> -
> -	if (!HAS_PAGE_SIZES(vgpu->gvt->gt->i915, I915_GTT_PAGE_SIZE_2M))
> -		return 0;
> -
> -	pfn = gfn_to_pfn(vgpu->vfio_device.kvm, ops->get_pfn(entry));
> -	if (is_error_noslot_pfn(pfn))
> -		return -EINVAL;
> -
> -	if (!pfn_valid(pfn))
> -		return -EINVAL;
> -
> -	ret = PageTransHuge(pfn_to_page(pfn));
> -	kvm_release_pfn_clean(pfn);
> -	return ret;
> -}
> -
>  static int split_2MB_gtt_entry(struct intel_vgpu *vgpu,
>  	struct intel_vgpu_ppgtt_spt *spt, unsigned long index,
>  	struct intel_gvt_gtt_entry *se)
> @@ -1268,7 +1238,7 @@ static int ppgtt_populate_shadow_entry(struct intel_vgpu *vgpu,
>  {
>  	const struct intel_gvt_gtt_pte_ops *pte_ops = vgpu->gvt->gtt.pte_ops;
>  	struct intel_gvt_gtt_entry se = *ge;
> -	unsigned long gfn, page_size = PAGE_SIZE;
> +	unsigned long gfn;
>  	dma_addr_t dma_addr;
>  	int ret;
>  
> @@ -1283,6 +1253,9 @@ static int ppgtt_populate_shadow_entry(struct intel_vgpu *vgpu,
>  	switch (ge->type) {
>  	case GTT_TYPE_PPGTT_PTE_4K_ENTRY:
>  		gvt_vdbg_mm("shadow 4K gtt entry\n");
> +		ret = intel_gvt_dma_map_guest_page(vgpu, gfn, PAGE_SIZE, &dma_addr);
> +		if (ret)
> +			return -ENXIO;
>  		break;
>  	case GTT_TYPE_PPGTT_PTE_64K_ENTRY:
>  		gvt_vdbg_mm("shadow 64K gtt entry\n");
> @@ -1294,12 +1267,10 @@ static int ppgtt_populate_shadow_entry(struct intel_vgpu *vgpu,
>  		return split_64KB_gtt_entry(vgpu, spt, index, &se);
>  	case GTT_TYPE_PPGTT_PTE_2M_ENTRY:
>  		gvt_vdbg_mm("shadow 2M gtt entry\n");
> -		ret = is_2MB_gtt_possible(vgpu, ge);
> -		if (ret == 0)
> +		if (!HAS_PAGE_SIZES(vgpu->gvt->gt->i915, I915_GTT_PAGE_SIZE_2M) ||
> +		    intel_gvt_dma_map_guest_page(vgpu, gfn,
> +						 I915_GTT_PAGE_SIZE_2M, &dma_addr))
>  			return split_2MB_gtt_entry(vgpu, spt, index, &se);
> -		else if (ret < 0)
> -			return ret;
> -		page_size = I915_GTT_PAGE_SIZE_2M;
>  		break;
>  	case GTT_TYPE_PPGTT_PTE_1G_ENTRY:
>  		gvt_vgpu_err("GVT doesn't support 1GB entry\n");
> @@ -1309,11 +1280,7 @@ static int ppgtt_populate_shadow_entry(struct intel_vgpu *vgpu,
>  		return -EINVAL;
>  	}
>  
> -	/* direct shadow */
> -	ret = intel_gvt_dma_map_guest_page(vgpu, gfn, page_size, &dma_addr);
> -	if (ret)
> -		return -ENXIO;
> -
> +	/* Successfully shadowed a 4K or 2M page (without splitting). */
>  	pte_ops->set_pfn(&se, dma_addr >> PAGE_SHIFT);
>  	ppgtt_set_shadow_entry(spt, &se, index);
>  	return 0;
> diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
> index 429f0f993a13..92ceefe1e6fb 100644
> --- a/drivers/gpu/drm/i915/gvt/kvmgt.c
> +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
> @@ -162,7 +162,6 @@ static int gvt_pin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn,
>  		if (npage == 0)
>  			base_page = cur_page;
>  		else if (page_to_pfn(base_page) + npage != page_to_pfn(cur_page)) {
> -			gvt_vgpu_err("The pages are not continuous\n");
>  			ret = -EINVAL;
>  			npage++;
>  			goto err;
> -- 
> 2.40.1.606.ga4b1b128d6-goog
> 

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [Intel-gfx] [PATCH v3 07/28] drm/i915/gvt: Don't rely on KVM's gfn_to_pfn() to query possible 2M GTT
@ 2023-05-17  0:57     ` Yan Zhao
  0 siblings, 0 replies; 86+ messages in thread
From: Yan Zhao @ 2023-05-17  0:57 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, intel-gfx, linux-kernel, Ben Gardon, Paolo Bonzini, intel-gvt-dev

Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Yan Zhao <yan.y.zhao@intel.com>

On Fri, May 12, 2023 at 05:35:39PM -0700, Sean Christopherson wrote:
> Now that gvt_pin_guest_page() explicitly verifies the pinned PFN is a
> transparent hugepage page, don't use KVM's gfn_to_pfn() to pre-check if a
> 2MiB GTT entry is possible and instead just try to map the GFN with a 2MiB
> entry.  Using KVM to query pfn that is ultimately managed through VFIO is
> odd, and KVM's gfn_to_pfn() is not intended for non-KVM consumption; it's
> exported only because of KVM vendor modules (x86 and PPC).
> 
> Open code the check on 2MiB support instead of keeping
> is_2MB_gtt_possible() around for a single line of code.
> 
> Move the call to intel_gvt_dma_map_guest_page() for a 4KiB entry into its
> case statement, i.e. fork the common path into the 4KiB and 2MiB "direct"
> shadow paths.  Keeping the call in the "common" path is arguably more in
> the spirit of "one change per patch", but retaining the local "page_size"
> variable is silly, i.e. the call site will be changed either way, and
> jumping around the no-longer-common code is more subtle and rather odd,
> i.e. would just need to be immediately cleaned up.
> 
> Drop the error message from gvt_pin_guest_page() when KVMGT attempts to
> shadow a 2MiB guest page that isn't backed by a compatible hugepage in the
> host.  Dropping the pre-check on a THP makes it much more likely that the
> "error" will be encountered in normal operation.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  drivers/gpu/drm/i915/gvt/gtt.c   | 49 ++++++--------------------------
>  drivers/gpu/drm/i915/gvt/kvmgt.c |  1 -
>  2 files changed, 8 insertions(+), 42 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c
> index 61e38acee2d5..f505be9e647a 100644
> --- a/drivers/gpu/drm/i915/gvt/gtt.c
> +++ b/drivers/gpu/drm/i915/gvt/gtt.c
> @@ -1145,36 +1145,6 @@ static inline void ppgtt_generate_shadow_entry(struct intel_gvt_gtt_entry *se,
>  	ops->set_pfn(se, s->shadow_page.mfn);
>  }
>  
> -/*
> - * Check if can do 2M page
> - * @vgpu: target vgpu
> - * @entry: target pfn's gtt entry
> - *
> - * Return 1 if 2MB huge gtt shadowing is possible, 0 if miscondition,
> - * negative if found err.
> - */
> -static int is_2MB_gtt_possible(struct intel_vgpu *vgpu,
> -	struct intel_gvt_gtt_entry *entry)
> -{
> -	const struct intel_gvt_gtt_pte_ops *ops = vgpu->gvt->gtt.pte_ops;
> -	kvm_pfn_t pfn;
> -	int ret;
> -
> -	if (!HAS_PAGE_SIZES(vgpu->gvt->gt->i915, I915_GTT_PAGE_SIZE_2M))
> -		return 0;
> -
> -	pfn = gfn_to_pfn(vgpu->vfio_device.kvm, ops->get_pfn(entry));
> -	if (is_error_noslot_pfn(pfn))
> -		return -EINVAL;
> -
> -	if (!pfn_valid(pfn))
> -		return -EINVAL;
> -
> -	ret = PageTransHuge(pfn_to_page(pfn));
> -	kvm_release_pfn_clean(pfn);
> -	return ret;
> -}
> -
>  static int split_2MB_gtt_entry(struct intel_vgpu *vgpu,
>  	struct intel_vgpu_ppgtt_spt *spt, unsigned long index,
>  	struct intel_gvt_gtt_entry *se)
> @@ -1268,7 +1238,7 @@ static int ppgtt_populate_shadow_entry(struct intel_vgpu *vgpu,
>  {
>  	const struct intel_gvt_gtt_pte_ops *pte_ops = vgpu->gvt->gtt.pte_ops;
>  	struct intel_gvt_gtt_entry se = *ge;
> -	unsigned long gfn, page_size = PAGE_SIZE;
> +	unsigned long gfn;
>  	dma_addr_t dma_addr;
>  	int ret;
>  
> @@ -1283,6 +1253,9 @@ static int ppgtt_populate_shadow_entry(struct intel_vgpu *vgpu,
>  	switch (ge->type) {
>  	case GTT_TYPE_PPGTT_PTE_4K_ENTRY:
>  		gvt_vdbg_mm("shadow 4K gtt entry\n");
> +		ret = intel_gvt_dma_map_guest_page(vgpu, gfn, PAGE_SIZE, &dma_addr);
> +		if (ret)
> +			return -ENXIO;
>  		break;
>  	case GTT_TYPE_PPGTT_PTE_64K_ENTRY:
>  		gvt_vdbg_mm("shadow 64K gtt entry\n");
> @@ -1294,12 +1267,10 @@ static int ppgtt_populate_shadow_entry(struct intel_vgpu *vgpu,
>  		return split_64KB_gtt_entry(vgpu, spt, index, &se);
>  	case GTT_TYPE_PPGTT_PTE_2M_ENTRY:
>  		gvt_vdbg_mm("shadow 2M gtt entry\n");
> -		ret = is_2MB_gtt_possible(vgpu, ge);
> -		if (ret == 0)
> +		if (!HAS_PAGE_SIZES(vgpu->gvt->gt->i915, I915_GTT_PAGE_SIZE_2M) ||
> +		    intel_gvt_dma_map_guest_page(vgpu, gfn,
> +						 I915_GTT_PAGE_SIZE_2M, &dma_addr))
>  			return split_2MB_gtt_entry(vgpu, spt, index, &se);
> -		else if (ret < 0)
> -			return ret;
> -		page_size = I915_GTT_PAGE_SIZE_2M;
>  		break;
>  	case GTT_TYPE_PPGTT_PTE_1G_ENTRY:
>  		gvt_vgpu_err("GVT doesn't support 1GB entry\n");
> @@ -1309,11 +1280,7 @@ static int ppgtt_populate_shadow_entry(struct intel_vgpu *vgpu,
>  		return -EINVAL;
>  	}
>  
> -	/* direct shadow */
> -	ret = intel_gvt_dma_map_guest_page(vgpu, gfn, page_size, &dma_addr);
> -	if (ret)
> -		return -ENXIO;
> -
> +	/* Successfully shadowed a 4K or 2M page (without splitting). */
>  	pte_ops->set_pfn(&se, dma_addr >> PAGE_SHIFT);
>  	ppgtt_set_shadow_entry(spt, &se, index);
>  	return 0;
> diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
> index 429f0f993a13..92ceefe1e6fb 100644
> --- a/drivers/gpu/drm/i915/gvt/kvmgt.c
> +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
> @@ -162,7 +162,6 @@ static int gvt_pin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn,
>  		if (npage == 0)
>  			base_page = cur_page;
>  		else if (page_to_pfn(base_page) + npage != page_to_pfn(cur_page)) {
> -			gvt_vgpu_err("The pages are not continuous\n");
>  			ret = -EINVAL;
>  			npage++;
>  			goto err;
> -- 
> 2.40.1.606.ga4b1b128d6-goog
> 

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 12/28] KVM: x86/mmu: Don't rely on page-track mechanism to flush on memslot change
  2023-05-13  0:35   ` [Intel-gfx] " Sean Christopherson
@ 2023-05-17  2:06     ` Yan Zhao
  -1 siblings, 0 replies; 86+ messages in thread
From: Yan Zhao @ 2023-05-17  2:06 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Zhenyu Wang, Zhi Wang, kvm, intel-gvt-dev,
	intel-gfx, linux-kernel, Ben Gardon

Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>

On Fri, May 12, 2023 at 05:35:44PM -0700, Sean Christopherson wrote:
> Call kvm_mmu_zap_all_fast() directly when flushing a memslot instead of
> bouncing through the page-track mechanism.  KVM (unfortunately) needs to
> zap and flush all page tables on memslot DELETE/MOVE irrespective of
> whether KVM is shadowing guest page tables.
> 
> This will allow changing KVM to register a page-track notifier on the
> first shadow root allocation, and will also allow deleting the misguided
> kvm_page_track_flush_slot() hook itself once KVM-GT also moves to a
> different method for reacting to memslot changes.
> 
> No functional change intended.
> 
> Cc: Yan Zhao <yan.y.zhao@intel.com>
> Link: https://lore.kernel.org/r/20221110014821.1548347-2-seanjc@google.com
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/mmu/mmu.c | 10 ++--------
>  1 file changed, 2 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 2e4476d38377..23a79723031b 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -6184,13 +6184,6 @@ static bool kvm_has_zapped_obsolete_pages(struct kvm *kvm)
>  	return unlikely(!list_empty_careful(&kvm->arch.zapped_obsolete_pages));
>  }
>  
> -static void kvm_mmu_invalidate_zap_pages_in_memslot(struct kvm *kvm,
> -			struct kvm_memory_slot *slot,
> -			struct kvm_page_track_notifier_node *node)
> -{
> -	kvm_mmu_zap_all_fast(kvm);
> -}
> -
>  int kvm_mmu_init_vm(struct kvm *kvm)
>  {
>  	struct kvm_page_track_notifier_node *node = &kvm->arch.mmu_sp_tracker;
> @@ -6208,7 +6201,6 @@ int kvm_mmu_init_vm(struct kvm *kvm)
>  	}
>  
>  	node->track_write = kvm_mmu_pte_write;
> -	node->track_flush_slot = kvm_mmu_invalidate_zap_pages_in_memslot;
>  	kvm_page_track_register_notifier(kvm, node);
>  
>  	kvm->arch.split_page_header_cache.kmem_cache = mmu_page_header_cache;
> @@ -6750,6 +6742,8 @@ void kvm_arch_flush_shadow_all(struct kvm *kvm)
>  void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
>  				   struct kvm_memory_slot *slot)
>  {
> +	kvm_mmu_zap_all_fast(kvm);
> +
>  	kvm_page_track_flush_slot(kvm, slot);
>  }
>  
> -- 
> 2.40.1.606.ga4b1b128d6-goog
> 

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [Intel-gfx] [PATCH v3 12/28] KVM: x86/mmu: Don't rely on page-track mechanism to flush on memslot change
@ 2023-05-17  2:06     ` Yan Zhao
  0 siblings, 0 replies; 86+ messages in thread
From: Yan Zhao @ 2023-05-17  2:06 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, intel-gfx, linux-kernel, Ben Gardon, Paolo Bonzini, intel-gvt-dev

Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>

On Fri, May 12, 2023 at 05:35:44PM -0700, Sean Christopherson wrote:
> Call kvm_mmu_zap_all_fast() directly when flushing a memslot instead of
> bouncing through the page-track mechanism.  KVM (unfortunately) needs to
> zap and flush all page tables on memslot DELETE/MOVE irrespective of
> whether KVM is shadowing guest page tables.
> 
> This will allow changing KVM to register a page-track notifier on the
> first shadow root allocation, and will also allow deleting the misguided
> kvm_page_track_flush_slot() hook itself once KVM-GT also moves to a
> different method for reacting to memslot changes.
> 
> No functional change intended.
> 
> Cc: Yan Zhao <yan.y.zhao@intel.com>
> Link: https://lore.kernel.org/r/20221110014821.1548347-2-seanjc@google.com
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/mmu/mmu.c | 10 ++--------
>  1 file changed, 2 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 2e4476d38377..23a79723031b 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -6184,13 +6184,6 @@ static bool kvm_has_zapped_obsolete_pages(struct kvm *kvm)
>  	return unlikely(!list_empty_careful(&kvm->arch.zapped_obsolete_pages));
>  }
>  
> -static void kvm_mmu_invalidate_zap_pages_in_memslot(struct kvm *kvm,
> -			struct kvm_memory_slot *slot,
> -			struct kvm_page_track_notifier_node *node)
> -{
> -	kvm_mmu_zap_all_fast(kvm);
> -}
> -
>  int kvm_mmu_init_vm(struct kvm *kvm)
>  {
>  	struct kvm_page_track_notifier_node *node = &kvm->arch.mmu_sp_tracker;
> @@ -6208,7 +6201,6 @@ int kvm_mmu_init_vm(struct kvm *kvm)
>  	}
>  
>  	node->track_write = kvm_mmu_pte_write;
> -	node->track_flush_slot = kvm_mmu_invalidate_zap_pages_in_memslot;
>  	kvm_page_track_register_notifier(kvm, node);
>  
>  	kvm->arch.split_page_header_cache.kmem_cache = mmu_page_header_cache;
> @@ -6750,6 +6742,8 @@ void kvm_arch_flush_shadow_all(struct kvm *kvm)
>  void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
>  				   struct kvm_memory_slot *slot)
>  {
> +	kvm_mmu_zap_all_fast(kvm);
> +
>  	kvm_page_track_flush_slot(kvm, slot);
>  }
>  
> -- 
> 2.40.1.606.ga4b1b128d6-goog
> 

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 21/28] KVM: x86/mmu: Use page-track notifiers iff there are external users
  2023-05-13  0:35   ` Sean Christopherson
@ 2023-05-17  3:27     ` Yan Zhao
  -1 siblings, 0 replies; 86+ messages in thread
From: Yan Zhao @ 2023-05-17  3:27 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Zhenyu Wang, Zhi Wang, kvm, intel-gvt-dev,
	intel-gfx, linux-kernel, Ben Gardon

Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>

On Fri, May 12, 2023 at 05:35:53PM -0700, Sean Christopherson wrote:
> Disable the page-track notifier code at compile time if there are no
> external users, i.e. if CONFIG_KVM_EXTERNAL_WRITE_TRACKING=n.  KVM itself
> now hooks emulated writes directly instead of relying on the page-track
> mechanism.
> 
> Provide a stub for "struct kvm_page_track_notifier_node" so that including
> headers directly from the command line, e.g. for testing include guards,
> doesn't fail due to a struct having an incomplete type.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/include/asm/kvm_host.h       |  2 ++
>  arch/x86/include/asm/kvm_page_track.h | 22 +++++++++++++-------
>  arch/x86/kvm/mmu/page_track.c         | 10 ++++-----
>  arch/x86/kvm/mmu/page_track.h         | 29 +++++++++++++++++++++++----
>  4 files changed, 47 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 113598d3e886..5ce06a75d3de 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1247,7 +1247,9 @@ struct kvm_arch {
>  	 * create an NX huge page (without hanging the guest).
>  	 */
>  	struct list_head possible_nx_huge_pages;
> +#ifdef CONFIG_KVM_EXTERNAL_WRITE_TRACKING
>  	struct kvm_page_track_notifier_head track_notifier_head;
> +#endif
>  	/*
>  	 * Protects marking pages unsync during page faults, as TDP MMU page
>  	 * faults only take mmu_lock for read.  For simplicity, the unsync
> diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
> index 76c0070dfe2a..61adb07b5927 100644
> --- a/arch/x86/include/asm/kvm_page_track.h
> +++ b/arch/x86/include/asm/kvm_page_track.h
> @@ -9,6 +9,14 @@ enum kvm_page_track_mode {
>  	KVM_PAGE_TRACK_MAX,
>  };
>  
> +void kvm_slot_page_track_add_page(struct kvm *kvm,
> +				  struct kvm_memory_slot *slot, gfn_t gfn,
> +				  enum kvm_page_track_mode mode);
> +void kvm_slot_page_track_remove_page(struct kvm *kvm,
> +				     struct kvm_memory_slot *slot, gfn_t gfn,
> +				     enum kvm_page_track_mode mode);
> +
> +#ifdef CONFIG_KVM_EXTERNAL_WRITE_TRACKING
>  /*
>   * The notifier represented by @kvm_page_track_notifier_node is linked into
>   * the head which will be notified when guest is triggering the track event.
> @@ -48,18 +56,18 @@ struct kvm_page_track_notifier_node {
>  				    struct kvm_page_track_notifier_node *node);
>  };
>  
> -void kvm_slot_page_track_add_page(struct kvm *kvm,
> -				  struct kvm_memory_slot *slot, gfn_t gfn,
> -				  enum kvm_page_track_mode mode);
> -void kvm_slot_page_track_remove_page(struct kvm *kvm,
> -				     struct kvm_memory_slot *slot, gfn_t gfn,
> -				     enum kvm_page_track_mode mode);
> -
>  void
>  kvm_page_track_register_notifier(struct kvm *kvm,
>  				 struct kvm_page_track_notifier_node *n);
>  void
>  kvm_page_track_unregister_notifier(struct kvm *kvm,
>  				   struct kvm_page_track_notifier_node *n);
> +#else
> +/*
> + * Allow defining a node in a structure even if page tracking is disabled, e.g.
> + * to play nice with testing headers via direct inclusion from the command line.
> + */
> +struct kvm_page_track_notifier_node {};
> +#endif /* CONFIG_KVM_EXTERNAL_WRITE_TRACKING */
>  
>  #endif
> diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
> index e15329d48f95..b20aad7ac3fe 100644
> --- a/arch/x86/kvm/mmu/page_track.c
> +++ b/arch/x86/kvm/mmu/page_track.c
> @@ -194,6 +194,7 @@ bool kvm_slot_page_track_is_active(struct kvm *kvm,
>  	return !!READ_ONCE(slot->arch.gfn_track[mode][index]);
>  }
>  
> +#ifdef CONFIG_KVM_EXTERNAL_WRITE_TRACKING
>  void kvm_page_track_cleanup(struct kvm *kvm)
>  {
>  	struct kvm_page_track_notifier_head *head;
> @@ -255,14 +256,13 @@ EXPORT_SYMBOL_GPL(kvm_page_track_unregister_notifier);
>   * The node should figure out if the written page is the one that node is
>   * interested in by itself.
>   */
> -void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
> -			  int bytes)
> +void __kvm_page_track_write(struct kvm *kvm, gpa_t gpa, const u8 *new, int bytes)
>  {
>  	struct kvm_page_track_notifier_head *head;
>  	struct kvm_page_track_notifier_node *n;
>  	int idx;
>  
> -	head = &vcpu->kvm->arch.track_notifier_head;
> +	head = &kvm->arch.track_notifier_head;
>  
>  	if (hlist_empty(&head->track_notifier_list))
>  		return;
> @@ -273,8 +273,6 @@ void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
>  		if (n->track_write)
>  			n->track_write(gpa, new, bytes, n);
>  	srcu_read_unlock(&head->track_srcu, idx);
> -
> -	kvm_mmu_track_write(vcpu, gpa, new, bytes);
>  }
>  
>  /*
> @@ -299,3 +297,5 @@ void kvm_page_track_delete_slot(struct kvm *kvm, struct kvm_memory_slot *slot)
>  			n->track_remove_region(slot->base_gfn, slot->npages, n);
>  	srcu_read_unlock(&head->track_srcu, idx);
>  }
> +
> +#endif
> diff --git a/arch/x86/kvm/mmu/page_track.h b/arch/x86/kvm/mmu/page_track.h
> index 89712f123ad3..931b26b8fc8f 100644
> --- a/arch/x86/kvm/mmu/page_track.h
> +++ b/arch/x86/kvm/mmu/page_track.h
> @@ -6,8 +6,6 @@
>  
>  #include <asm/kvm_page_track.h>
>  
> -int kvm_page_track_init(struct kvm *kvm);
> -void kvm_page_track_cleanup(struct kvm *kvm);
>  
>  bool kvm_page_track_write_tracking_enabled(struct kvm *kvm);
>  int kvm_page_track_write_tracking_alloc(struct kvm_memory_slot *slot);
> @@ -21,13 +19,36 @@ bool kvm_slot_page_track_is_active(struct kvm *kvm,
>  				   const struct kvm_memory_slot *slot,
>  				   gfn_t gfn, enum kvm_page_track_mode mode);
>  
> -void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
> -			  int bytes);
> +#ifdef CONFIG_KVM_EXTERNAL_WRITE_TRACKING
> +int kvm_page_track_init(struct kvm *kvm);
> +void kvm_page_track_cleanup(struct kvm *kvm);
> +
> +void __kvm_page_track_write(struct kvm *kvm, gpa_t gpa, const u8 *new, int bytes);
>  void kvm_page_track_delete_slot(struct kvm *kvm, struct kvm_memory_slot *slot);
>  
>  static inline bool kvm_page_track_has_external_user(struct kvm *kvm)
>  {
>  	return hlist_empty(&kvm->arch.track_notifier_head.track_notifier_list);
>  }
> +#else
> +static inline int kvm_page_track_init(struct kvm *kvm) { return 0; }
> +static inline void kvm_page_track_cleanup(struct kvm *kvm) { }
> +
> +static inline void __kvm_page_track_write(struct kvm *kvm, gpa_t gpa,
> +					  const u8 *new, int bytes) { }
> +static inline void kvm_page_track_delete_slot(struct kvm *kvm,
> +					      struct kvm_memory_slot *slot) { }
> +
> +static inline bool kvm_page_track_has_external_user(struct kvm *kvm) { return false; }
> +
> +#endif /* CONFIG_KVM_EXTERNAL_WRITE_TRACKING */
> +
> +static inline void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa,
> +					const u8 *new, int bytes)
> +{
> +	__kvm_page_track_write(vcpu->kvm, gpa, new, bytes);
> +
> +	kvm_mmu_track_write(vcpu, gpa, new, bytes);
> +}
>  
>  #endif /* __KVM_X86_PAGE_TRACK_H */
> -- 
> 2.40.1.606.ga4b1b128d6-goog
> 

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [Intel-gfx] [PATCH v3 21/28] KVM: x86/mmu: Use page-track notifiers iff there are external users
@ 2023-05-17  3:27     ` Yan Zhao
  0 siblings, 0 replies; 86+ messages in thread
From: Yan Zhao @ 2023-05-17  3:27 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, intel-gfx, linux-kernel, Ben Gardon, Paolo Bonzini, intel-gvt-dev

Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>

On Fri, May 12, 2023 at 05:35:53PM -0700, Sean Christopherson wrote:
> Disable the page-track notifier code at compile time if there are no
> external users, i.e. if CONFIG_KVM_EXTERNAL_WRITE_TRACKING=n.  KVM itself
> now hooks emulated writes directly instead of relying on the page-track
> mechanism.
> 
> Provide a stub for "struct kvm_page_track_notifier_node" so that including
> headers directly from the command line, e.g. for testing include guards,
> doesn't fail due to a struct having an incomplete type.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/include/asm/kvm_host.h       |  2 ++
>  arch/x86/include/asm/kvm_page_track.h | 22 +++++++++++++-------
>  arch/x86/kvm/mmu/page_track.c         | 10 ++++-----
>  arch/x86/kvm/mmu/page_track.h         | 29 +++++++++++++++++++++++----
>  4 files changed, 47 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 113598d3e886..5ce06a75d3de 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1247,7 +1247,9 @@ struct kvm_arch {
>  	 * create an NX huge page (without hanging the guest).
>  	 */
>  	struct list_head possible_nx_huge_pages;
> +#ifdef CONFIG_KVM_EXTERNAL_WRITE_TRACKING
>  	struct kvm_page_track_notifier_head track_notifier_head;
> +#endif
>  	/*
>  	 * Protects marking pages unsync during page faults, as TDP MMU page
>  	 * faults only take mmu_lock for read.  For simplicity, the unsync
> diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
> index 76c0070dfe2a..61adb07b5927 100644
> --- a/arch/x86/include/asm/kvm_page_track.h
> +++ b/arch/x86/include/asm/kvm_page_track.h
> @@ -9,6 +9,14 @@ enum kvm_page_track_mode {
>  	KVM_PAGE_TRACK_MAX,
>  };
>  
> +void kvm_slot_page_track_add_page(struct kvm *kvm,
> +				  struct kvm_memory_slot *slot, gfn_t gfn,
> +				  enum kvm_page_track_mode mode);
> +void kvm_slot_page_track_remove_page(struct kvm *kvm,
> +				     struct kvm_memory_slot *slot, gfn_t gfn,
> +				     enum kvm_page_track_mode mode);
> +
> +#ifdef CONFIG_KVM_EXTERNAL_WRITE_TRACKING
>  /*
>   * The notifier represented by @kvm_page_track_notifier_node is linked into
>   * the head which will be notified when guest is triggering the track event.
> @@ -48,18 +56,18 @@ struct kvm_page_track_notifier_node {
>  				    struct kvm_page_track_notifier_node *node);
>  };
>  
> -void kvm_slot_page_track_add_page(struct kvm *kvm,
> -				  struct kvm_memory_slot *slot, gfn_t gfn,
> -				  enum kvm_page_track_mode mode);
> -void kvm_slot_page_track_remove_page(struct kvm *kvm,
> -				     struct kvm_memory_slot *slot, gfn_t gfn,
> -				     enum kvm_page_track_mode mode);
> -
>  void
>  kvm_page_track_register_notifier(struct kvm *kvm,
>  				 struct kvm_page_track_notifier_node *n);
>  void
>  kvm_page_track_unregister_notifier(struct kvm *kvm,
>  				   struct kvm_page_track_notifier_node *n);
> +#else
> +/*
> + * Allow defining a node in a structure even if page tracking is disabled, e.g.
> + * to play nice with testing headers via direct inclusion from the command line.
> + */
> +struct kvm_page_track_notifier_node {};
> +#endif /* CONFIG_KVM_EXTERNAL_WRITE_TRACKING */
>  
>  #endif
> diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
> index e15329d48f95..b20aad7ac3fe 100644
> --- a/arch/x86/kvm/mmu/page_track.c
> +++ b/arch/x86/kvm/mmu/page_track.c
> @@ -194,6 +194,7 @@ bool kvm_slot_page_track_is_active(struct kvm *kvm,
>  	return !!READ_ONCE(slot->arch.gfn_track[mode][index]);
>  }
>  
> +#ifdef CONFIG_KVM_EXTERNAL_WRITE_TRACKING
>  void kvm_page_track_cleanup(struct kvm *kvm)
>  {
>  	struct kvm_page_track_notifier_head *head;
> @@ -255,14 +256,13 @@ EXPORT_SYMBOL_GPL(kvm_page_track_unregister_notifier);
>   * The node should figure out if the written page is the one that node is
>   * interested in by itself.
>   */
> -void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
> -			  int bytes)
> +void __kvm_page_track_write(struct kvm *kvm, gpa_t gpa, const u8 *new, int bytes)
>  {
>  	struct kvm_page_track_notifier_head *head;
>  	struct kvm_page_track_notifier_node *n;
>  	int idx;
>  
> -	head = &vcpu->kvm->arch.track_notifier_head;
> +	head = &kvm->arch.track_notifier_head;
>  
>  	if (hlist_empty(&head->track_notifier_list))
>  		return;
> @@ -273,8 +273,6 @@ void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
>  		if (n->track_write)
>  			n->track_write(gpa, new, bytes, n);
>  	srcu_read_unlock(&head->track_srcu, idx);
> -
> -	kvm_mmu_track_write(vcpu, gpa, new, bytes);
>  }
>  
>  /*
> @@ -299,3 +297,5 @@ void kvm_page_track_delete_slot(struct kvm *kvm, struct kvm_memory_slot *slot)
>  			n->track_remove_region(slot->base_gfn, slot->npages, n);
>  	srcu_read_unlock(&head->track_srcu, idx);
>  }
> +
> +#endif
> diff --git a/arch/x86/kvm/mmu/page_track.h b/arch/x86/kvm/mmu/page_track.h
> index 89712f123ad3..931b26b8fc8f 100644
> --- a/arch/x86/kvm/mmu/page_track.h
> +++ b/arch/x86/kvm/mmu/page_track.h
> @@ -6,8 +6,6 @@
>  
>  #include <asm/kvm_page_track.h>
>  
> -int kvm_page_track_init(struct kvm *kvm);
> -void kvm_page_track_cleanup(struct kvm *kvm);
>  
>  bool kvm_page_track_write_tracking_enabled(struct kvm *kvm);
>  int kvm_page_track_write_tracking_alloc(struct kvm_memory_slot *slot);
> @@ -21,13 +19,36 @@ bool kvm_slot_page_track_is_active(struct kvm *kvm,
>  				   const struct kvm_memory_slot *slot,
>  				   gfn_t gfn, enum kvm_page_track_mode mode);
>  
> -void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
> -			  int bytes);
> +#ifdef CONFIG_KVM_EXTERNAL_WRITE_TRACKING
> +int kvm_page_track_init(struct kvm *kvm);
> +void kvm_page_track_cleanup(struct kvm *kvm);
> +
> +void __kvm_page_track_write(struct kvm *kvm, gpa_t gpa, const u8 *new, int bytes);
>  void kvm_page_track_delete_slot(struct kvm *kvm, struct kvm_memory_slot *slot);
>  
>  static inline bool kvm_page_track_has_external_user(struct kvm *kvm)
>  {
>  	return hlist_empty(&kvm->arch.track_notifier_head.track_notifier_list);
>  }
> +#else
> +static inline int kvm_page_track_init(struct kvm *kvm) { return 0; }
> +static inline void kvm_page_track_cleanup(struct kvm *kvm) { }
> +
> +static inline void __kvm_page_track_write(struct kvm *kvm, gpa_t gpa,
> +					  const u8 *new, int bytes) { }
> +static inline void kvm_page_track_delete_slot(struct kvm *kvm,
> +					      struct kvm_memory_slot *slot) { }
> +
> +static inline bool kvm_page_track_has_external_user(struct kvm *kvm) { return false; }
> +
> +#endif /* CONFIG_KVM_EXTERNAL_WRITE_TRACKING */
> +
> +static inline void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa,
> +					const u8 *new, int bytes)
> +{
> +	__kvm_page_track_write(vcpu->kvm, gpa, new, bytes);
> +
> +	kvm_mmu_track_write(vcpu, gpa, new, bytes);
> +}
>  
>  #endif /* __KVM_X86_PAGE_TRACK_H */
> -- 
> 2.40.1.606.ga4b1b128d6-goog
> 

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [Intel-gfx] [PATCH v3 03/28] drm/i915/gvt: Verify hugepages are contiguous in physical address space
  2023-05-16  9:37     ` [Intel-gfx] " Yan Zhao
@ 2023-05-17 14:50       ` Sean Christopherson
  -1 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2023-05-17 14:50 UTC (permalink / raw)
  To: Yan Zhao
  Cc: kvm, intel-gfx, linux-kernel, Ben Gardon, Paolo Bonzini, intel-gvt-dev

On Tue, May 16, 2023, Yan Zhao wrote:
> hi Sean
> 
> Do you think it's necessary to double check that struct page pointers
> are also contiguous?

No, the virtual address space should be irrelevant.  The only way it would be
problematic is if something in dma_map_page() expected to be able to access the
entire chunk of memory by getting the virtual address of only the first page,
but I can't imagine that code is reading or writing memory, let alone doing so
across a huge range of memory.

> And do you like to also include a fix as below, which is to remove the
> warning in vfio_device_container_unpin_pages() when npage is 0?
> 
> @ -169,7 +173,8 @@ static int gvt_pin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn,
>         *page = base_page;
>         return 0;
>  err:
> -       gvt_unpin_guest_page(vgpu, gfn, npage * PAGE_SIZE);
> +       if (npage)
> +               gvt_unpin_guest_page(vgpu, gfn, npage * PAGE_SIZE);
>         return ret;
>  }

Sure.  Want to give your SoB?  I'll write a changelog.

Thanks again!

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 03/28] drm/i915/gvt: Verify hugepages are contiguous in physical address space
@ 2023-05-17 14:50       ` Sean Christopherson
  0 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2023-05-17 14:50 UTC (permalink / raw)
  To: Yan Zhao
  Cc: Paolo Bonzini, Zhenyu Wang, Zhi Wang, kvm, intel-gvt-dev,
	intel-gfx, linux-kernel, Ben Gardon

On Tue, May 16, 2023, Yan Zhao wrote:
> hi Sean
> 
> Do you think it's necessary to double check that struct page pointers
> are also contiguous?

No, the virtual address space should be irrelevant.  The only way it would be
problematic is if something in dma_map_page() expected to be able to access the
entire chunk of memory by getting the virtual address of only the first page,
but I can't imagine that code is reading or writing memory, let alone doing so
across a huge range of memory.

> And do you like to also include a fix as below, which is to remove the
> warning in vfio_device_container_unpin_pages() when npage is 0?
> 
> @ -169,7 +173,8 @@ static int gvt_pin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn,
>         *page = base_page;
>         return 0;
>  err:
> -       gvt_unpin_guest_page(vgpu, gfn, npage * PAGE_SIZE);
> +       if (npage)
> +               gvt_unpin_guest_page(vgpu, gfn, npage * PAGE_SIZE);
>         return ret;
>  }

Sure.  Want to give your SoB?  I'll write a changelog.

Thanks again!

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 03/28] drm/i915/gvt: Verify hugepages are contiguous in physical address space
  2023-05-17 14:50       ` Sean Christopherson
@ 2023-05-18  9:06         ` Yan Zhao
  -1 siblings, 0 replies; 86+ messages in thread
From: Yan Zhao @ 2023-05-18  9:06 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Zhenyu Wang, Zhi Wang, kvm, intel-gvt-dev,
	intel-gfx, linux-kernel, Ben Gardon

On Wed, May 17, 2023 at 07:50:26AM -0700, Sean Christopherson wrote:
> On Tue, May 16, 2023, Yan Zhao wrote:
> > hi Sean
> > 
> > Do you think it's necessary to double check that struct page pointers
> > are also contiguous?
> 
> No, the virtual address space should be irrelevant.  The only way it would be
> problematic is if something in dma_map_page() expected to be able to access the
> entire chunk of memory by getting the virtual address of only the first page,
> but I can't imagine that code is reading or writing memory, let alone doing so
> across a huge range of memory.
Yes, I do find arm_iommu version of dma_map_page() access the memory by getting
virtual address of pages passed in, but it's implemented as page by page, not only
from the first page.

dma_map_page
  dma_map_page_attrs
    ops->map_page
      arm_iommu_map_page
         __dma_page_cpu_to_dev
           dma_cache_maint_page


Just a little worried about the condition of PFNs are contiguous
while they belong to different backends, e.g. one from system memory and
one from MMIO.
But I don't know how to avoid this without complicated checks.
And this condition might not happen in practice.


> 
> > And do you like to also include a fix as below, which is to remove the
> > warning in vfio_device_container_unpin_pages() when npage is 0?
> > 
> > @ -169,7 +173,8 @@ static int gvt_pin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn,
> >         *page = base_page;
> >         return 0;
> >  err:
> > -       gvt_unpin_guest_page(vgpu, gfn, npage * PAGE_SIZE);
> > +       if (npage)
> > +               gvt_unpin_guest_page(vgpu, gfn, npage * PAGE_SIZE);
> >         return ret;
> >  }
> 
> Sure.  Want to give your SoB?  I'll write a changelog.
>
Thanks!
It's just a small code piece. Whatever is convenient for you :)

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [Intel-gfx] [PATCH v3 03/28] drm/i915/gvt: Verify hugepages are contiguous in physical address space
@ 2023-05-18  9:06         ` Yan Zhao
  0 siblings, 0 replies; 86+ messages in thread
From: Yan Zhao @ 2023-05-18  9:06 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, intel-gfx, linux-kernel, Ben Gardon, Paolo Bonzini, intel-gvt-dev

On Wed, May 17, 2023 at 07:50:26AM -0700, Sean Christopherson wrote:
> On Tue, May 16, 2023, Yan Zhao wrote:
> > hi Sean
> > 
> > Do you think it's necessary to double check that struct page pointers
> > are also contiguous?
> 
> No, the virtual address space should be irrelevant.  The only way it would be
> problematic is if something in dma_map_page() expected to be able to access the
> entire chunk of memory by getting the virtual address of only the first page,
> but I can't imagine that code is reading or writing memory, let alone doing so
> across a huge range of memory.
Yes, I do find arm_iommu version of dma_map_page() access the memory by getting
virtual address of pages passed in, but it's implemented as page by page, not only
from the first page.

dma_map_page
  dma_map_page_attrs
    ops->map_page
      arm_iommu_map_page
         __dma_page_cpu_to_dev
           dma_cache_maint_page


Just a little worried about the condition of PFNs are contiguous
while they belong to different backends, e.g. one from system memory and
one from MMIO.
But I don't know how to avoid this without complicated checks.
And this condition might not happen in practice.


> 
> > And do you like to also include a fix as below, which is to remove the
> > warning in vfio_device_container_unpin_pages() when npage is 0?
> > 
> > @ -169,7 +173,8 @@ static int gvt_pin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn,
> >         *page = base_page;
> >         return 0;
> >  err:
> > -       gvt_unpin_guest_page(vgpu, gfn, npage * PAGE_SIZE);
> > +       if (npage)
> > +               gvt_unpin_guest_page(vgpu, gfn, npage * PAGE_SIZE);
> >         return ret;
> >  }
> 
> Sure.  Want to give your SoB?  I'll write a changelog.
>
Thanks!
It's just a small code piece. Whatever is convenient for you :)

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 03/28] drm/i915/gvt: Verify hugepages are contiguous in physical address space
  2023-05-18  9:06         ` [Intel-gfx] " Yan Zhao
@ 2023-05-18 18:04           ` Sean Christopherson
  -1 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2023-05-18 18:04 UTC (permalink / raw)
  To: Yan Zhao
  Cc: Paolo Bonzini, Zhenyu Wang, Zhi Wang, kvm, intel-gvt-dev,
	intel-gfx, linux-kernel, Ben Gardon

On Thu, May 18, 2023, Yan Zhao wrote:
> On Wed, May 17, 2023 at 07:50:26AM -0700, Sean Christopherson wrote:
> > On Tue, May 16, 2023, Yan Zhao wrote:
> > > hi Sean
> > > 
> > > Do you think it's necessary to double check that struct page pointers
> > > are also contiguous?
> > 
> > No, the virtual address space should be irrelevant.  The only way it would be
> > problematic is if something in dma_map_page() expected to be able to access the
> > entire chunk of memory by getting the virtual address of only the first page,
> > but I can't imagine that code is reading or writing memory, let alone doing so
> > across a huge range of memory.
> Yes, I do find arm_iommu version of dma_map_page() access the memory by getting
> virtual address of pages passed in, but it's implemented as page by page, not only
> from the first page.
> 
> dma_map_page
>   dma_map_page_attrs
>     ops->map_page
>       arm_iommu_map_page

Heh, thankfully this is ARM specific, which presumably doesn't collide with KVMGT.

>          __dma_page_cpu_to_dev
>            dma_cache_maint_page
> 
> Just a little worried about the condition of PFNs are contiguous
> while they belong to different backends, e.g. one from system memory and
> one from MMIO.
> But I don't know how to avoid this without complicated checks.
> And this condition might not happen in practice.

IMO, assuming that contiguous pfns are vritually contiguous is wrong, i.e. would
be a bug in the other code.  The above dma_cache_maint_page() get's this right,
and even has a well written comment to boot.

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [Intel-gfx] [PATCH v3 03/28] drm/i915/gvt: Verify hugepages are contiguous in physical address space
@ 2023-05-18 18:04           ` Sean Christopherson
  0 siblings, 0 replies; 86+ messages in thread
From: Sean Christopherson @ 2023-05-18 18:04 UTC (permalink / raw)
  To: Yan Zhao
  Cc: kvm, intel-gfx, linux-kernel, Ben Gardon, Paolo Bonzini, intel-gvt-dev

On Thu, May 18, 2023, Yan Zhao wrote:
> On Wed, May 17, 2023 at 07:50:26AM -0700, Sean Christopherson wrote:
> > On Tue, May 16, 2023, Yan Zhao wrote:
> > > hi Sean
> > > 
> > > Do you think it's necessary to double check that struct page pointers
> > > are also contiguous?
> > 
> > No, the virtual address space should be irrelevant.  The only way it would be
> > problematic is if something in dma_map_page() expected to be able to access the
> > entire chunk of memory by getting the virtual address of only the first page,
> > but I can't imagine that code is reading or writing memory, let alone doing so
> > across a huge range of memory.
> Yes, I do find arm_iommu version of dma_map_page() access the memory by getting
> virtual address of pages passed in, but it's implemented as page by page, not only
> from the first page.
> 
> dma_map_page
>   dma_map_page_attrs
>     ops->map_page
>       arm_iommu_map_page

Heh, thankfully this is ARM specific, which presumably doesn't collide with KVMGT.

>          __dma_page_cpu_to_dev
>            dma_cache_maint_page
> 
> Just a little worried about the condition of PFNs are contiguous
> while they belong to different backends, e.g. one from system memory and
> one from MMIO.
> But I don't know how to avoid this without complicated checks.
> And this condition might not happen in practice.

IMO, assuming that contiguous pfns are vritually contiguous is wrong, i.e. would
be a bug in the other code.  The above dma_cache_maint_page() get's this right,
and even has a well written comment to boot.

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [Intel-gfx] [PATCH v3 03/28] drm/i915/gvt: Verify hugepages are contiguous in physical address space
  2023-05-18 18:04           ` [Intel-gfx] " Sean Christopherson
@ 2023-05-19  3:18             ` Yan Zhao
  -1 siblings, 0 replies; 86+ messages in thread
From: Yan Zhao @ 2023-05-19  3:18 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, intel-gfx, linux-kernel, Ben Gardon, Paolo Bonzini, intel-gvt-dev

On Thu, May 18, 2023 at 11:04:46AM -0700, Sean Christopherson wrote:
> On Thu, May 18, 2023, Yan Zhao wrote:
> > On Wed, May 17, 2023 at 07:50:26AM -0700, Sean Christopherson wrote:
> > > On Tue, May 16, 2023, Yan Zhao wrote:
> > > > hi Sean
> > > > 
> > > > Do you think it's necessary to double check that struct page pointers
> > > > are also contiguous?
> > > 
> > > No, the virtual address space should be irrelevant.  The only way it would be
> > > problematic is if something in dma_map_page() expected to be able to access the
> > > entire chunk of memory by getting the virtual address of only the first page,
> > > but I can't imagine that code is reading or writing memory, let alone doing so
> > > across a huge range of memory.
> > Yes, I do find arm_iommu version of dma_map_page() access the memory by getting
> > virtual address of pages passed in, but it's implemented as page by page, not only
> > from the first page.
> > 
> > dma_map_page
> >   dma_map_page_attrs
> >     ops->map_page
> >       arm_iommu_map_page
> 
> Heh, thankfully this is ARM specific, which presumably doesn't collide with KVMGT.

Actually, this is fine with KVMGT (owning to page by page access), isn't it?
:)

> 
> >          __dma_page_cpu_to_dev
> >            dma_cache_maint_page
> > 
> > Just a little worried about the condition of PFNs are contiguous
> > while they belong to different backends, e.g. one from system memory and
> > one from MMIO.
> > But I don't know how to avoid this without complicated checks.
> > And this condition might not happen in practice.
> 
> IMO, assuming that contiguous pfns are vritually contiguous is wrong, i.e. would
> be a bug in the other code.  The above dma_cache_maint_page() get's this right,
> and even has a well written comment to boot.
Right.

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 03/28] drm/i915/gvt: Verify hugepages are contiguous in physical address space
@ 2023-05-19  3:18             ` Yan Zhao
  0 siblings, 0 replies; 86+ messages in thread
From: Yan Zhao @ 2023-05-19  3:18 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Zhenyu Wang, Zhi Wang, kvm, intel-gvt-dev,
	intel-gfx, linux-kernel, Ben Gardon

On Thu, May 18, 2023 at 11:04:46AM -0700, Sean Christopherson wrote:
> On Thu, May 18, 2023, Yan Zhao wrote:
> > On Wed, May 17, 2023 at 07:50:26AM -0700, Sean Christopherson wrote:
> > > On Tue, May 16, 2023, Yan Zhao wrote:
> > > > hi Sean
> > > > 
> > > > Do you think it's necessary to double check that struct page pointers
> > > > are also contiguous?
> > > 
> > > No, the virtual address space should be irrelevant.  The only way it would be
> > > problematic is if something in dma_map_page() expected to be able to access the
> > > entire chunk of memory by getting the virtual address of only the first page,
> > > but I can't imagine that code is reading or writing memory, let alone doing so
> > > across a huge range of memory.
> > Yes, I do find arm_iommu version of dma_map_page() access the memory by getting
> > virtual address of pages passed in, but it's implemented as page by page, not only
> > from the first page.
> > 
> > dma_map_page
> >   dma_map_page_attrs
> >     ops->map_page
> >       arm_iommu_map_page
> 
> Heh, thankfully this is ARM specific, which presumably doesn't collide with KVMGT.

Actually, this is fine with KVMGT (owning to page by page access), isn't it?
:)

> 
> >          __dma_page_cpu_to_dev
> >            dma_cache_maint_page
> > 
> > Just a little worried about the condition of PFNs are contiguous
> > while they belong to different backends, e.g. one from system memory and
> > one from MMIO.
> > But I don't know how to avoid this without complicated checks.
> > And this condition might not happen in practice.
> 
> IMO, assuming that contiguous pfns are vritually contiguous is wrong, i.e. would
> be a bug in the other code.  The above dma_cache_maint_page() get's this right,
> and even has a well written comment to boot.
Right.

^ permalink raw reply	[flat|nested] 86+ messages in thread

* RE: [PATCH v3 00/28] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups
  2023-05-13  0:35 ` [Intel-gfx] " Sean Christopherson
@ 2023-06-14  3:06   ` Ma, Yongwei
  -1 siblings, 0 replies; 86+ messages in thread
From: Ma, Yongwei @ 2023-06-14  3:06 UTC (permalink / raw)
  To: Christopherson,, Sean, Paolo Bonzini, Zhenyu Wang, Wang, Zhi A
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Zhao, Yan Y,
	Ben Gardon, Ma, Yongwei

> Fix a variety of found-by-inspection bugs in KVMGT, and overhaul KVM's
> page-track APIs to provide a leaner and cleaner interface.  The motivation for
> this series is to (significantly) reduce the number of KVM APIs that KVMGT
> uses, with a long-term goal of making all kvm_host.h headers KVM-internal.
> 
> As always for this series, the KVMGT changes are compile tested only.
> 
> Based on "git://git.kernel.org/pub/scm/virt/kvm/kvm.git tags/kvm-6.4-1".
> 
> v3:
>  - Collect reviewed/tested tags (I apologize if I missed any, I manually
>    gathered them this time due to a goof in my workflow). [Yan]
>  - Drop check on max KVM paging size from KVMGT. [Yan]
>  - Drop the explicit change on THP pages, and instead validate that the
>    pfns (not struct page pointers) are contiguous. [Yan]
>  - Fix buggy intel_gvt_dma_map_guest_page() usage by eliminating a helper
>    for shadowing 2MiB GTT entries. [Yan]
>  - Move kvm_arch_flush_shadow_{all,memslot}() to mmu.c instead of
> exposing
>    kvm_mmu_zap_all_fast() outside of mmu.c. [Yan]
>  - Fix an alignment goof in hlist_for_each_entry_srcu() usage. [Yan]
>  - Wrap full definition of external page track structures with
>    CONFIG_KVM_EXTERNAL_WRITE_TRACKING. [Yan]
> 
> v2:
>  - https://lore.kernel.org/all/20230311002258.852397-1-seanjc@google.com
>  - Reuse vgpu_lock to protect gfn hash instead of introducing a new (and
>    buggy) mutext. [Yan]
>  - Remove a spurious return from kvm_page_track_init(). [Yan]
>  - Take @kvm directly in the inner __kvm_page_track_write(). [Yan]
>  - Delete the gfn sanity check that relies on kvm_is_visible_gfn() instead
>    of providing a dedicated interface. [Yan]
> 
> v1: https://lore.kernel.org/lkml/20221223005739.1295925-1-
> seanjc@google.com
> 
> Sean Christopherson (24):
>   drm/i915/gvt: Verify pfn is "valid" before dereferencing "struct page"
>   drm/i915/gvt: Verify hugepages are contiguous in physical address
>     space
>   drm/i915/gvt: Put the page reference obtained by KVM's gfn_to_pfn()
>   drm/i915/gvt: Explicitly check that vGPU is attached before shadowing
>   drm/i915/gvt: Error out on an attempt to shadowing an unknown GTT
>     entry type
>   drm/i915/gvt: Don't rely on KVM's gfn_to_pfn() to query possible 2M
>     GTT
>   drm/i915/gvt: Use an "unsigned long" to iterate over memslot gfns
>   drm/i915/gvt: Drop unused helper intel_vgpu_reset_gtt()
>   drm/i915/gvt: Protect gfn hash table with vgpu_lock
>   KVM: x86/mmu: Move kvm_arch_flush_shadow_{all,memslot}() to mmu.c
>   KVM: x86/mmu: Don't rely on page-track mechanism to flush on memslot
>     change
>   KVM: x86/mmu: Don't bounce through page-track mechanism for guest
> PTEs
>   KVM: drm/i915/gvt: Drop @vcpu from KVM's ->track_write() hook
>   KVM: x86: Reject memslot MOVE operations if KVMGT is attached
>   drm/i915/gvt: Don't bother removing write-protection on to-be-deleted
>     slot
>   KVM: x86/mmu: Move KVM-only page-track declarations to internal header
>   KVM: x86/mmu: Use page-track notifiers iff there are external users
>   KVM: x86/mmu: Drop infrastructure for multiple page-track modes
>   KVM: x86/mmu: Rename page-track APIs to reflect the new reality
>   KVM: x86/mmu: Assert that correct locks are held for page
>     write-tracking
>   KVM: x86/mmu: Bug the VM if write-tracking is used but not enabled
>   KVM: x86/mmu: Drop @slot param from exported/external page-track APIs
>   KVM: x86/mmu: Handle KVM bookkeeping in page-track APIs, not callers
>   drm/i915/gvt: Drop final dependencies on KVM internal details
> 
> Yan Zhao (4):
>   drm/i915/gvt: remove interface intel_gvt_is_valid_gfn
>   KVM: x86: Add a new page-track hook to handle memslot deletion
>   drm/i915/gvt: switch from ->track_flush_slot() to
>     ->track_remove_region()
>   KVM: x86: Remove the unused page-track hook track_flush_slot()
> 
>  arch/x86/include/asm/kvm_host.h       |  16 +-
>  arch/x86/include/asm/kvm_page_track.h |  73 +++-----
>  arch/x86/kvm/mmu.h                    |   2 +
>  arch/x86/kvm/mmu/mmu.c                |  51 +++--
>  arch/x86/kvm/mmu/page_track.c         | 256 +++++++++++++-------------
>  arch/x86/kvm/mmu/page_track.h         |  58 ++++++
>  arch/x86/kvm/x86.c                    |  22 +--
>  drivers/gpu/drm/i915/gvt/gtt.c        | 102 ++--------
>  drivers/gpu/drm/i915/gvt/gtt.h        |   1 -
>  drivers/gpu/drm/i915/gvt/gvt.h        |   3 +-
>  drivers/gpu/drm/i915/gvt/kvmgt.c      | 117 +++++-------
>  drivers/gpu/drm/i915/gvt/page_track.c |  10 +-
>  12 files changed, 320 insertions(+), 391 deletions(-)  create mode 100644
> arch/x86/kvm/mmu/page_track.h
> 
> 
> base-commit: b3c98052d46948a8d65d2778c7f306ff38366aac
> --
> 2.40.1.606.ga4b1b128d6-goog
Verified GVT-g on Intel platform host Core(TM) i7-8559U CPU @ 2.70GHz 
 + Ubuntu22.04 LTS.

Both Linux Ubuntu 22.04 VM and Windows10 VM could boot up successfully. 
3D benchmark GLmark2 can run in the guest VM.

Tested-by: Yongwei Ma <yongwei.ma@intel.com>

Best Regards,
Yongwei Ma


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [Intel-gfx] [PATCH v3 00/28] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups
@ 2023-06-14  3:06   ` Ma, Yongwei
  0 siblings, 0 replies; 86+ messages in thread
From: Ma, Yongwei @ 2023-06-14  3:06 UTC (permalink / raw)
  To: Christopherson,, Sean, Paolo Bonzini, Zhenyu Wang, Wang, Zhi A
  Cc: Zhao, Yan Y, kvm, intel-gfx, linux-kernel, Ma, Yongwei,
	Ben Gardon, intel-gvt-dev

> Fix a variety of found-by-inspection bugs in KVMGT, and overhaul KVM's
> page-track APIs to provide a leaner and cleaner interface.  The motivation for
> this series is to (significantly) reduce the number of KVM APIs that KVMGT
> uses, with a long-term goal of making all kvm_host.h headers KVM-internal.
> 
> As always for this series, the KVMGT changes are compile tested only.
> 
> Based on "git://git.kernel.org/pub/scm/virt/kvm/kvm.git tags/kvm-6.4-1".
> 
> v3:
>  - Collect reviewed/tested tags (I apologize if I missed any, I manually
>    gathered them this time due to a goof in my workflow). [Yan]
>  - Drop check on max KVM paging size from KVMGT. [Yan]
>  - Drop the explicit change on THP pages, and instead validate that the
>    pfns (not struct page pointers) are contiguous. [Yan]
>  - Fix buggy intel_gvt_dma_map_guest_page() usage by eliminating a helper
>    for shadowing 2MiB GTT entries. [Yan]
>  - Move kvm_arch_flush_shadow_{all,memslot}() to mmu.c instead of
> exposing
>    kvm_mmu_zap_all_fast() outside of mmu.c. [Yan]
>  - Fix an alignment goof in hlist_for_each_entry_srcu() usage. [Yan]
>  - Wrap full definition of external page track structures with
>    CONFIG_KVM_EXTERNAL_WRITE_TRACKING. [Yan]
> 
> v2:
>  - https://lore.kernel.org/all/20230311002258.852397-1-seanjc@google.com
>  - Reuse vgpu_lock to protect gfn hash instead of introducing a new (and
>    buggy) mutext. [Yan]
>  - Remove a spurious return from kvm_page_track_init(). [Yan]
>  - Take @kvm directly in the inner __kvm_page_track_write(). [Yan]
>  - Delete the gfn sanity check that relies on kvm_is_visible_gfn() instead
>    of providing a dedicated interface. [Yan]
> 
> v1: https://lore.kernel.org/lkml/20221223005739.1295925-1-
> seanjc@google.com
> 
> Sean Christopherson (24):
>   drm/i915/gvt: Verify pfn is "valid" before dereferencing "struct page"
>   drm/i915/gvt: Verify hugepages are contiguous in physical address
>     space
>   drm/i915/gvt: Put the page reference obtained by KVM's gfn_to_pfn()
>   drm/i915/gvt: Explicitly check that vGPU is attached before shadowing
>   drm/i915/gvt: Error out on an attempt to shadowing an unknown GTT
>     entry type
>   drm/i915/gvt: Don't rely on KVM's gfn_to_pfn() to query possible 2M
>     GTT
>   drm/i915/gvt: Use an "unsigned long" to iterate over memslot gfns
>   drm/i915/gvt: Drop unused helper intel_vgpu_reset_gtt()
>   drm/i915/gvt: Protect gfn hash table with vgpu_lock
>   KVM: x86/mmu: Move kvm_arch_flush_shadow_{all,memslot}() to mmu.c
>   KVM: x86/mmu: Don't rely on page-track mechanism to flush on memslot
>     change
>   KVM: x86/mmu: Don't bounce through page-track mechanism for guest
> PTEs
>   KVM: drm/i915/gvt: Drop @vcpu from KVM's ->track_write() hook
>   KVM: x86: Reject memslot MOVE operations if KVMGT is attached
>   drm/i915/gvt: Don't bother removing write-protection on to-be-deleted
>     slot
>   KVM: x86/mmu: Move KVM-only page-track declarations to internal header
>   KVM: x86/mmu: Use page-track notifiers iff there are external users
>   KVM: x86/mmu: Drop infrastructure for multiple page-track modes
>   KVM: x86/mmu: Rename page-track APIs to reflect the new reality
>   KVM: x86/mmu: Assert that correct locks are held for page
>     write-tracking
>   KVM: x86/mmu: Bug the VM if write-tracking is used but not enabled
>   KVM: x86/mmu: Drop @slot param from exported/external page-track APIs
>   KVM: x86/mmu: Handle KVM bookkeeping in page-track APIs, not callers
>   drm/i915/gvt: Drop final dependencies on KVM internal details
> 
> Yan Zhao (4):
>   drm/i915/gvt: remove interface intel_gvt_is_valid_gfn
>   KVM: x86: Add a new page-track hook to handle memslot deletion
>   drm/i915/gvt: switch from ->track_flush_slot() to
>     ->track_remove_region()
>   KVM: x86: Remove the unused page-track hook track_flush_slot()
> 
>  arch/x86/include/asm/kvm_host.h       |  16 +-
>  arch/x86/include/asm/kvm_page_track.h |  73 +++-----
>  arch/x86/kvm/mmu.h                    |   2 +
>  arch/x86/kvm/mmu/mmu.c                |  51 +++--
>  arch/x86/kvm/mmu/page_track.c         | 256 +++++++++++++-------------
>  arch/x86/kvm/mmu/page_track.h         |  58 ++++++
>  arch/x86/kvm/x86.c                    |  22 +--
>  drivers/gpu/drm/i915/gvt/gtt.c        | 102 ++--------
>  drivers/gpu/drm/i915/gvt/gtt.h        |   1 -
>  drivers/gpu/drm/i915/gvt/gvt.h        |   3 +-
>  drivers/gpu/drm/i915/gvt/kvmgt.c      | 117 +++++-------
>  drivers/gpu/drm/i915/gvt/page_track.c |  10 +-
>  12 files changed, 320 insertions(+), 391 deletions(-)  create mode 100644
> arch/x86/kvm/mmu/page_track.h
> 
> 
> base-commit: b3c98052d46948a8d65d2778c7f306ff38366aac
> --
> 2.40.1.606.ga4b1b128d6-goog
Verified GVT-g on Intel platform host Core(TM) i7-8559U CPU @ 2.70GHz 
 + Ubuntu22.04 LTS.

Both Linux Ubuntu 22.04 VM and Windows10 VM could boot up successfully. 
3D benchmark GLmark2 can run in the guest VM.

Tested-by: Yongwei Ma <yongwei.ma@intel.com>

Best Regards,
Yongwei Ma


^ permalink raw reply	[flat|nested] 86+ messages in thread

end of thread, other threads:[~2023-06-14  3:06 UTC | newest]

Thread overview: 86+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-13  0:35 [PATCH v3 00/28] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups Sean Christopherson
2023-05-13  0:35 ` [Intel-gfx] " Sean Christopherson
2023-05-13  0:35 ` [PATCH v3 01/28] drm/i915/gvt: Verify pfn is "valid" before dereferencing "struct page" Sean Christopherson
2023-05-13  0:35   ` [Intel-gfx] " Sean Christopherson
2023-05-13  0:35 ` [PATCH v3 02/28] drm/i915/gvt: remove interface intel_gvt_is_valid_gfn Sean Christopherson
2023-05-13  0:35   ` [Intel-gfx] " Sean Christopherson
2023-05-15 11:07   ` Wang, Zhi A
2023-05-15 11:07     ` [Intel-gfx] " Wang, Zhi A
2023-05-13  0:35 ` [Intel-gfx] [PATCH v3 03/28] drm/i915/gvt: Verify hugepages are contiguous in physical address space Sean Christopherson
2023-05-13  0:35   ` Sean Christopherson
2023-05-16  9:37   ` Yan Zhao
2023-05-16  9:37     ` [Intel-gfx] " Yan Zhao
2023-05-17 14:50     ` Sean Christopherson
2023-05-17 14:50       ` Sean Christopherson
2023-05-18  9:06       ` Yan Zhao
2023-05-18  9:06         ` [Intel-gfx] " Yan Zhao
2023-05-18 18:04         ` Sean Christopherson
2023-05-18 18:04           ` [Intel-gfx] " Sean Christopherson
2023-05-19  3:18           ` Yan Zhao
2023-05-19  3:18             ` Yan Zhao
2023-05-13  0:35 ` [Intel-gfx] [PATCH v3 04/28] drm/i915/gvt: Put the page reference obtained by KVM's gfn_to_pfn() Sean Christopherson
2023-05-13  0:35   ` Sean Christopherson
2023-05-13  0:35 ` [Intel-gfx] [PATCH v3 05/28] drm/i915/gvt: Explicitly check that vGPU is attached before shadowing Sean Christopherson
2023-05-13  0:35   ` Sean Christopherson
2023-05-15 11:24   ` Wang, Zhi A
2023-05-15 11:24     ` [Intel-gfx] " Wang, Zhi A
2023-05-15 17:57     ` Sean Christopherson
2023-05-15 17:57       ` Sean Christopherson
2023-05-13  0:35 ` [PATCH v3 06/28] drm/i915/gvt: Error out on an attempt to shadowing an unknown GTT entry type Sean Christopherson
2023-05-13  0:35   ` [Intel-gfx] " Sean Christopherson
2023-05-15 11:28   ` Wang, Zhi A
2023-05-15 11:28     ` [Intel-gfx] " Wang, Zhi A
2023-05-13  0:35 ` [PATCH v3 07/28] drm/i915/gvt: Don't rely on KVM's gfn_to_pfn() to query possible 2M GTT Sean Christopherson
2023-05-13  0:35   ` [Intel-gfx] " Sean Christopherson
2023-05-17  0:57   ` Yan Zhao
2023-05-17  0:57     ` [Intel-gfx] " Yan Zhao
2023-05-13  0:35 ` [PATCH v3 08/28] drm/i915/gvt: Use an "unsigned long" to iterate over memslot gfns Sean Christopherson
2023-05-13  0:35   ` [Intel-gfx] " Sean Christopherson
2023-05-13  0:35 ` [PATCH v3 09/28] drm/i915/gvt: Drop unused helper intel_vgpu_reset_gtt() Sean Christopherson
2023-05-13  0:35   ` [Intel-gfx] " Sean Christopherson
2023-05-13  0:35 ` [PATCH v3 10/28] drm/i915/gvt: Protect gfn hash table with vgpu_lock Sean Christopherson
2023-05-13  0:35   ` [Intel-gfx] " Sean Christopherson
2023-05-13  0:35 ` [PATCH v3 11/28] KVM: x86/mmu: Move kvm_arch_flush_shadow_{all,memslot}() to mmu.c Sean Christopherson
2023-05-13  0:35   ` [Intel-gfx] [PATCH v3 11/28] KVM: x86/mmu: Move kvm_arch_flush_shadow_{all, memslot}() " Sean Christopherson
2023-05-13  0:35 ` [PATCH v3 12/28] KVM: x86/mmu: Don't rely on page-track mechanism to flush on memslot change Sean Christopherson
2023-05-13  0:35   ` [Intel-gfx] " Sean Christopherson
2023-05-17  2:06   ` Yan Zhao
2023-05-17  2:06     ` [Intel-gfx] " Yan Zhao
2023-05-13  0:35 ` [Intel-gfx] [PATCH v3 13/28] KVM: x86/mmu: Don't bounce through page-track mechanism for guest PTEs Sean Christopherson
2023-05-13  0:35   ` Sean Christopherson
2023-05-13  0:35 ` [PATCH v3 14/28] KVM: drm/i915/gvt: Drop @vcpu from KVM's ->track_write() hook Sean Christopherson
2023-05-13  0:35   ` [Intel-gfx] " Sean Christopherson
2023-05-13  0:35 ` [Intel-gfx] [PATCH v3 15/28] KVM: x86: Reject memslot MOVE operations if KVMGT is attached Sean Christopherson
2023-05-13  0:35   ` Sean Christopherson
2023-05-13  0:35 ` [Intel-gfx] [PATCH v3 16/28] drm/i915/gvt: Don't bother removing write-protection on to-be-deleted slot Sean Christopherson
2023-05-13  0:35   ` Sean Christopherson
2023-05-13  0:35 ` [Intel-gfx] [PATCH v3 17/28] KVM: x86: Add a new page-track hook to handle memslot deletion Sean Christopherson
2023-05-13  0:35   ` Sean Christopherson
2023-05-13  0:35 ` [Intel-gfx] [PATCH v3 18/28] drm/i915/gvt: switch from ->track_flush_slot() to ->track_remove_region() Sean Christopherson
2023-05-13  0:35   ` Sean Christopherson
2023-05-13  0:35 ` [Intel-gfx] [PATCH v3 19/28] KVM: x86: Remove the unused page-track hook track_flush_slot() Sean Christopherson
2023-05-13  0:35   ` Sean Christopherson
2023-05-13  0:35 ` [Intel-gfx] [PATCH v3 20/28] KVM: x86/mmu: Move KVM-only page-track declarations to internal header Sean Christopherson
2023-05-13  0:35   ` Sean Christopherson
2023-05-13  0:35 ` [Intel-gfx] [PATCH v3 21/28] KVM: x86/mmu: Use page-track notifiers iff there are external users Sean Christopherson
2023-05-13  0:35   ` Sean Christopherson
2023-05-17  3:27   ` Yan Zhao
2023-05-17  3:27     ` [Intel-gfx] " Yan Zhao
2023-05-13  0:35 ` [Intel-gfx] [PATCH v3 22/28] KVM: x86/mmu: Drop infrastructure for multiple page-track modes Sean Christopherson
2023-05-13  0:35   ` Sean Christopherson
2023-05-13  0:35 ` [Intel-gfx] [PATCH v3 23/28] KVM: x86/mmu: Rename page-track APIs to reflect the new reality Sean Christopherson
2023-05-13  0:35   ` Sean Christopherson
2023-05-13  0:35 ` [Intel-gfx] [PATCH v3 24/28] KVM: x86/mmu: Assert that correct locks are held for page write-tracking Sean Christopherson
2023-05-13  0:35   ` Sean Christopherson
2023-05-13  0:35 ` [Intel-gfx] [PATCH v3 25/28] KVM: x86/mmu: Bug the VM if write-tracking is used but not enabled Sean Christopherson
2023-05-13  0:35   ` Sean Christopherson
2023-05-13  0:35 ` [Intel-gfx] [PATCH v3 26/28] KVM: x86/mmu: Drop @slot param from exported/external page-track APIs Sean Christopherson
2023-05-13  0:35   ` Sean Christopherson
2023-05-13  0:35 ` [Intel-gfx] [PATCH v3 27/28] KVM: x86/mmu: Handle KVM bookkeeping in page-track APIs, not callers Sean Christopherson
2023-05-13  0:35   ` Sean Christopherson
2023-05-13  0:36 ` [Intel-gfx] [PATCH v3 28/28] drm/i915/gvt: Drop final dependencies on KVM internal details Sean Christopherson
2023-05-13  0:36   ` Sean Christopherson
2023-05-13  1:46 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups (rev8) Patchwork
2023-05-13  2:00 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork
2023-06-14  3:06 ` [PATCH v3 00/28] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups Ma, Yongwei
2023-06-14  3:06   ` [Intel-gfx] " Ma, Yongwei

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.