* [PATCH v2 00/27] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups
@ 2023-03-11  0:22 Sean Christopherson
  2023-03-11  0:22 ` [PATCH v2 01/27] drm/i915/gvt: Verify pfn is "valid" before dereferencing "struct page" Sean Christopherson
                   ` (27 more replies)
  0 siblings, 28 replies; 79+ messages in thread
From: Sean Christopherson @ 2023-03-11  0:22 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

Fix a variety of found-by-inspection bugs in KVMGT, and overhaul KVM's
page-track APIs to provide a leaner and cleaner interface.  The motivation
for this series is to (significantly) reduce the number of KVM APIs that
KVMGT uses, with a long-term goal of making all kvm_host.h headers
KVM-internal.

As was the case in v1, the KVMGT changes are compile-tested only.

Based on "git://git.kernel.org/pub/scm/virt/kvm/kvm.git next".

v2:
 - Reuse vgpu_lock to protect gfn hash instead of introducing a new (and
   buggy) mutex. [Yan]
 - Remove a spurious return from kvm_page_track_init(). [Yan]
 - Take @kvm directly in the inner __kvm_page_track_write(). [Yan]
 - Delete the gfn sanity check that relies on kvm_is_visible_gfn() instead
   of providing a dedicated interface. [Yan]

v1: https://lore.kernel.org/lkml/20221223005739.1295925-1-seanjc@google.com

Sean Christopherson (23):
  drm/i915/gvt: Verify pfn is "valid" before dereferencing "struct page"
  KVM: x86/mmu: Factor out helper to get max mapping size of a memslot
  drm/i915/gvt: Incorporate KVM memslot info into check for 2MiB GTT
    entry
  drm/i915/gvt: Verify VFIO-pinned page is THP when shadowing 2M gtt
    entry
  drm/i915/gvt: Put the page reference obtained by KVM's gfn_to_pfn()
  drm/i915/gvt: Don't rely on KVM's gfn_to_pfn() to query possible 2M
    GTT
  drm/i915/gvt: Use an "unsigned long" to iterate over memslot gfns
  drm/i915/gvt: Drop unused helper intel_vgpu_reset_gtt()
  drm/i915/gvt: Protect gfn hash table with vgpu_lock
  KVM: x86/mmu: Don't rely on page-track mechanism to flush on memslot
    change
  KVM: x86/mmu: Don't bounce through page-track mechanism for guest PTEs
  KVM: drm/i915/gvt: Drop @vcpu from KVM's ->track_write() hook
  KVM: x86: Reject memslot MOVE operations if KVMGT is attached
  drm/i915/gvt: Don't bother removing write-protection on to-be-deleted
    slot
  KVM: x86/mmu: Move KVM-only page-track declarations to internal header
  KVM: x86/mmu: Use page-track notifiers iff there are external users
  KVM: x86/mmu: Drop infrastructure for multiple page-track modes
  KVM: x86/mmu: Rename page-track APIs to reflect the new reality
  KVM: x86/mmu: Assert that correct locks are held for page
    write-tracking
  KVM: x86/mmu: Bug the VM if write-tracking is used but not enabled
  KVM: x86/mmu: Drop @slot param from exported/external page-track APIs
  KVM: x86/mmu: Handle KVM bookkeeping in page-track APIs, not callers
  drm/i915/gvt: Drop final dependencies on KVM internal details

Yan Zhao (4):
  drm/i915/gvt: remove interface intel_gvt_is_valid_gfn
  KVM: x86: Add a new page-track hook to handle memslot deletion
  drm/i915/gvt: switch from ->track_flush_slot() to
    ->track_remove_region()
  KVM: x86: Remove the unused page-track hook track_flush_slot()

 arch/x86/include/asm/kvm_host.h       |  16 +-
 arch/x86/include/asm/kvm_page_track.h |  66 +++----
 arch/x86/kvm/mmu.h                    |   2 +
 arch/x86/kvm/mmu/mmu.c                |  61 +++---
 arch/x86/kvm/mmu/mmu_internal.h       |   2 +
 arch/x86/kvm/mmu/page_track.c         | 270 ++++++++++++++------------
 arch/x86/kvm/mmu/page_track.h         |  58 ++++++
 arch/x86/kvm/x86.c                    |  13 +-
 drivers/gpu/drm/i915/gvt/gtt.c        |  88 ++-------
 drivers/gpu/drm/i915/gvt/gtt.h        |   1 -
 drivers/gpu/drm/i915/gvt/gvt.h        |   3 +-
 drivers/gpu/drm/i915/gvt/kvmgt.c      | 132 ++++++-------
 drivers/gpu/drm/i915/gvt/page_track.c |  10 +-
 13 files changed, 361 insertions(+), 361 deletions(-)
 create mode 100644 arch/x86/kvm/mmu/page_track.h


base-commit: 45dd9bc75d9adc9483f0c7d662ba6e73ed698a0b
-- 
2.40.0.rc1.284.g88254d51c5-goog



* [PATCH v2 01/27] drm/i915/gvt: Verify pfn is "valid" before dereferencing "struct page"
  2023-03-11  0:22 [PATCH v2 00/27] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups Sean Christopherson
@ 2023-03-11  0:22 ` Sean Christopherson
  2023-03-13 15:37   ` Wang, Wei W
  2023-03-17  4:20   ` Yan Zhao
  2023-03-11  0:22 ` [PATCH v2 02/27] KVM: x86/mmu: Factor out helper to get max mapping size of a memslot Sean Christopherson
                   ` (26 subsequent siblings)
  27 siblings, 2 replies; 79+ messages in thread
From: Sean Christopherson @ 2023-03-11  0:22 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

Check that the pfn found by gfn_to_pfn() is actually backed by "struct
page" memory prior to retrieving and dereferencing the page.  KVM
supports backing guest memory with VM_PFNMAP, VM_IO, etc., and so
there is no guarantee the pfn returned by gfn_to_pfn() has an associated
"struct page".

Fixes: b901b252b6cf ("drm/i915/gvt: Add 2M huge gtt support")
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 drivers/gpu/drm/i915/gvt/gtt.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c
index 4ec85308379a..58b9b316ae46 100644
--- a/drivers/gpu/drm/i915/gvt/gtt.c
+++ b/drivers/gpu/drm/i915/gvt/gtt.c
@@ -1183,6 +1183,10 @@ static int is_2MB_gtt_possible(struct intel_vgpu *vgpu,
 	pfn = gfn_to_pfn(vgpu->vfio_device.kvm, ops->get_pfn(entry));
 	if (is_error_noslot_pfn(pfn))
 		return -EINVAL;
+
+	if (!pfn_valid(pfn))
+		return -EINVAL;
+
 	return PageTransHuge(pfn_to_page(pfn));
 }
 
-- 
2.40.0.rc1.284.g88254d51c5-goog



* [PATCH v2 02/27] KVM: x86/mmu: Factor out helper to get max mapping size of a memslot
  2023-03-11  0:22 [PATCH v2 00/27] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups Sean Christopherson
  2023-03-11  0:22 ` [PATCH v2 01/27] drm/i915/gvt: Verify pfn is "valid" before dereferencing "struct page" Sean Christopherson
@ 2023-03-11  0:22 ` Sean Christopherson
  2023-03-13 15:37   ` Wang, Wei W
  2023-03-11  0:22 ` [PATCH v2 03/27] drm/i915/gvt: remove interface intel_gvt_is_valid_gfn Sean Christopherson
                   ` (25 subsequent siblings)
  27 siblings, 1 reply; 79+ messages in thread
From: Sean Christopherson @ 2023-03-11  0:22 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

Extract the memslot-related logic of kvm_mmu_max_mapping_level() into a
new helper so that KVMGT can determine whether or not mapping a 2MiB page
into the guest is (dis)allowed per KVM's memslots.

No functional change intended.
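
For reference, a sketch of the intended consumer pattern (the editor's
sketch, not part of the patch; it mirrors how a later patch in this series
uses the helper, and assumes @slot was resolved under kvm->srcu):

  static bool slot_allows_2MiB_mapping(const struct kvm_memory_slot *slot,
  				     gfn_t gfn)
  {
  	/* Clamp a candidate 2MiB mapping to what the memslot allows. */
  	return kvm_mmu_max_slot_mapping_level(slot, gfn, PG_LEVEL_2M) ==
  	       PG_LEVEL_2M;
  }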

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c          | 21 +++++++++++++++------
 arch/x86/kvm/mmu/mmu_internal.h |  2 ++
 2 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index c8ebe542c565..4685c80e441b 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3083,20 +3083,29 @@ static int host_pfn_mapping_level(struct kvm *kvm, gfn_t gfn,
 	return level;
 }
 
+int kvm_mmu_max_slot_mapping_level(const struct kvm_memory_slot *slot,
+				   gfn_t gfn, int max_level)
+{
+	struct kvm_lpage_info *linfo;
+
+	for ( ; max_level > PG_LEVEL_4K; max_level--) {
+		linfo = lpage_info_slot(gfn, slot, max_level);
+		if (!linfo->disallow_lpage)
+			break;
+	}
+	return max_level;
+}
+
 int kvm_mmu_max_mapping_level(struct kvm *kvm,
 			      const struct kvm_memory_slot *slot, gfn_t gfn,
 			      int max_level)
 {
-	struct kvm_lpage_info *linfo;
 	int host_level;
 
 	max_level = min(max_level, max_huge_page_level);
-	for ( ; max_level > PG_LEVEL_4K; max_level--) {
-		linfo = lpage_info_slot(gfn, slot, max_level);
-		if (!linfo->disallow_lpage)
-			break;
-	}
+	max_level = kvm_mmu_max_slot_mapping_level(slot, gfn, max_level);
 
+	/* Avoid walking the host page tables if a hugepage is impossible. */
 	if (max_level == PG_LEVEL_4K)
 		return PG_LEVEL_4K;
 
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index cc58631e2336..9db7fa0b3bf9 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -328,6 +328,8 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 	return r;
 }
 
+int kvm_mmu_max_slot_mapping_level(const struct kvm_memory_slot *slot,
+				   gfn_t gfn, int max_level);
 int kvm_mmu_max_mapping_level(struct kvm *kvm,
 			      const struct kvm_memory_slot *slot, gfn_t gfn,
 			      int max_level);
-- 
2.40.0.rc1.284.g88254d51c5-goog



* [PATCH v2 03/27] drm/i915/gvt: remove interface intel_gvt_is_valid_gfn
  2023-03-11  0:22 [PATCH v2 00/27] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups Sean Christopherson
  2023-03-11  0:22 ` [PATCH v2 01/27] drm/i915/gvt: Verify pfn is "valid" before dereferencing "struct page" Sean Christopherson
  2023-03-11  0:22 ` [PATCH v2 02/27] KVM: x86/mmu: Factor out helper to get max mapping size of a memslot Sean Christopherson
@ 2023-03-11  0:22 ` Sean Christopherson
  2023-03-17  4:26   ` Yan Zhao
  2023-03-11  0:22 ` [PATCH v2 04/27] drm/i915/gvt: Incorporate KVM memslot info into check for 2MiB GTT entry Sean Christopherson
                   ` (24 subsequent siblings)
  27 siblings, 1 reply; 79+ messages in thread
From: Sean Christopherson @ 2023-03-11  0:22 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

From: Yan Zhao <yan.y.zhao@intel.com>

Currently intel_gvt_is_valid_gfn() is called in two places:
(1) shadowing guest GGTT entry
(2) shadowing guest PPGTT leaf entry,
which was introduced in commit cc753fbe1ac4
("drm/i915/gvt: validate gfn before set shadow page entry").

However, it is no longer necessary to call this interface, because
a. the GGTT partial write issue has been fixed by
   commit bc0686ff5fad
   ("drm/i915/gvt: support inconsecutive partial gtt entry write")
   commit 510fe10b6180
   ("drm/i915/gvt: fix a bug of partially write ggtt enties")
b. PPGTT resides in normal guest RAM and we only treat 8-byte writes
   as valid page table writes.  Any invalid GPA found is regarded as
   an error, either due to guest misbehavior/attack or a bug in the
   host shadow code.
   So, rather than doing GFN pre-checking, replacing invalid GFNs with
   the scratch GFN, and continuing silently, just remove the pre-checking
   and abort PPGTT shadowing when an error is detected.
c. A GFN validity check is still performed in
   intel_gvt_dma_map_guest_page() --> gvt_pin_guest_page().
   It's more desirable to have the VFIO interface do both the validity
   check and the mapping.
   Calling intel_gvt_is_valid_gfn() to do the GFN validity check on the
   KVM side, while later mapping the GFN through the VFIO interface, is
   unnecessarily fragile and confusing for unaware readers.

Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
[sean: remove now-unused local variables]
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 drivers/gpu/drm/i915/gvt/gtt.c | 36 +---------------------------------
 1 file changed, 1 insertion(+), 35 deletions(-)

diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c
index 58b9b316ae46..f30922c55a0c 100644
--- a/drivers/gpu/drm/i915/gvt/gtt.c
+++ b/drivers/gpu/drm/i915/gvt/gtt.c
@@ -49,22 +49,6 @@
 static bool enable_out_of_sync = false;
 static int preallocated_oos_pages = 8192;
 
-static bool intel_gvt_is_valid_gfn(struct intel_vgpu *vgpu, unsigned long gfn)
-{
-	struct kvm *kvm = vgpu->vfio_device.kvm;
-	int idx;
-	bool ret;
-
-	if (!test_bit(INTEL_VGPU_STATUS_ATTACHED, vgpu->status))
-		return false;
-
-	idx = srcu_read_lock(&kvm->srcu);
-	ret = kvm_is_visible_gfn(kvm, gfn);
-	srcu_read_unlock(&kvm->srcu, idx);
-
-	return ret;
-}
-
 /*
  * validate a gm address and related range size,
  * translate it to host gm address
@@ -1333,11 +1317,9 @@ static int ppgtt_populate_shadow_entry(struct intel_vgpu *vgpu,
 static int ppgtt_populate_spt(struct intel_vgpu_ppgtt_spt *spt)
 {
 	struct intel_vgpu *vgpu = spt->vgpu;
-	struct intel_gvt *gvt = vgpu->gvt;
-	const struct intel_gvt_gtt_pte_ops *ops = gvt->gtt.pte_ops;
 	struct intel_vgpu_ppgtt_spt *s;
 	struct intel_gvt_gtt_entry se, ge;
-	unsigned long gfn, i;
+	unsigned long i;
 	int ret;
 
 	trace_spt_change(spt->vgpu->id, "born", spt,
@@ -1354,13 +1336,6 @@ static int ppgtt_populate_spt(struct intel_vgpu_ppgtt_spt *spt)
 			ppgtt_generate_shadow_entry(&se, s, &ge);
 			ppgtt_set_shadow_entry(spt, &se, i);
 		} else {
-			gfn = ops->get_pfn(&ge);
-			if (!intel_gvt_is_valid_gfn(vgpu, gfn)) {
-				ops->set_pfn(&se, gvt->gtt.scratch_mfn);
-				ppgtt_set_shadow_entry(spt, &se, i);
-				continue;
-			}
-
 			ret = ppgtt_populate_shadow_entry(vgpu, spt, i, &ge);
 			if (ret)
 				goto fail;
@@ -2335,14 +2310,6 @@ static int emulate_ggtt_mmio_write(struct intel_vgpu *vgpu, unsigned int off,
 		m.val64 = e.val64;
 		m.type = e.type;
 
-		/* one PTE update may be issued in multiple writes and the
-		 * first write may not construct a valid gfn
-		 */
-		if (!intel_gvt_is_valid_gfn(vgpu, gfn)) {
-			ops->set_pfn(&m, gvt->gtt.scratch_mfn);
-			goto out;
-		}
-
 		ret = intel_gvt_dma_map_guest_page(vgpu, gfn, PAGE_SIZE,
 						   &dma_addr);
 		if (ret) {
@@ -2359,7 +2326,6 @@ static int emulate_ggtt_mmio_write(struct intel_vgpu *vgpu, unsigned int off,
 		ops->clear_present(&m);
 	}
 
-out:
 	ggtt_set_guest_entry(ggtt_mm, &e, g_gtt_index);
 
 	ggtt_get_host_entry(ggtt_mm, &e, g_gtt_index);
-- 
2.40.0.rc1.284.g88254d51c5-goog



* [PATCH v2 04/27] drm/i915/gvt: Incorporate KVM memslot info into check for 2MiB GTT entry
  2023-03-11  0:22 [PATCH v2 00/27] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups Sean Christopherson
                   ` (2 preceding siblings ...)
  2023-03-11  0:22 ` [PATCH v2 03/27] drm/i915/gvt: remove interface intel_gvt_is_valid_gfn Sean Christopherson
@ 2023-03-11  0:22 ` Sean Christopherson
  2023-03-14  3:09   ` Yan Zhao
  2023-03-11  0:22 ` [PATCH v2 05/27] drm/i915/gvt: Verify VFIO-pinned page is THP when shadowing 2M gtt entry Sean Christopherson
                   ` (23 subsequent siblings)
  27 siblings, 1 reply; 79+ messages in thread
From: Sean Christopherson @ 2023-03-11  0:22 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

Honor KVM's max allowed page size when determining whether or not a 2MiB
GTT shadow page can be created for the guest.  Querying KVM's max allowed
size is somewhat odd as there's no strict requirement that KVM's memslots
and VFIO's mappings are configured with the same gfn=>hva mapping, but
the check will be accurate if userspace wants to have a functional guest,
and at the very least checking KVM's memslots guarantees that the entire
2MiB range has been exposed to the guest.

Note, KVM may also restrict the mapping size for reasons that aren't
relevant to KVMGT, e.g. for KVM's iTLB multi-hit workaround or if the gfn
is write-tracked (KVM's write-tracking only handles writes from vCPUs).
However, such scenarios are unlikely to occur with a well-behaved guest,
and at worst will result in sub-optimal performance.

Fixes: b901b252b6cf ("drm/i915/gvt: Add 2M huge gtt support")
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_page_track.h |  2 ++
 arch/x86/kvm/mmu/page_track.c         | 18 ++++++++++++++++++
 drivers/gpu/drm/i915/gvt/gtt.c        | 10 +++++++++-
 3 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index eb186bc57f6a..3f72c7a172fc 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -51,6 +51,8 @@ void kvm_page_track_cleanup(struct kvm *kvm);
 
 bool kvm_page_track_write_tracking_enabled(struct kvm *kvm);
 int kvm_page_track_write_tracking_alloc(struct kvm_memory_slot *slot);
+enum pg_level kvm_page_track_max_mapping_level(struct kvm *kvm, gfn_t gfn,
+					       enum pg_level max_level);
 
 void kvm_page_track_free_memslot(struct kvm_memory_slot *slot);
 int kvm_page_track_create_memslot(struct kvm *kvm,
diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index 0a2ac438d647..e739dcc3375c 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -301,3 +301,21 @@ void kvm_page_track_flush_slot(struct kvm *kvm, struct kvm_memory_slot *slot)
 			n->track_flush_slot(kvm, slot, n);
 	srcu_read_unlock(&head->track_srcu, idx);
 }
+
+enum pg_level kvm_page_track_max_mapping_level(struct kvm *kvm, gfn_t gfn,
+					       enum pg_level max_level)
+{
+	struct kvm_memory_slot *slot;
+	int idx;
+
+	idx = srcu_read_lock(&kvm->srcu);
+	slot = gfn_to_memslot(kvm, gfn);
+	if (!slot || slot->flags & KVM_MEMSLOT_INVALID)
+		max_level = PG_LEVEL_4K;
+	else
+		max_level = kvm_mmu_max_slot_mapping_level(slot, gfn, max_level);
+	srcu_read_unlock(&kvm->srcu, idx);
+
+	return max_level;
+}
+EXPORT_SYMBOL_GPL(kvm_page_track_max_mapping_level);
diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c
index f30922c55a0c..d59c7ab9d224 100644
--- a/drivers/gpu/drm/i915/gvt/gtt.c
+++ b/drivers/gpu/drm/i915/gvt/gtt.c
@@ -1157,14 +1157,22 @@ static int is_2MB_gtt_possible(struct intel_vgpu *vgpu,
 	struct intel_gvt_gtt_entry *entry)
 {
 	const struct intel_gvt_gtt_pte_ops *ops = vgpu->gvt->gtt.pte_ops;
+	unsigned long gfn = ops->get_pfn(entry);
 	kvm_pfn_t pfn;
+	int max_level;
 
 	if (!HAS_PAGE_SIZES(vgpu->gvt->gt->i915, I915_GTT_PAGE_SIZE_2M))
 		return 0;
 
 	if (!test_bit(INTEL_VGPU_STATUS_ATTACHED, vgpu->status))
 		return -EINVAL;
-	pfn = gfn_to_pfn(vgpu->vfio_device.kvm, ops->get_pfn(entry));
+
+	max_level = kvm_page_track_max_mapping_level(vgpu->vfio_device.kvm,
+						     gfn, PG_LEVEL_2M);
+	if (max_level < PG_LEVEL_2M)
+		return 0;
+
+	pfn = gfn_to_pfn(vgpu->vfio_device.kvm, gfn);
 	if (is_error_noslot_pfn(pfn))
 		return -EINVAL;
 
-- 
2.40.0.rc1.284.g88254d51c5-goog



* [PATCH v2 05/27] drm/i915/gvt: Verify VFIO-pinned page is THP when shadowing 2M gtt entry
  2023-03-11  0:22 [PATCH v2 00/27] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups Sean Christopherson
                   ` (3 preceding siblings ...)
  2023-03-11  0:22 ` [PATCH v2 04/27] drm/i915/gvt: Incorporate KVM memslot info into check for 2MiB GTT entry Sean Christopherson
@ 2023-03-11  0:22 ` Sean Christopherson
  2023-03-17  5:33   ` Yan Zhao
  2023-03-11  0:22 ` [PATCH v2 06/27] drm/i915/gvt: Put the page reference obtained by KVM's gfn_to_pfn() Sean Christopherson
                   ` (22 subsequent siblings)
  27 siblings, 1 reply; 79+ messages in thread
From: Sean Christopherson @ 2023-03-11  0:22 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

When shadowing a GTT entry with a 2M page, explicitly verify that the
first page pinned by VFIO is a transparent hugepage instead of assuming
that the page observed by is_2MB_gtt_possible() is the same page pinned
by vfio_pin_pages().  The two can differ, e.g. if userspace is doing
something funky with the guest's memslots, or if the page is demoted
between is_2MB_gtt_possible() and vfio_pin_pages().

This is more of a performance optimization than a bug fix as the check
for contiguous struct pages should guard against incorrect mapping (even
though assuming struct pages are virtually contiguous is wrong).

The real motivation for explicitly checking for a transparent hugepage
after pinning is that it will reduce the risk of introducing a bug in a
future fix for a page refcount leak (KVMGT doesn't put the reference
acquired by gfn_to_pfn()), and eventually will allow KVMGT to stop using
KVM's gfn_to_pfn() altogether.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 drivers/gpu/drm/i915/gvt/kvmgt.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 8ae7039b3683..90997cc385b4 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -159,11 +159,25 @@ static int gvt_pin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn,
 			goto err;
 		}
 
-		if (npage == 0)
-			base_page = cur_page;
+		if (npage == 0) {
+			/*
+			 * Bail immediately to avoid unnecessary pinning when
+			 * trying to shadow a 2M page and the host page isn't
+			 * a transparent hugepage.
+			 *
+			 * TODO: support other type hugepages, e.g. HugeTLB.
+			 */
+			if (size == I915_GTT_PAGE_SIZE_2M &&
+			    !PageTransHuge(cur_page))
+				ret = -EIO;
+			else
+				base_page = cur_page;
+		}
 		else if (base_page + npage != cur_page) {
 			gvt_vgpu_err("The pages are not continuous\n");
 			ret = -EINVAL;
+		}
+		if (ret < 0) {
 			npage++;
 			goto err;
 		}
-- 
2.40.0.rc1.284.g88254d51c5-goog



* [PATCH v2 06/27] drm/i915/gvt: Put the page reference obtained by KVM's gfn_to_pfn()
  2023-03-11  0:22 [PATCH v2 00/27] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups Sean Christopherson
                   ` (4 preceding siblings ...)
  2023-03-11  0:22 ` [PATCH v2 05/27] drm/i915/gvt: Verify VFIO-pinned page is THP when shadowing 2M gtt entry Sean Christopherson
@ 2023-03-11  0:22 ` Sean Christopherson
  2023-03-17  6:18   ` Yan Zhao
  2023-03-11  0:22 ` [PATCH v2 07/27] drm/i915/gvt: Don't rely on KVM's gfn_to_pfn() to query possible 2M GTT Sean Christopherson
                   ` (21 subsequent siblings)
  27 siblings, 1 reply; 79+ messages in thread
From: Sean Christopherson @ 2023-03-11  0:22 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

Put the struct page reference acquired by gfn_to_pfn(); KVM's API is that
the caller is ultimately responsible for dropping any reference.

Note, kvm_release_pfn_clean() ensures the pfn is actually backed by a
refcounted struct page before trying to put any references.
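
As a reminder of the API contract (the editor's sketch, not part of the
patch; "kvm" and "gfn" stand in for the caller's context):

  	kvm_pfn_t pfn = gfn_to_pfn(kvm, gfn);

  	if (is_error_noslot_pfn(pfn))
  		return -EINVAL;
  	/* ... inspect pfn_to_page(pfn) only if pfn_valid(pfn) ... */
  	kvm_release_pfn_clean(pfn);	/* pair every successful gfn_to_pfn() */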

Fixes: b901b252b6cf ("drm/i915/gvt: Add 2M huge gtt support")
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 drivers/gpu/drm/i915/gvt/gtt.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c
index d59c7ab9d224..15848b041a0d 100644
--- a/drivers/gpu/drm/i915/gvt/gtt.c
+++ b/drivers/gpu/drm/i915/gvt/gtt.c
@@ -1160,6 +1160,7 @@ static int is_2MB_gtt_possible(struct intel_vgpu *vgpu,
 	unsigned long gfn = ops->get_pfn(entry);
 	kvm_pfn_t pfn;
 	int max_level;
+	int ret;
 
 	if (!HAS_PAGE_SIZES(vgpu->gvt->gt->i915, I915_GTT_PAGE_SIZE_2M))
 		return 0;
@@ -1179,7 +1180,9 @@ static int is_2MB_gtt_possible(struct intel_vgpu *vgpu,
 	if (!pfn_valid(pfn))
 		return -EINVAL;
 
-	return PageTransHuge(pfn_to_page(pfn));
+	ret = PageTransHuge(pfn_to_page(pfn));
+	kvm_release_pfn_clean(pfn);
+	return ret;
 }
 
 static int split_2MB_gtt_entry(struct intel_vgpu *vgpu,
-- 
2.40.0.rc1.284.g88254d51c5-goog



* [PATCH v2 07/27] drm/i915/gvt: Don't rely on KVM's gfn_to_pfn() to query possible 2M GTT
  2023-03-11  0:22 [PATCH v2 00/27] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups Sean Christopherson
                   ` (5 preceding siblings ...)
  2023-03-11  0:22 ` [PATCH v2 06/27] drm/i915/gvt: Put the page reference obtained by KVM's gfn_to_pfn() Sean Christopherson
@ 2023-03-11  0:22 ` Sean Christopherson
  2023-03-17  5:37   ` Yan Zhao
  2023-03-11  0:22 ` [PATCH v2 08/27] drm/i915/gvt: Use an "unsigned long" to iterate over memslot gfns Sean Christopherson
                   ` (20 subsequent siblings)
  27 siblings, 1 reply; 79+ messages in thread
From: Sean Christopherson @ 2023-03-11  0:22 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

Now that gvt_pin_guest_page() explicitly verifies the pinned PFN is a
transparent hugepage, don't use KVM's gfn_to_pfn() to pre-check whether a
2M GTT entry is possible and instead just try to map the GFN with a 2MB
entry.  Using KVM to query a pfn that is ultimately managed through VFIO
is odd, and KVM's gfn_to_pfn() is not intended for non-KVM consumption;
it's exported only because of KVM vendor modules (x86 and PPC).

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 drivers/gpu/drm/i915/gvt/gtt.c | 33 +++++++++++----------------------
 1 file changed, 11 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c
index 15848b041a0d..e60bcce241f8 100644
--- a/drivers/gpu/drm/i915/gvt/gtt.c
+++ b/drivers/gpu/drm/i915/gvt/gtt.c
@@ -1146,21 +1146,19 @@ static inline void ppgtt_generate_shadow_entry(struct intel_gvt_gtt_entry *se,
 }
 
 /*
- * Check if can do 2M page
+ * Try to map a 2M gtt entry.
  * @vgpu: target vgpu
  * @entry: target pfn's gtt entry
  *
- * Return 1 if 2MB huge gtt shadowing is possible, 0 if miscondition,
- * negative if found err.
+ * Return 1 if a 2MB huge gtt shadow was created, 0 if the entry needs to be
+ * split, negative if found err.
  */
-static int is_2MB_gtt_possible(struct intel_vgpu *vgpu,
-	struct intel_gvt_gtt_entry *entry)
+static int try_map_2MB_gtt_entry(struct intel_vgpu *vgpu,
+	struct intel_gvt_gtt_entry *entry, dma_addr_t *dma_addr)
 {
 	const struct intel_gvt_gtt_pte_ops *ops = vgpu->gvt->gtt.pte_ops;
 	unsigned long gfn = ops->get_pfn(entry);
-	kvm_pfn_t pfn;
 	int max_level;
-	int ret;
 
 	if (!HAS_PAGE_SIZES(vgpu->gvt->gt->i915, I915_GTT_PAGE_SIZE_2M))
 		return 0;
@@ -1173,16 +1171,7 @@ static int is_2MB_gtt_possible(struct intel_vgpu *vgpu,
 	if (max_level < PG_LEVEL_2M)
 		return 0;
 
-	pfn = gfn_to_pfn(vgpu->vfio_device.kvm, gfn);
-	if (is_error_noslot_pfn(pfn))
-		return -EINVAL;
-
-	if (!pfn_valid(pfn))
-		return -EINVAL;
-
-	ret = PageTransHuge(pfn_to_page(pfn));
-	kvm_release_pfn_clean(pfn);
-	return ret;
+	return intel_gvt_dma_map_guest_page(vgpu, gfn, I915_GTT_PAGE_SIZE_2M, dma_addr);
 }
 
 static int split_2MB_gtt_entry(struct intel_vgpu *vgpu,
@@ -1278,7 +1267,7 @@ static int ppgtt_populate_shadow_entry(struct intel_vgpu *vgpu,
 {
 	const struct intel_gvt_gtt_pte_ops *pte_ops = vgpu->gvt->gtt.pte_ops;
 	struct intel_gvt_gtt_entry se = *ge;
-	unsigned long gfn, page_size = PAGE_SIZE;
+	unsigned long gfn;
 	dma_addr_t dma_addr;
 	int ret;
 
@@ -1301,13 +1290,12 @@ static int ppgtt_populate_shadow_entry(struct intel_vgpu *vgpu,
 		return split_64KB_gtt_entry(vgpu, spt, index, &se);
 	case GTT_TYPE_PPGTT_PTE_2M_ENTRY:
 		gvt_vdbg_mm("shadow 2M gtt entry\n");
-		ret = is_2MB_gtt_possible(vgpu, ge);
+		ret = try_map_2MB_gtt_entry(vgpu, ge, &dma_addr);
 		if (ret == 0)
 			return split_2MB_gtt_entry(vgpu, spt, index, &se);
 		else if (ret < 0)
 			return ret;
-		page_size = I915_GTT_PAGE_SIZE_2M;
-		break;
+		goto set_shadow_entry;
 	case GTT_TYPE_PPGTT_PTE_1G_ENTRY:
 		gvt_vgpu_err("GVT doesn't support 1GB entry\n");
 		return -EINVAL;
@@ -1316,10 +1304,11 @@ static int ppgtt_populate_shadow_entry(struct intel_vgpu *vgpu,
 	}
 
 	/* direct shadow */
-	ret = intel_gvt_dma_map_guest_page(vgpu, gfn, page_size, &dma_addr);
+	ret = intel_gvt_dma_map_guest_page(vgpu, gfn, PAGE_SIZE, &dma_addr);
 	if (ret)
 		return -ENXIO;
 
+set_shadow_entry:
 	pte_ops->set_pfn(&se, dma_addr >> PAGE_SHIFT);
 	ppgtt_set_shadow_entry(spt, &se, index);
 	return 0;
-- 
2.40.0.rc1.284.g88254d51c5-goog



* [PATCH v2 08/27] drm/i915/gvt: Use an "unsigned long" to iterate over memslot gfns
  2023-03-11  0:22 [PATCH v2 00/27] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups Sean Christopherson
                   ` (6 preceding siblings ...)
  2023-03-11  0:22 ` [PATCH v2 07/27] drm/i915/gvt: Don't rely on KVM's gfn_to_pfn() to query possible 2M GTT Sean Christopherson
@ 2023-03-11  0:22 ` Sean Christopherson
  2023-03-17  6:19   ` Yan Zhao
  2023-03-11  0:22 ` [PATCH v2 09/27] drm/i915/gvt: Drop unused helper intel_vgpu_reset_gtt() Sean Christopherson
                   ` (19 subsequent siblings)
  27 siblings, 1 reply; 79+ messages in thread
From: Sean Christopherson @ 2023-03-11  0:22 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

Use an "unsigned long" instead of an "int" when iterating over the gfns
in a memslot.  The number of pages in the memslot is tracked as an
"unsigned long", e.g. KVMGT could theoretically break if a KVM memslot
larger than 16TiB were deleted (2^32 * 4KiB).
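
For the record, the 16TiB figure works out as follows (the editor's aside,
a standalone userspace calculation rather than kernel code):

  #include <stdio.h>

  int main(void)
  {
  	unsigned long npages = 1UL << 32;	/* 2^32 pages, beyond any "int" */
  	unsigned long bytes  = npages << 12;	/* times 4KiB per page */

  	/* Prints "4294967296 pages = 16 TiB" on a 64-bit host. */
  	printf("%lu pages = %lu TiB\n", npages, bytes >> 40);
  	return 0;
  }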

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 drivers/gpu/drm/i915/gvt/kvmgt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 90997cc385b4..68be66395598 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -1634,7 +1634,7 @@ static void kvmgt_page_track_flush_slot(struct kvm *kvm,
 		struct kvm_memory_slot *slot,
 		struct kvm_page_track_notifier_node *node)
 {
-	int i;
+	unsigned long i;
 	gfn_t gfn;
 	struct intel_vgpu *info =
 		container_of(node, struct intel_vgpu, track_node);
-- 
2.40.0.rc1.284.g88254d51c5-goog



* [PATCH v2 09/27] drm/i915/gvt: Drop unused helper intel_vgpu_reset_gtt()
  2023-03-11  0:22 [PATCH v2 00/27] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups Sean Christopherson
                   ` (7 preceding siblings ...)
  2023-03-11  0:22 ` [PATCH v2 08/27] drm/i915/gvt: Use an "unsigned long" to iterate over memslot gfns Sean Christopherson
@ 2023-03-11  0:22 ` Sean Christopherson
  2023-03-17  6:20   ` Yan Zhao
  2023-03-11  0:22 ` [PATCH v2 10/27] drm/i915/gvt: Protect gfn hash table with vgpu_lock Sean Christopherson
                   ` (18 subsequent siblings)
  27 siblings, 1 reply; 79+ messages in thread
From: Sean Christopherson @ 2023-03-11  0:22 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

Drop intel_vgpu_reset_gtt() as it no longer has any callers.  In addition
to removing dead code, this eliminates the last possible scenario where
__kvmgt_protect_table_find() can be reached without holding vgpu_lock.
Requiring vgpu_lock to be held when calling __kvmgt_protect_table_find()
will allow protecting the gfn hash with vgpu_lock without too much fuss.

No functional change intended.

Fixes: ba25d977571e ("drm/i915/gvt: Do not destroy ppgtt_mm during vGPU D3->D0.")
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 drivers/gpu/drm/i915/gvt/gtt.c | 18 ------------------
 drivers/gpu/drm/i915/gvt/gtt.h |  1 -
 2 files changed, 19 deletions(-)

diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c
index e60bcce241f8..293bb2292021 100644
--- a/drivers/gpu/drm/i915/gvt/gtt.c
+++ b/drivers/gpu/drm/i915/gvt/gtt.c
@@ -2845,24 +2845,6 @@ void intel_vgpu_reset_ggtt(struct intel_vgpu *vgpu, bool invalidate_old)
 	ggtt_invalidate(gvt->gt);
 }
 
-/**
- * intel_vgpu_reset_gtt - reset the all GTT related status
- * @vgpu: a vGPU
- *
- * This function is called from vfio core to reset reset all
- * GTT related status, including GGTT, PPGTT, scratch page.
- *
- */
-void intel_vgpu_reset_gtt(struct intel_vgpu *vgpu)
-{
-	/* Shadow pages are only created when there is no page
-	 * table tracking data, so remove page tracking data after
-	 * removing the shadow pages.
-	 */
-	intel_vgpu_destroy_all_ppgtt_mm(vgpu);
-	intel_vgpu_reset_ggtt(vgpu, true);
-}
-
 /**
  * intel_gvt_restore_ggtt - restore all vGPU's ggtt entries
  * @gvt: intel gvt device
diff --git a/drivers/gpu/drm/i915/gvt/gtt.h b/drivers/gpu/drm/i915/gvt/gtt.h
index a3b0f59ec8bd..4cb183e06e95 100644
--- a/drivers/gpu/drm/i915/gvt/gtt.h
+++ b/drivers/gpu/drm/i915/gvt/gtt.h
@@ -224,7 +224,6 @@ void intel_vgpu_reset_ggtt(struct intel_vgpu *vgpu, bool invalidate_old);
 void intel_vgpu_invalidate_ppgtt(struct intel_vgpu *vgpu);
 
 int intel_gvt_init_gtt(struct intel_gvt *gvt);
-void intel_vgpu_reset_gtt(struct intel_vgpu *vgpu);
 void intel_gvt_clean_gtt(struct intel_gvt *gvt);
 
 struct intel_vgpu_mm *intel_gvt_find_ppgtt_mm(struct intel_vgpu *vgpu,
-- 
2.40.0.rc1.284.g88254d51c5-goog



* [PATCH v2 10/27] drm/i915/gvt: Protect gfn hash table with vgpu_lock
  2023-03-11  0:22 [PATCH v2 00/27] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups Sean Christopherson
                   ` (8 preceding siblings ...)
  2023-03-11  0:22 ` [PATCH v2 09/27] drm/i915/gvt: Drop unused helper intel_vgpu_reset_gtt() Sean Christopherson
@ 2023-03-11  0:22 ` Sean Christopherson
  2023-03-17  6:21   ` Yan Zhao
  2023-03-11  0:22 ` [PATCH v2 11/27] KVM: x86/mmu: Don't rely on page-track mechanism to flush on memslot change Sean Christopherson
                   ` (17 subsequent siblings)
  27 siblings, 1 reply; 79+ messages in thread
From: Sean Christopherson @ 2023-03-11  0:22 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

Use vgpu_lock instead of KVM's mmu_lock to protect accesses to the hash
table used to track which gfns are write-protected when shadowing the
guest's GTT, and hoist the acquisition of vgpu_lock from
intel_vgpu_page_track_handler() out to its sole caller,
kvmgt_page_track_write().

This fixes a bug where kvmgt_page_track_write(), which doesn't hold
kvm->mmu_lock, could race with intel_gvt_page_track_remove() and trigger
a use-after-free.

Fixing kvmgt_page_track_write() by taking kvm->mmu_lock is not an option
as mmu_lock is a r/w spinlock, and intel_vgpu_page_track_handler() might
sleep when acquiring vgpu->cache_lock deep down the callstack:

  intel_vgpu_page_track_handler()
  |
  |->  page_track->handler / ppgtt_write_protection_handler()
       |
       |-> ppgtt_handle_guest_write_page_table_bytes()
           |
           |->  ppgtt_handle_guest_write_page_table()
                |
                |-> ppgtt_handle_guest_entry_removal()
                    |
                    |-> ppgtt_invalidate_pte()
                        |
                        |-> intel_gvt_dma_unmap_guest_page()
                            |
                            |-> mutex_lock(&vgpu->cache_lock);
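
To make the resulting nesting explicit, a sketch from the editor (not part
of the patch; it simply restates what the diff below does for one gfn):

  	mutex_lock(&info->vgpu_lock);		/* outer: mutex, may sleep */
  	if (kvmgt_gfn_is_write_protected(info, gfn)) {
  		write_lock(&kvm->mmu_lock);	/* inner: spinning rwlock */
  		kvm_slot_page_track_remove_page(kvm, slot, gfn,
  						KVM_PAGE_TRACK_WRITE);
  		write_unlock(&kvm->mmu_lock);
  		kvmgt_protect_table_del(info, gfn);
  	}
  	mutex_unlock(&info->vgpu_lock);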

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 drivers/gpu/drm/i915/gvt/kvmgt.c      | 55 +++++++++++++++------------
 drivers/gpu/drm/i915/gvt/page_track.c | 10 +----
 2 files changed, 33 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 68be66395598..9824d075562e 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -366,6 +366,8 @@ __kvmgt_protect_table_find(struct intel_vgpu *info, gfn_t gfn)
 {
 	struct kvmgt_pgfn *p, *res = NULL;
 
+	lockdep_assert_held(&info->vgpu_lock);
+
 	hash_for_each_possible(info->ptable, p, hnode, gfn) {
 		if (gfn == p->gfn) {
 			res = p;
@@ -1567,6 +1569,9 @@ int intel_gvt_page_track_add(struct intel_vgpu *info, u64 gfn)
 	if (!test_bit(INTEL_VGPU_STATUS_ATTACHED, info->status))
 		return -ESRCH;
 
+	if (kvmgt_gfn_is_write_protected(info, gfn))
+		return 0;
+
 	idx = srcu_read_lock(&kvm->srcu);
 	slot = gfn_to_memslot(kvm, gfn);
 	if (!slot) {
@@ -1575,16 +1580,12 @@ int intel_gvt_page_track_add(struct intel_vgpu *info, u64 gfn)
 	}
 
 	write_lock(&kvm->mmu_lock);
-
-	if (kvmgt_gfn_is_write_protected(info, gfn))
-		goto out;
-
 	kvm_slot_page_track_add_page(kvm, slot, gfn, KVM_PAGE_TRACK_WRITE);
+	write_unlock(&kvm->mmu_lock);
+
+	srcu_read_unlock(&kvm->srcu, idx);
+
 	kvmgt_protect_table_add(info, gfn);
-
-out:
-	write_unlock(&kvm->mmu_lock);
-	srcu_read_unlock(&kvm->srcu, idx);
 	return 0;
 }
 
@@ -1597,24 +1598,22 @@ int intel_gvt_page_track_remove(struct intel_vgpu *info, u64 gfn)
 	if (!test_bit(INTEL_VGPU_STATUS_ATTACHED, info->status))
 		return -ESRCH;
 
-	idx = srcu_read_lock(&kvm->srcu);
-	slot = gfn_to_memslot(kvm, gfn);
-	if (!slot) {
-		srcu_read_unlock(&kvm->srcu, idx);
-		return -EINVAL;
-	}
-
-	write_lock(&kvm->mmu_lock);
-
 	if (!kvmgt_gfn_is_write_protected(info, gfn))
-		goto out;
+		return 0;
 
+	idx = srcu_read_lock(&kvm->srcu);
+	slot = gfn_to_memslot(kvm, gfn);
+	if (!slot) {
+		srcu_read_unlock(&kvm->srcu, idx);
+		return -EINVAL;
+	}
+
+	write_lock(&kvm->mmu_lock);
 	kvm_slot_page_track_remove_page(kvm, slot, gfn, KVM_PAGE_TRACK_WRITE);
+	write_unlock(&kvm->mmu_lock);
+	srcu_read_unlock(&kvm->srcu, idx);
+
 	kvmgt_protect_table_del(info, gfn);
-
-out:
-	write_unlock(&kvm->mmu_lock);
-	srcu_read_unlock(&kvm->srcu, idx);
 	return 0;
 }
 
@@ -1625,9 +1624,13 @@ static void kvmgt_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa,
 	struct intel_vgpu *info =
 		container_of(node, struct intel_vgpu, track_node);
 
+	mutex_lock(&info->vgpu_lock);
+
 	if (kvmgt_gfn_is_write_protected(info, gpa_to_gfn(gpa)))
 		intel_vgpu_page_track_handler(info, gpa,
 						     (void *)val, len);
+
+	mutex_unlock(&info->vgpu_lock);
 }
 
 static void kvmgt_page_track_flush_slot(struct kvm *kvm,
@@ -1639,16 +1642,20 @@ static void kvmgt_page_track_flush_slot(struct kvm *kvm,
 	struct intel_vgpu *info =
 		container_of(node, struct intel_vgpu, track_node);
 
-	write_lock(&kvm->mmu_lock);
+	mutex_lock(&info->vgpu_lock);
+
 	for (i = 0; i < slot->npages; i++) {
 		gfn = slot->base_gfn + i;
 		if (kvmgt_gfn_is_write_protected(info, gfn)) {
+			write_lock(&kvm->mmu_lock);
 			kvm_slot_page_track_remove_page(kvm, slot, gfn,
 						KVM_PAGE_TRACK_WRITE);
+			write_unlock(&kvm->mmu_lock);
+
 			kvmgt_protect_table_del(info, gfn);
 		}
 	}
-	write_unlock(&kvm->mmu_lock);
+	mutex_unlock(&info->vgpu_lock);
 }
 
 void intel_vgpu_detach_regions(struct intel_vgpu *vgpu)
diff --git a/drivers/gpu/drm/i915/gvt/page_track.c b/drivers/gpu/drm/i915/gvt/page_track.c
index df34e73cba41..60a65435556d 100644
--- a/drivers/gpu/drm/i915/gvt/page_track.c
+++ b/drivers/gpu/drm/i915/gvt/page_track.c
@@ -162,13 +162,9 @@ int intel_vgpu_page_track_handler(struct intel_vgpu *vgpu, u64 gpa,
 	struct intel_vgpu_page_track *page_track;
 	int ret = 0;
 
-	mutex_lock(&vgpu->vgpu_lock);
-
 	page_track = intel_vgpu_find_page_track(vgpu, gpa >> PAGE_SHIFT);
-	if (!page_track) {
-		ret = -ENXIO;
-		goto out;
-	}
+	if (!page_track)
+		return -ENXIO;
 
 	if (unlikely(vgpu->failsafe)) {
 		/* Remove write protection to prevent furture traps. */
@@ -179,7 +175,5 @@ int intel_vgpu_page_track_handler(struct intel_vgpu *vgpu, u64 gpa,
 			gvt_err("guest page write error, gpa %llx\n", gpa);
 	}
 
-out:
-	mutex_unlock(&vgpu->vgpu_lock);
 	return ret;
 }
-- 
2.40.0.rc1.284.g88254d51c5-goog



* [PATCH v2 11/27] KVM: x86/mmu: Don't rely on page-track mechanism to flush on memslot change
  2023-03-11  0:22 [PATCH v2 00/27] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups Sean Christopherson
                   ` (9 preceding siblings ...)
  2023-03-11  0:22 ` [PATCH v2 10/27] drm/i915/gvt: Protect gfn hash table with vgpu_lock Sean Christopherson
@ 2023-03-11  0:22 ` Sean Christopherson
  2023-03-15  1:08   ` Yan Zhao
  2023-03-11  0:22 ` [PATCH v2 12/27] KVM: x86/mmu: Don't bounce through page-track mechanism for guest PTEs Sean Christopherson
                   ` (16 subsequent siblings)
  27 siblings, 1 reply; 79+ messages in thread
From: Sean Christopherson @ 2023-03-11  0:22 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

Call kvm_mmu_zap_all_fast() directly when flushing a memslot instead of
bouncing through the page-track mechanism.  KVM (unfortunately) needs to
zap and flush all page tables on memslot DELETE/MOVE irrespective of
whether KVM is shadowing guest page tables.

This will allow changing KVM to register a page-track notifier on the
first shadow root allocation, and will also allow deleting the misguided
kvm_page_track_flush_slot() hook itself once KVMGT also moves to a
different method for reacting to memslot changes.

No functional change intended.

Cc: Yan Zhao <yan.y.zhao@intel.com>
Link: https://lore.kernel.org/r/20221110014821.1548347-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/mmu/mmu.c          | 10 +---------
 arch/x86/kvm/x86.c              |  2 ++
 3 files changed, 4 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 808c292ad3f4..17281d6825c9 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1844,6 +1844,7 @@ void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm,
 void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
 				   const struct kvm_memory_slot *memslot);
 void kvm_mmu_zap_all(struct kvm *kvm);
+void kvm_mmu_zap_all_fast(struct kvm *kvm);
 void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen);
 void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned long kvm_nr_mmu_pages);
 
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 4685c80e441b..409dabec69df 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -6030,7 +6030,7 @@ static void kvm_zap_obsolete_pages(struct kvm *kvm)
  * not use any resource of the being-deleted slot or all slots
  * after calling the function.
  */
-static void kvm_mmu_zap_all_fast(struct kvm *kvm)
+void kvm_mmu_zap_all_fast(struct kvm *kvm)
 {
 	lockdep_assert_held(&kvm->slots_lock);
 
@@ -6086,13 +6086,6 @@ static bool kvm_has_zapped_obsolete_pages(struct kvm *kvm)
 	return unlikely(!list_empty_careful(&kvm->arch.zapped_obsolete_pages));
 }
 
-static void kvm_mmu_invalidate_zap_pages_in_memslot(struct kvm *kvm,
-			struct kvm_memory_slot *slot,
-			struct kvm_page_track_notifier_node *node)
-{
-	kvm_mmu_zap_all_fast(kvm);
-}
-
 int kvm_mmu_init_vm(struct kvm *kvm)
 {
 	struct kvm_page_track_notifier_node *node = &kvm->arch.mmu_sp_tracker;
@@ -6110,7 +6103,6 @@ int kvm_mmu_init_vm(struct kvm *kvm)
 	}
 
 	node->track_write = kvm_mmu_pte_write;
-	node->track_flush_slot = kvm_mmu_invalidate_zap_pages_in_memslot;
 	kvm_page_track_register_notifier(kvm, node);
 
 	kvm->arch.split_page_header_cache.kmem_cache = mmu_page_header_cache;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f706621c35b8..29dd6c97d145 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12662,6 +12662,8 @@ void kvm_arch_flush_shadow_all(struct kvm *kvm)
 void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
 				   struct kvm_memory_slot *slot)
 {
+	kvm_mmu_zap_all_fast(kvm);
+
 	kvm_page_track_flush_slot(kvm, slot);
 }
 
-- 
2.40.0.rc1.284.g88254d51c5-goog



* [PATCH v2 12/27] KVM: x86/mmu: Don't bounce through page-track mechanism for guest PTEs
  2023-03-11  0:22 [PATCH v2 00/27] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups Sean Christopherson
                   ` (10 preceding siblings ...)
  2023-03-11  0:22 ` [PATCH v2 11/27] KVM: x86/mmu: Don't rely on page-track mechanism to flush on memslot change Sean Christopherson
@ 2023-03-11  0:22 ` Sean Christopherson
  2023-03-17  6:37   ` Yan Zhao
  2023-03-11  0:22 ` [PATCH v2 13/27] KVM: drm/i915/gvt: Drop @vcpu from KVM's ->track_write() hook Sean Christopherson
                   ` (15 subsequent siblings)
  27 siblings, 1 reply; 79+ messages in thread
From: Sean Christopherson @ 2023-03-11  0:22 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

Don't use the generic page-track mechanism to handle writes to guest PTEs
in KVM's MMU.  KVM's MMU needs access to information that should not be
exposed to external page-track users, e.g. KVM needs (for some definitions
of "need") the vCPU to query the current paging mode, whereas external
users, i.e. KVMGT, have no ties to the current vCPU and so should never
need the vCPU.

Moving away from the page-track mechanism will allow dropping use of the
page-track mechanism for KVM's own MMU, and will also allow simplifying
and cleaning up the page-track APIs.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_host.h |  1 -
 arch/x86/kvm/mmu.h              |  2 ++
 arch/x86/kvm/mmu/mmu.c          | 13 ++-----------
 arch/x86/kvm/mmu/page_track.c   |  2 ++
 4 files changed, 6 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 17281d6825c9..1a4225237564 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1265,7 +1265,6 @@ struct kvm_arch {
 	 * create an NX huge page (without hanging the guest).
 	 */
 	struct list_head possible_nx_huge_pages;
-	struct kvm_page_track_notifier_node mmu_sp_tracker;
 	struct kvm_page_track_notifier_head track_notifier_head;
 	/*
 	 * Protects marking pages unsync during page faults, as TDP MMU page
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 168c46fd8dd1..b8bde42f6037 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -119,6 +119,8 @@ void kvm_mmu_unload(struct kvm_vcpu *vcpu);
 void kvm_mmu_free_obsolete_roots(struct kvm_vcpu *vcpu);
 void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu);
 void kvm_mmu_sync_prev_roots(struct kvm_vcpu *vcpu);
+void kvm_mmu_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
+			 int bytes);
 
 static inline int kvm_mmu_reload(struct kvm_vcpu *vcpu)
 {
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 409dabec69df..4f2f83d8322e 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5603,9 +5603,8 @@ static u64 *get_written_sptes(struct kvm_mmu_page *sp, gpa_t gpa, int *nspte)
 	return spte;
 }
 
-static void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
-			      const u8 *new, int bytes,
-			      struct kvm_page_track_notifier_node *node)
+void kvm_mmu_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
+			 int bytes)
 {
 	gfn_t gfn = gpa >> PAGE_SHIFT;
 	struct kvm_mmu_page *sp;
@@ -6088,7 +6087,6 @@ static bool kvm_has_zapped_obsolete_pages(struct kvm *kvm)
 
 int kvm_mmu_init_vm(struct kvm *kvm)
 {
-	struct kvm_page_track_notifier_node *node = &kvm->arch.mmu_sp_tracker;
 	int r;
 
 	INIT_LIST_HEAD(&kvm->arch.active_mmu_pages);
@@ -6102,9 +6100,6 @@ int kvm_mmu_init_vm(struct kvm *kvm)
 			return r;
 	}
 
-	node->track_write = kvm_mmu_pte_write;
-	kvm_page_track_register_notifier(kvm, node);
-
 	kvm->arch.split_page_header_cache.kmem_cache = mmu_page_header_cache;
 	kvm->arch.split_page_header_cache.gfp_zero = __GFP_ZERO;
 
@@ -6125,10 +6120,6 @@ static void mmu_free_vm_memory_caches(struct kvm *kvm)
 
 void kvm_mmu_uninit_vm(struct kvm *kvm)
 {
-	struct kvm_page_track_notifier_node *node = &kvm->arch.mmu_sp_tracker;
-
-	kvm_page_track_unregister_notifier(kvm, node);
-
 	if (tdp_mmu_enabled)
 		kvm_mmu_uninit_tdp_mmu(kvm);
 
diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index e739dcc3375c..f39f190ad4ae 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -274,6 +274,8 @@ void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
 		if (n->track_write)
 			n->track_write(vcpu, gpa, new, bytes, n);
 	srcu_read_unlock(&head->track_srcu, idx);
+
+	kvm_mmu_track_write(vcpu, gpa, new, bytes);
 }
 
 /*
-- 
2.40.0.rc1.284.g88254d51c5-goog



* [PATCH v2 13/27] KVM: drm/i915/gvt: Drop @vcpu from KVM's ->track_write() hook
  2023-03-11  0:22 [PATCH v2 00/27] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups Sean Christopherson
                   ` (11 preceding siblings ...)
  2023-03-11  0:22 ` [PATCH v2 12/27] KVM: x86/mmu: Don't bounce through page-track mechanism for guest PTEs Sean Christopherson
@ 2023-03-11  0:22 ` Sean Christopherson
  2023-03-17  7:28   ` Yan Zhao
  2023-03-11  0:22 ` [PATCH v2 14/27] KVM: x86: Reject memslot MOVE operations if KVMGT is attached Sean Christopherson
                   ` (14 subsequent siblings)
  27 siblings, 1 reply; 79+ messages in thread
From: Sean Christopherson @ 2023-03-11  0:22 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

Drop @vcpu from KVM's ->track_write() hook provided for external users of
the page-track APIs now that KVM itself doesn't use the page-track
mechanism.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_page_track.h |  5 ++---
 arch/x86/kvm/mmu/page_track.c         |  2 +-
 drivers/gpu/drm/i915/gvt/kvmgt.c      | 10 ++++------
 3 files changed, 7 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index 3f72c7a172fc..0d65ae203fd6 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -26,14 +26,13 @@ struct kvm_page_track_notifier_node {
 	 * It is called when guest is writing the write-tracked page
 	 * and write emulation is finished at that time.
 	 *
-	 * @vcpu: the vcpu where the write access happened.
 	 * @gpa: the physical address written by guest.
 	 * @new: the data was written to the address.
 	 * @bytes: the written length.
 	 * @node: this node
 	 */
-	void (*track_write)(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
-			    int bytes, struct kvm_page_track_notifier_node *node);
+	void (*track_write)(gpa_t gpa, const u8 *new, int bytes,
+			    struct kvm_page_track_notifier_node *node);
 	/*
 	 * It is called when memory slot is being moved or removed
 	 * users can drop write-protection for the pages in that memory slot
diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index f39f190ad4ae..39a0863af8b4 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -272,7 +272,7 @@ void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
 	hlist_for_each_entry_srcu(n, &head->track_notifier_list, node,
 				srcu_read_lock_held(&head->track_srcu))
 		if (n->track_write)
-			n->track_write(vcpu, gpa, new, bytes, n);
+			n->track_write(gpa, new, bytes, n);
 	srcu_read_unlock(&head->track_srcu, idx);
 
 	kvm_mmu_track_write(vcpu, gpa, new, bytes);
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 9824d075562e..292750dc819f 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -106,9 +106,8 @@ struct gvt_dma {
 #define vfio_dev_to_vgpu(vfio_dev) \
 	container_of((vfio_dev), struct intel_vgpu, vfio_device)
 
-static void kvmgt_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa,
-		const u8 *val, int len,
-		struct kvm_page_track_notifier_node *node);
+static void kvmgt_page_track_write(gpa_t gpa, const u8 *val, int len,
+				   struct kvm_page_track_notifier_node *node);
 static void kvmgt_page_track_flush_slot(struct kvm *kvm,
 		struct kvm_memory_slot *slot,
 		struct kvm_page_track_notifier_node *node);
@@ -1617,9 +1616,8 @@ int intel_gvt_page_track_remove(struct intel_vgpu *info, u64 gfn)
 	return 0;
 }
 
-static void kvmgt_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa,
-		const u8 *val, int len,
-		struct kvm_page_track_notifier_node *node)
+static void kvmgt_page_track_write(gpa_t gpa, const u8 *val, int len,
+				   struct kvm_page_track_notifier_node *node)
 {
 	struct intel_vgpu *info =
 		container_of(node, struct intel_vgpu, track_node);
-- 
2.40.0.rc1.284.g88254d51c5-goog



* [PATCH v2 14/27] KVM: x86: Reject memslot MOVE operations if KVMGT is attached
  2023-03-11  0:22 [PATCH v2 00/27] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups Sean Christopherson
                   ` (12 preceding siblings ...)
  2023-03-11  0:22 ` [PATCH v2 13/27] KVM: drm/i915/gvt: Drop @vcpu from KVM's ->track_write() hook Sean Christopherson
@ 2023-03-11  0:22 ` Sean Christopherson
  2023-03-15  8:03   ` Yan Zhao
  2023-03-17  7:29   ` Yan Zhao
  2023-03-11  0:22 ` [PATCH v2 15/27] drm/i915/gvt: Don't bother removing write-protection on to-be-deleted slot Sean Christopherson
                   ` (13 subsequent siblings)
  27 siblings, 2 replies; 79+ messages in thread
From: Sean Christopherson @ 2023-03-11  0:22 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

Disallow moving memslots if the VM has external page-track users, i.e. if
KVMGT is being used to expose a virtual GPU to the guest, as KVM doesn't
correctly handle moving memory regions.

Note, this is potential ABI breakage!  E.g. userspace could move regions
that aren't shadowed by KVMGT without harming the guest.  However, the
only known user of KVMGT is QEMU, and QEMU doesn't move generic memory
regions.  KVM's own support for moving memory regions was also broken for
multiple years (albeit for an edge case, but arguably moving RAM is
itself an edge case), e.g. see commit edd4fa37baa6 ("KVM: x86: Allocate
new rmap and large page tracking when moving memslot").
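
For context, what a MOVE looks like from userspace (the editor's sketch,
not part of the patch; "vm_fd", "size" and "hva" are assumed to describe an
already-created slot, and a vGPU is assumed to be attached to the VM):

  	/*
  	 * Re-issuing KVM_SET_USER_MEMORY_REGION for an existing slot with a
  	 * new guest_phys_addr but unchanged size and userspace_addr is a
  	 * KVM_MR_MOVE; with an external page-track user attached it now
  	 * fails with -EINVAL.
  	 */
  	struct kvm_userspace_memory_region region = {
  		.slot            = 1,
  		.guest_phys_addr = 0x200000000ULL,	/* new GPA */
  		.memory_size     = size,		/* unchanged */
  		.userspace_addr  = (__u64)hva,		/* unchanged */
  	};

  	if (ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region) < 0)
  		perror("KVM_SET_USER_MEMORY_REGION");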

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_page_track.h | 3 +++
 arch/x86/kvm/mmu/page_track.c         | 5 +++++
 arch/x86/kvm/x86.c                    | 7 +++++++
 3 files changed, 15 insertions(+)

diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index 0d65ae203fd6..6a287bcbe8a9 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -77,4 +77,7 @@ kvm_page_track_unregister_notifier(struct kvm *kvm,
 void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
 			  int bytes);
 void kvm_page_track_flush_slot(struct kvm *kvm, struct kvm_memory_slot *slot);
+
+bool kvm_page_track_has_external_user(struct kvm *kvm);
+
 #endif
diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index 39a0863af8b4..1cfc0a0ccc23 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -321,3 +321,8 @@ enum pg_level kvm_page_track_max_mapping_level(struct kvm *kvm, gfn_t gfn,
 	return max_level;
 }
 EXPORT_SYMBOL_GPL(kvm_page_track_max_mapping_level);
+
+bool kvm_page_track_has_external_user(struct kvm *kvm)
+{
+	return !hlist_empty(&kvm->arch.track_notifier_head.track_notifier_list);
+}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 29dd6c97d145..47ac9291cd43 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12484,6 +12484,13 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
 				   struct kvm_memory_slot *new,
 				   enum kvm_mr_change change)
 {
+	/*
+	 * KVM doesn't support moving memslots when there are external page
+	 * trackers attached to the VM, i.e. if KVMGT is in use.
+	 */
+	if (change == KVM_MR_MOVE && kvm_page_track_has_external_user(kvm))
+		return -EINVAL;
+
 	if (change == KVM_MR_CREATE || change == KVM_MR_MOVE) {
 		if ((new->base_gfn + new->npages - 1) > kvm_mmu_max_gfn())
 			return -EINVAL;
-- 
2.40.0.rc1.284.g88254d51c5-goog



* [PATCH v2 15/27] drm/i915/gvt: Don't bother removing write-protection on to-be-deleted slot
  2023-03-11  0:22 [PATCH v2 00/27] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups Sean Christopherson
                   ` (13 preceding siblings ...)
  2023-03-11  0:22 ` [PATCH v2 14/27] KVM: x86: Reject memslot MOVE operations if KVMGT is attached Sean Christopherson
@ 2023-03-11  0:22 ` Sean Christopherson
  2023-03-17  7:30   ` Yan Zhao
  2023-03-11  0:22 ` [PATCH v2 16/27] KVM: x86: Add a new page-track hook to handle memslot deletion Sean Christopherson
                   ` (12 subsequent siblings)
  27 siblings, 1 reply; 79+ messages in thread
From: Sean Christopherson @ 2023-03-11  0:22 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

When handling a slot "flush", don't call back into KVM to drop write
protection for gfns in the slot.  Now that KVM rejects attempts to move
memory slots while KVMGT is attached, the only time a slot is "flushed"
is when it's being removed, i.e. the memslot and all its write-tracking
metadata are about to be deleted.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 drivers/gpu/drm/i915/gvt/kvmgt.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 292750dc819f..577712ea4893 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -1644,14 +1644,8 @@ static void kvmgt_page_track_flush_slot(struct kvm *kvm,
 
 	for (i = 0; i < slot->npages; i++) {
 		gfn = slot->base_gfn + i;
-		if (kvmgt_gfn_is_write_protected(info, gfn)) {
-			write_lock(&kvm->mmu_lock);
-			kvm_slot_page_track_remove_page(kvm, slot, gfn,
-						KVM_PAGE_TRACK_WRITE);
-			write_unlock(&kvm->mmu_lock);
-
+		if (kvmgt_gfn_is_write_protected(info, gfn))
 			kvmgt_protect_table_del(info, gfn);
-		}
 	}
 	mutex_unlock(&info->vgpu_lock);
 }
-- 
2.40.0.rc1.284.g88254d51c5-goog


^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [PATCH v2 16/27] KVM: x86: Add a new page-track hook to handle memslot deletion
  2023-03-11  0:22 [PATCH v2 00/27] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups Sean Christopherson
                   ` (14 preceding siblings ...)
  2023-03-11  0:22 ` [PATCH v2 15/27] drm/i915/gvt: Don't bother removing write-protection on to-be-deleted slot Sean Christopherson
@ 2023-03-11  0:22 ` Sean Christopherson
  2023-03-17  7:43   ` Yan Zhao
  2023-03-11  0:22 ` [PATCH v2 17/27] drm/i915/gvt: switch from ->track_flush_slot() to ->track_remove_region() Sean Christopherson
                   ` (11 subsequent siblings)
  27 siblings, 1 reply; 79+ messages in thread
From: Sean Christopherson @ 2023-03-11  0:22 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

From: Yan Zhao <yan.y.zhao@intel.com>

Add a new page-track hook, track_remove_region(), that is called when a
memslot DELETE operation is about to be committed.  The "remove" hook
will be used by KVMGT and will effectively replace the existing
track_flush_slot() altogether now that KVM itself doesn't rely on the
"flush" hook either.

The "flush" hook is flawed as it's invoked before the memslot operation
is guaranteed to succeed, i.e. KVM might ultimately keep the existing
memslot without notifying external page track users, a.k.a. KVMGT.  In
practice, this can't currently happen on x86, but there are no guarantees
that won't change in the future, not to mention that "flush" does a very
poor job of describing what is happening.

Pass in the gfn+nr_pages instead of the slot itself so that external
users, i.e. KVMGT, don't need to be exposed to KVM internals (memslots).
This will help set the stage for additional cleanups to the page-track
APIs.
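
As a rough sketch of how an external user would wire up the new hook (the
names below are hypothetical and purely illustrative; KVMGT's actual
conversion follows in the next patch):

  static void demo_track_remove_region(gfn_t gfn, unsigned long nr_pages,
                                       struct kvm_page_track_notifier_node *node)
  {
          /* Free any per-gfn metadata for gfns in [gfn, gfn + nr_pages). */
  }

  static struct kvm_page_track_notifier_node demo_node = {
          .track_remove_region = demo_track_remove_region,
  };

  /* e.g. from the device's open/attach path, with a reference to the VM: */
  kvm_page_track_register_notifier(kvm, &demo_node);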

Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_page_track.h | 12 ++++++++++++
 arch/x86/kvm/mmu/page_track.c         | 23 +++++++++++++++++++++++
 arch/x86/kvm/x86.c                    |  3 +++
 3 files changed, 38 insertions(+)

diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index 6a287bcbe8a9..152c5e7d7868 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -43,6 +43,17 @@ struct kvm_page_track_notifier_node {
 	 */
 	void (*track_flush_slot)(struct kvm *kvm, struct kvm_memory_slot *slot,
 			    struct kvm_page_track_notifier_node *node);
+
+	/*
+	 * Invoked when a memory region is removed from the guest.  Or in KVM
+	 * terms, when a memslot is deleted.
+	 *
+	 * @gfn:       base gfn of the region being removed
+	 * @nr_pages:  number of pages in the to-be-removed region
+	 * @node:      this node
+	 */
+	void (*track_remove_region)(gfn_t gfn, unsigned long nr_pages,
+				    struct kvm_page_track_notifier_node *node);
 };
 
 int kvm_page_track_init(struct kvm *kvm);
@@ -77,6 +88,7 @@ kvm_page_track_unregister_notifier(struct kvm *kvm,
 void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
 			  int bytes);
 void kvm_page_track_flush_slot(struct kvm *kvm, struct kvm_memory_slot *slot);
+void kvm_page_track_delete_slot(struct kvm *kvm, struct kvm_memory_slot *slot);
 
 bool kvm_page_track_has_external_user(struct kvm *kvm);
 
diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index 1cfc0a0ccc23..d4a8a995276a 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -304,6 +304,29 @@ void kvm_page_track_flush_slot(struct kvm *kvm, struct kvm_memory_slot *slot)
 	srcu_read_unlock(&head->track_srcu, idx);
 }
 
+/*
+ * Notify external page track nodes that a memory region is being removed from
+ * the VM, e.g. so that users can free any associated metadata.
+ */
+void kvm_page_track_delete_slot(struct kvm *kvm, struct kvm_memory_slot *slot)
+{
+	struct kvm_page_track_notifier_head *head;
+	struct kvm_page_track_notifier_node *n;
+	int idx;
+
+	head = &kvm->arch.track_notifier_head;
+
+	if (hlist_empty(&head->track_notifier_list))
+		return;
+
+	idx = srcu_read_lock(&head->track_srcu);
+	hlist_for_each_entry_srcu(n, &head->track_notifier_list, node,
+				srcu_read_lock_held(&head->track_srcu))
+		if (n->track_remove_region)
+			n->track_remove_region(slot->base_gfn, slot->npages, n);
+	srcu_read_unlock(&head->track_srcu, idx);
+}
+
 enum pg_level kvm_page_track_max_mapping_level(struct kvm *kvm, gfn_t gfn,
 					       enum pg_level max_level)
 {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 47ac9291cd43..0da5ff007d20 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12645,6 +12645,9 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
 				const struct kvm_memory_slot *new,
 				enum kvm_mr_change change)
 {
+	if (change == KVM_MR_DELETE)
+		kvm_page_track_delete_slot(kvm, old);
+
 	if (!kvm->arch.n_requested_mmu_pages &&
 	    (change == KVM_MR_CREATE || change == KVM_MR_DELETE)) {
 		unsigned long nr_mmu_pages;
-- 
2.40.0.rc1.284.g88254d51c5-goog


^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [PATCH v2 17/27] drm/i915/gvt: switch from ->track_flush_slot() to ->track_remove_region()
  2023-03-11  0:22 [PATCH v2 00/27] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups Sean Christopherson
                   ` (15 preceding siblings ...)
  2023-03-11  0:22 ` [PATCH v2 16/27] KVM: x86: Add a new page-track hook to handle memslot deletion Sean Christopherson
@ 2023-03-11  0:22 ` Sean Christopherson
  2023-03-17  7:45   ` Yan Zhao
  2023-03-11  0:22 ` [PATCH v2 18/27] KVM: x86: Remove the unused page-track hook track_flush_slot() Sean Christopherson
                   ` (10 subsequent siblings)
  27 siblings, 1 reply; 79+ messages in thread
From: Sean Christopherson @ 2023-03-11  0:22 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

From: Yan Zhao <yan.y.zhao@intel.com>

Switch from the poorly named and flawed ->track_flush_slot() to the newly
introduced ->track_remove_region().  From KVMGT's perspective, the two
hooks are functionally equivalent, the only difference being that
->track_remove_region() is called only when KVM is 100% certain the
memory region will be removed, i.e. is invoked slightly later in KVM's
memslot modification flow.

Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
[sean: handle name change, massage changelog, rebase]
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 drivers/gpu/drm/i915/gvt/kvmgt.c | 21 +++++++++------------
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 577712ea4893..9f188b6c3edf 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -108,9 +108,8 @@ struct gvt_dma {
 
 static void kvmgt_page_track_write(gpa_t gpa, const u8 *val, int len,
 				   struct kvm_page_track_notifier_node *node);
-static void kvmgt_page_track_flush_slot(struct kvm *kvm,
-		struct kvm_memory_slot *slot,
-		struct kvm_page_track_notifier_node *node);
+static void kvmgt_page_track_remove_region(gfn_t gfn, unsigned long nr_pages,
+					   struct kvm_page_track_notifier_node *node);
 
 static ssize_t intel_vgpu_show_description(struct mdev_type *mtype, char *buf)
 {
@@ -680,7 +679,7 @@ static int intel_vgpu_open_device(struct vfio_device *vfio_dev)
 		return -EEXIST;
 
 	vgpu->track_node.track_write = kvmgt_page_track_write;
-	vgpu->track_node.track_flush_slot = kvmgt_page_track_flush_slot;
+	vgpu->track_node.track_remove_region = kvmgt_page_track_remove_region;
 	kvm_get_kvm(vgpu->vfio_device.kvm);
 	kvm_page_track_register_notifier(vgpu->vfio_device.kvm,
 					 &vgpu->track_node);
@@ -1631,22 +1630,20 @@ static void kvmgt_page_track_write(gpa_t gpa, const u8 *val, int len,
 	mutex_unlock(&info->vgpu_lock);
 }
 
-static void kvmgt_page_track_flush_slot(struct kvm *kvm,
-		struct kvm_memory_slot *slot,
-		struct kvm_page_track_notifier_node *node)
+static void kvmgt_page_track_remove_region(gfn_t gfn, unsigned long nr_pages,
+					   struct kvm_page_track_notifier_node *node)
 {
 	unsigned long i;
-	gfn_t gfn;
 	struct intel_vgpu *info =
 		container_of(node, struct intel_vgpu, track_node);
 
 	mutex_lock(&info->vgpu_lock);
 
-	for (i = 0; i < slot->npages; i++) {
-		gfn = slot->base_gfn + i;
-		if (kvmgt_gfn_is_write_protected(info, gfn))
-			kvmgt_protect_table_del(info, gfn);
+	for (i = 0; i < nr_pages; i++) {
+		if (kvmgt_gfn_is_write_protected(info, gfn + i))
+			kvmgt_protect_table_del(info, gfn + i);
 	}
+
 	mutex_unlock(&info->vgpu_lock);
 }
 
-- 
2.40.0.rc1.284.g88254d51c5-goog


^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [PATCH v2 18/27] KVM: x86: Remove the unused page-track hook track_flush_slot()
  2023-03-11  0:22 [PATCH v2 00/27] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups Sean Christopherson
                   ` (16 preceding siblings ...)
  2023-03-11  0:22 ` [PATCH v2 17/27] drm/i915/gvt: switch from ->track_flush_slot() to ->track_remove_region() Sean Christopherson
@ 2023-03-11  0:22 ` Sean Christopherson
  2023-03-11  0:22 ` [PATCH v2 19/27] KVM: x86/mmu: Move KVM-only page-track declarations to internal header Sean Christopherson
                   ` (9 subsequent siblings)
  27 siblings, 0 replies; 79+ messages in thread
From: Sean Christopherson @ 2023-03-11  0:22 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

From: Yan Zhao <yan.y.zhao@intel.com>

Remove ->track_flush_slot() now that there are no longer any users, and
it's unlikely a "flush" hook will ever be the correct API to provide to
an external page-track user.

Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_page_track.h | 11 -----------
 arch/x86/kvm/mmu/page_track.c         | 26 --------------------------
 arch/x86/kvm/x86.c                    |  2 --
 3 files changed, 39 deletions(-)

diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index 152c5e7d7868..e5eb98ca4fce 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -33,16 +33,6 @@ struct kvm_page_track_notifier_node {
 	 */
 	void (*track_write)(gpa_t gpa, const u8 *new, int bytes,
 			    struct kvm_page_track_notifier_node *node);
-	/*
-	 * It is called when memory slot is being moved or removed
-	 * users can drop write-protection for the pages in that memory slot
-	 *
-	 * @kvm: the kvm where memory slot being moved or removed
-	 * @slot: the memory slot being moved or removed
-	 * @node: this node
-	 */
-	void (*track_flush_slot)(struct kvm *kvm, struct kvm_memory_slot *slot,
-			    struct kvm_page_track_notifier_node *node);
 
 	/*
 	 * Invoked when a memory region is removed from the guest.  Or in KVM
@@ -87,7 +77,6 @@ kvm_page_track_unregister_notifier(struct kvm *kvm,
 				   struct kvm_page_track_notifier_node *n);
 void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
 			  int bytes);
-void kvm_page_track_flush_slot(struct kvm *kvm, struct kvm_memory_slot *slot);
 void kvm_page_track_delete_slot(struct kvm *kvm, struct kvm_memory_slot *slot);
 
 bool kvm_page_track_has_external_user(struct kvm *kvm);
diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index d4a8a995276a..907ab8abb452 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -278,32 +278,6 @@ void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
 	kvm_mmu_track_write(vcpu, gpa, new, bytes);
 }
 
-/*
- * Notify the node that memory slot is being removed or moved so that it can
- * drop write-protection for the pages in the memory slot.
- *
- * The node should figure out it has any write-protected pages in this slot
- * by itself.
- */
-void kvm_page_track_flush_slot(struct kvm *kvm, struct kvm_memory_slot *slot)
-{
-	struct kvm_page_track_notifier_head *head;
-	struct kvm_page_track_notifier_node *n;
-	int idx;
-
-	head = &kvm->arch.track_notifier_head;
-
-	if (hlist_empty(&head->track_notifier_list))
-		return;
-
-	idx = srcu_read_lock(&head->track_srcu);
-	hlist_for_each_entry_srcu(n, &head->track_notifier_list, node,
-				srcu_read_lock_held(&head->track_srcu))
-		if (n->track_flush_slot)
-			n->track_flush_slot(kvm, slot, n);
-	srcu_read_unlock(&head->track_srcu, idx);
-}
-
 /*
  * Notify external page track nodes that a memory region is being removed from
  * the VM, e.g. so that users can free any associated metadata.
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0da5ff007d20..59b02650cefc 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12673,8 +12673,6 @@ void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
 				   struct kvm_memory_slot *slot)
 {
 	kvm_mmu_zap_all_fast(kvm);
-
-	kvm_page_track_flush_slot(kvm, slot);
 }
 
 static inline bool kvm_guest_apic_has_interrupt(struct kvm_vcpu *vcpu)
-- 
2.40.0.rc1.284.g88254d51c5-goog


^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [PATCH v2 19/27] KVM: x86/mmu: Move KVM-only page-track declarations to internal header
  2023-03-11  0:22 [PATCH v2 00/27] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups Sean Christopherson
                   ` (17 preceding siblings ...)
  2023-03-11  0:22 ` [PATCH v2 18/27] KVM: x86: Remove the unused page-track hook track_flush_slot() Sean Christopherson
@ 2023-03-11  0:22 ` Sean Christopherson
  2023-03-15  8:44   ` Yan Zhao
  2023-03-11  0:22 ` [PATCH v2 20/27] KVM: x86/mmu: Use page-track notifiers iff there are external users Sean Christopherson
                   ` (8 subsequent siblings)
  27 siblings, 1 reply; 79+ messages in thread
From: Sean Christopherson @ 2023-03-11  0:22 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

Bury the declarations of the page-track helpers that are intended only for
internal KVM use in a "private" header.  In addition to guarding against
unwanted usage of the internal-only helpers, dropping their declarations
avoids exposing other structures that should be KVM-internal, e.g. for
memslots.  This is a baby step toward making kvm_host.h a KVM-internal
header in the very distant future.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_page_track.h | 26 ++++-----------------
 arch/x86/kvm/mmu/mmu.c                |  3 ++-
 arch/x86/kvm/mmu/page_track.c         |  8 +------
 arch/x86/kvm/mmu/page_track.h         | 33 +++++++++++++++++++++++++++
 arch/x86/kvm/x86.c                    |  1 +
 5 files changed, 42 insertions(+), 29 deletions(-)
 create mode 100644 arch/x86/kvm/mmu/page_track.h

diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index e5eb98ca4fce..deece45936a5 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -2,6 +2,8 @@
 #ifndef _ASM_X86_KVM_PAGE_TRACK_H
 #define _ASM_X86_KVM_PAGE_TRACK_H
 
+#include <linux/kvm_types.h>
+
 enum kvm_page_track_mode {
 	KVM_PAGE_TRACK_WRITE,
 	KVM_PAGE_TRACK_MAX,
@@ -46,28 +48,15 @@ struct kvm_page_track_notifier_node {
 				    struct kvm_page_track_notifier_node *node);
 };
 
-int kvm_page_track_init(struct kvm *kvm);
-void kvm_page_track_cleanup(struct kvm *kvm);
-
-bool kvm_page_track_write_tracking_enabled(struct kvm *kvm);
-int kvm_page_track_write_tracking_alloc(struct kvm_memory_slot *slot);
-enum pg_level kvm_page_track_max_mapping_level(struct kvm *kvm, gfn_t gfn,
-					       enum pg_level max_level);
-
-void kvm_page_track_free_memslot(struct kvm_memory_slot *slot);
-int kvm_page_track_create_memslot(struct kvm *kvm,
-				  struct kvm_memory_slot *slot,
-				  unsigned long npages);
-
 void kvm_slot_page_track_add_page(struct kvm *kvm,
 				  struct kvm_memory_slot *slot, gfn_t gfn,
 				  enum kvm_page_track_mode mode);
 void kvm_slot_page_track_remove_page(struct kvm *kvm,
 				     struct kvm_memory_slot *slot, gfn_t gfn,
 				     enum kvm_page_track_mode mode);
-bool kvm_slot_page_track_is_active(struct kvm *kvm,
-				   const struct kvm_memory_slot *slot,
-				   gfn_t gfn, enum kvm_page_track_mode mode);
+
+enum pg_level kvm_page_track_max_mapping_level(struct kvm *kvm, gfn_t gfn,
+					       enum pg_level max_level);
 
 void
 kvm_page_track_register_notifier(struct kvm *kvm,
@@ -75,10 +64,5 @@ kvm_page_track_register_notifier(struct kvm *kvm,
 void
 kvm_page_track_unregister_notifier(struct kvm *kvm,
 				   struct kvm_page_track_notifier_node *n);
-void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
-			  int bytes);
-void kvm_page_track_delete_slot(struct kvm *kvm, struct kvm_memory_slot *slot);
-
-bool kvm_page_track_has_external_user(struct kvm *kvm);
 
 #endif
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 4f2f83d8322e..e192968340bf 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -25,6 +25,7 @@
 #include "kvm_cache_regs.h"
 #include "smm.h"
 #include "kvm_emulate.h"
+#include "page_track.h"
 #include "cpuid.h"
 #include "spte.h"
 
@@ -53,7 +54,7 @@
 #include <asm/io.h>
 #include <asm/set_memory.h>
 #include <asm/vmx.h>
-#include <asm/kvm_page_track.h>
+
 #include "trace.h"
 
 extern bool itlb_multihit_kvm_mitigation;
diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index 907ab8abb452..a21200df515d 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -15,10 +15,9 @@
 #include <linux/kvm_host.h>
 #include <linux/rculist.h>
 
-#include <asm/kvm_page_track.h>
-
 #include "mmu.h"
 #include "mmu_internal.h"
+#include "page_track.h"
 
 bool kvm_page_track_write_tracking_enabled(struct kvm *kvm)
 {
@@ -318,8 +317,3 @@ enum pg_level kvm_page_track_max_mapping_level(struct kvm *kvm, gfn_t gfn,
 	return max_level;
 }
 EXPORT_SYMBOL_GPL(kvm_page_track_max_mapping_level);
-
-bool kvm_page_track_has_external_user(struct kvm *kvm)
-{
-	return !hlist_empty(&kvm->arch.track_notifier_head.track_notifier_list);
-}
diff --git a/arch/x86/kvm/mmu/page_track.h b/arch/x86/kvm/mmu/page_track.h
new file mode 100644
index 000000000000..89712f123ad3
--- /dev/null
+++ b/arch/x86/kvm/mmu/page_track.h
@@ -0,0 +1,33 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __KVM_X86_PAGE_TRACK_H
+#define __KVM_X86_PAGE_TRACK_H
+
+#include <linux/kvm_host.h>
+
+#include <asm/kvm_page_track.h>
+
+int kvm_page_track_init(struct kvm *kvm);
+void kvm_page_track_cleanup(struct kvm *kvm);
+
+bool kvm_page_track_write_tracking_enabled(struct kvm *kvm);
+int kvm_page_track_write_tracking_alloc(struct kvm_memory_slot *slot);
+
+void kvm_page_track_free_memslot(struct kvm_memory_slot *slot);
+int kvm_page_track_create_memslot(struct kvm *kvm,
+				  struct kvm_memory_slot *slot,
+				  unsigned long npages);
+
+bool kvm_slot_page_track_is_active(struct kvm *kvm,
+				   const struct kvm_memory_slot *slot,
+				   gfn_t gfn, enum kvm_page_track_mode mode);
+
+void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
+			  int bytes);
+void kvm_page_track_delete_slot(struct kvm *kvm, struct kvm_memory_slot *slot);
+
+static inline bool kvm_page_track_has_external_user(struct kvm *kvm)
+{
+	return !hlist_empty(&kvm->arch.track_notifier_head.track_notifier_list);
+}
+
+#endif /* __KVM_X86_PAGE_TRACK_H */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 59b02650cefc..ba61e51c05ed 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -25,6 +25,7 @@
 #include "tss.h"
 #include "kvm_cache_regs.h"
 #include "kvm_emulate.h"
+#include "mmu/page_track.h"
 #include "x86.h"
 #include "cpuid.h"
 #include "pmu.h"
-- 
2.40.0.rc1.284.g88254d51c5-goog


^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [PATCH v2 20/27] KVM: x86/mmu: Use page-track notifiers iff there are external users
  2023-03-11  0:22 [PATCH v2 00/27] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups Sean Christopherson
                   ` (18 preceding siblings ...)
  2023-03-11  0:22 ` [PATCH v2 19/27] KVM: x86/mmu: Move KVM-only page-track declarations to internal header Sean Christopherson
@ 2023-03-11  0:22 ` Sean Christopherson
  2023-03-15  9:34   ` Yan Zhao
  2023-03-15 10:36   ` Yan Zhao
  2023-03-11  0:22 ` [PATCH v2 21/27] KVM: x86/mmu: Drop infrastructure for multiple page-track modes Sean Christopherson
                   ` (7 subsequent siblings)
  27 siblings, 2 replies; 79+ messages in thread
From: Sean Christopherson @ 2023-03-11  0:22 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

Disable the page-track notifier code at compile time if there are no
external users, i.e. if CONFIG_KVM_EXTERNAL_WRITE_TRACKING=n.  KVM itself
now hooks emulated writes directly instead of relying on the page-track
mechanism.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_host.h       |  2 ++
 arch/x86/include/asm/kvm_page_track.h |  2 ++
 arch/x86/kvm/mmu/page_track.c         |  9 ++++-----
 arch/x86/kvm/mmu/page_track.h         | 29 +++++++++++++++++++++++----
 4 files changed, 33 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 1a4225237564..a3423711e403 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1265,7 +1265,9 @@ struct kvm_arch {
 	 * create an NX huge page (without hanging the guest).
 	 */
 	struct list_head possible_nx_huge_pages;
+#ifdef CONFIG_KVM_EXTERNAL_WRITE_TRACKING
 	struct kvm_page_track_notifier_head track_notifier_head;
+#endif
 	/*
 	 * Protects marking pages unsync during page faults, as TDP MMU page
 	 * faults only take mmu_lock for read.  For simplicity, the unsync
diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index deece45936a5..53c2adb25a07 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -55,6 +55,7 @@ void kvm_slot_page_track_remove_page(struct kvm *kvm,
 				     struct kvm_memory_slot *slot, gfn_t gfn,
 				     enum kvm_page_track_mode mode);
 
+#ifdef CONFIG_KVM_EXTERNAL_WRITE_TRACKING
 enum pg_level kvm_page_track_max_mapping_level(struct kvm *kvm, gfn_t gfn,
 					       enum pg_level max_level);
 
@@ -64,5 +65,6 @@ kvm_page_track_register_notifier(struct kvm *kvm,
 void
 kvm_page_track_unregister_notifier(struct kvm *kvm,
 				   struct kvm_page_track_notifier_node *n);
+#endif /* CONFIG_KVM_EXTERNAL_WRITE_TRACKING */
 
 #endif
diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index a21200df515d..619ec8e5fd32 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -194,6 +194,7 @@ bool kvm_slot_page_track_is_active(struct kvm *kvm,
 	return !!READ_ONCE(slot->arch.gfn_track[mode][index]);
 }
 
+#ifdef CONFIG_KVM_EXTERNAL_WRITE_TRACKING
 void kvm_page_track_cleanup(struct kvm *kvm)
 {
 	struct kvm_page_track_notifier_head *head;
@@ -255,14 +256,13 @@ EXPORT_SYMBOL_GPL(kvm_page_track_unregister_notifier);
  * The node should figure out if the written page is the one that node is
  * interested in by itself.
  */
-void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
-			  int bytes)
+void __kvm_page_track_write(struct kvm *kvm, gpa_t gpa, const u8 *new, int bytes)
 {
 	struct kvm_page_track_notifier_head *head;
 	struct kvm_page_track_notifier_node *n;
 	int idx;
 
-	head = &vcpu->kvm->arch.track_notifier_head;
+	head = &kvm->arch.track_notifier_head;
 
 	if (hlist_empty(&head->track_notifier_list))
 		return;
@@ -273,8 +273,6 @@ void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
 		if (n->track_write)
 			n->track_write(gpa, new, bytes, n);
 	srcu_read_unlock(&head->track_srcu, idx);
-
-	kvm_mmu_track_write(vcpu, gpa, new, bytes);
 }
 
 /*
@@ -317,3 +315,4 @@ enum pg_level kvm_page_track_max_mapping_level(struct kvm *kvm, gfn_t gfn,
 	return max_level;
 }
 EXPORT_SYMBOL_GPL(kvm_page_track_max_mapping_level);
+#endif
diff --git a/arch/x86/kvm/mmu/page_track.h b/arch/x86/kvm/mmu/page_track.h
index 89712f123ad3..931b26b8fc8f 100644
--- a/arch/x86/kvm/mmu/page_track.h
+++ b/arch/x86/kvm/mmu/page_track.h
@@ -6,8 +6,6 @@
 
 #include <asm/kvm_page_track.h>
 
-int kvm_page_track_init(struct kvm *kvm);
-void kvm_page_track_cleanup(struct kvm *kvm);
 
 bool kvm_page_track_write_tracking_enabled(struct kvm *kvm);
 int kvm_page_track_write_tracking_alloc(struct kvm_memory_slot *slot);
@@ -21,13 +19,36 @@ bool kvm_slot_page_track_is_active(struct kvm *kvm,
 				   const struct kvm_memory_slot *slot,
 				   gfn_t gfn, enum kvm_page_track_mode mode);
 
-void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
-			  int bytes);
+#ifdef CONFIG_KVM_EXTERNAL_WRITE_TRACKING
+int kvm_page_track_init(struct kvm *kvm);
+void kvm_page_track_cleanup(struct kvm *kvm);
+
+void __kvm_page_track_write(struct kvm *kvm, gpa_t gpa, const u8 *new, int bytes);
 void kvm_page_track_delete_slot(struct kvm *kvm, struct kvm_memory_slot *slot);
 
 static inline bool kvm_page_track_has_external_user(struct kvm *kvm)
 {
 	return !hlist_empty(&kvm->arch.track_notifier_head.track_notifier_list);
 }
+#else
+static inline int kvm_page_track_init(struct kvm *kvm) { return 0; }
+static inline void kvm_page_track_cleanup(struct kvm *kvm) { }
+
+static inline void __kvm_page_track_write(struct kvm *kvm, gpa_t gpa,
+					  const u8 *new, int bytes) { }
+static inline void kvm_page_track_delete_slot(struct kvm *kvm,
+					      struct kvm_memory_slot *slot) { }
+
+static inline bool kvm_page_track_has_external_user(struct kvm *kvm) { return false; }
+
+#endif /* CONFIG_KVM_EXTERNAL_WRITE_TRACKING */
+
+static inline void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa,
+					const u8 *new, int bytes)
+{
+	__kvm_page_track_write(vcpu->kvm, gpa, new, bytes);
+
+	kvm_mmu_track_write(vcpu, gpa, new, bytes);
+}
 
 #endif /* __KVM_X86_PAGE_TRACK_H */
-- 
2.40.0.rc1.284.g88254d51c5-goog


^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [PATCH v2 21/27] KVM: x86/mmu: Drop infrastructure for multiple page-track modes
  2023-03-11  0:22 [PATCH v2 00/27] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups Sean Christopherson
                   ` (19 preceding siblings ...)
  2023-03-11  0:22 ` [PATCH v2 20/27] KVM: x86/mmu: Use page-track notifiers iff there are external users Sean Christopherson
@ 2023-03-11  0:22 ` Sean Christopherson
  2023-03-11  0:22 ` [PATCH v2 22/27] KVM: x86/mmu: Rename page-track APIs to reflect the new reality Sean Christopherson
                   ` (6 subsequent siblings)
  27 siblings, 0 replies; 79+ messages in thread
From: Sean Christopherson @ 2023-03-11  0:22 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

Drop "support" for multiple page-track modes, as there is no evidence
that array-based and refcounted metadata is the optimal solution for
other modes, nor is there any evidence that other use cases, e.g. for
access-tracking, will be a good fit for the page-track machinery in
general.

E.g. one potential use case of access-tracking would be to prevent guest
access to poisoned memory (from the guest's perspective).  In that case,
the number of poisoned pages is likely to be a very small percentage of
the guest memory, and there is no need to reference count the number of
access-tracking users, i.e. expanding gfn_track[] for a new mode would be
grossly inefficient.  And for poisoned memory, host userspace would also
likely want to trap accesses, e.g. to inject #MC into the guest, and that
isn't currently supported by the page-track framework.

A better alternative for that poisoned page use case is likely a
variation of the proposed per-gfn attributes overlay (linked), which
would allow efficiently tracking the sparse set of poisoned pages, and by
default would exit to userspace on access.

Link: https://lore.kernel.org/all/Y2WB48kD0J4VGynX@google.com
Cc: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_host.h       |  12 +--
 arch/x86/include/asm/kvm_page_track.h |  11 +--
 arch/x86/kvm/mmu/mmu.c                |  14 ++--
 arch/x86/kvm/mmu/page_track.c         | 111 ++++++++------------------
 arch/x86/kvm/mmu/page_track.h         |   3 +-
 drivers/gpu/drm/i915/gvt/kvmgt.c      |   4 +-
 6 files changed, 51 insertions(+), 104 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index a3423711e403..23567b851864 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -288,13 +288,13 @@ struct kvm_kernel_irq_routing_entry;
  * kvm_mmu_page_role tracks the properties of a shadow page (where shadow page
  * also includes TDP pages) to determine whether or not a page can be used in
  * the given MMU context.  This is a subset of the overall kvm_cpu_role to
- * minimize the size of kvm_memory_slot.arch.gfn_track, i.e. allows allocating
- * 2 bytes per gfn instead of 4 bytes per gfn.
+ * minimize the size of kvm_memory_slot.arch.gfn_write_track, i.e. allows
+ * allocating 2 bytes per gfn instead of 4 bytes per gfn.
  *
  * Upper-level shadow pages having gptes are tracked for write-protection via
- * gfn_track.  As above, gfn_track is a 16 bit counter, so KVM must not create
- * more than 2^16-1 upper-level shadow pages at a single gfn, otherwise
- * gfn_track will overflow and explosions will ensure.
+ * gfn_write_track.  As above, gfn_write_track is a 16 bit counter, so KVM must
+ * not create more than 2^16-1 upper-level shadow pages at a single gfn,
+ * otherwise gfn_write_track will overflow and explosions will ensue.
  *
  * A unique shadow page (SP) for a gfn is created if and only if an existing SP
  * cannot be reused.  The ability to reuse a SP is tracked by its role, which
@@ -1023,7 +1023,7 @@ struct kvm_lpage_info {
 struct kvm_arch_memory_slot {
 	struct kvm_rmap_head *rmap[KVM_NR_PAGE_SIZES];
 	struct kvm_lpage_info *lpage_info[KVM_NR_PAGE_SIZES - 1];
-	unsigned short *gfn_track[KVM_PAGE_TRACK_MAX];
+	unsigned short *gfn_write_track;
 };
 
 /*
diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index 53c2adb25a07..42a4ae451d36 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -4,11 +4,6 @@
 
 #include <linux/kvm_types.h>
 
-enum kvm_page_track_mode {
-	KVM_PAGE_TRACK_WRITE,
-	KVM_PAGE_TRACK_MAX,
-};
-
 /*
  * The notifier represented by @kvm_page_track_notifier_node is linked into
  * the head which will be notified when guest is triggering the track event.
@@ -49,11 +44,9 @@ struct kvm_page_track_notifier_node {
 };
 
 void kvm_slot_page_track_add_page(struct kvm *kvm,
-				  struct kvm_memory_slot *slot, gfn_t gfn,
-				  enum kvm_page_track_mode mode);
+				  struct kvm_memory_slot *slot, gfn_t gfn);
 void kvm_slot_page_track_remove_page(struct kvm *kvm,
-				     struct kvm_memory_slot *slot, gfn_t gfn,
-				     enum kvm_page_track_mode mode);
+				     struct kvm_memory_slot *slot, gfn_t gfn);
 
 #ifdef CONFIG_KVM_EXTERNAL_WRITE_TRACKING
 enum pg_level kvm_page_track_max_mapping_level(struct kvm *kvm, gfn_t gfn,
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index e192968340bf..7f21a1705438 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -820,8 +820,7 @@ static void account_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
 
 	/* the non-leaf shadow pages are keeping readonly. */
 	if (sp->role.level > PG_LEVEL_4K)
-		return kvm_slot_page_track_add_page(kvm, slot, gfn,
-						    KVM_PAGE_TRACK_WRITE);
+		return kvm_slot_page_track_add_page(kvm, slot, gfn);
 
 	kvm_mmu_gfn_disallow_lpage(slot, gfn);
 
@@ -867,8 +866,7 @@ static void unaccount_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
 	slots = kvm_memslots_for_spte_role(kvm, sp->role);
 	slot = __gfn_to_memslot(slots, gfn);
 	if (sp->role.level > PG_LEVEL_4K)
-		return kvm_slot_page_track_remove_page(kvm, slot, gfn,
-						       KVM_PAGE_TRACK_WRITE);
+		return kvm_slot_page_track_remove_page(kvm, slot, gfn);
 
 	kvm_mmu_gfn_allow_lpage(slot, gfn);
 }
@@ -2747,7 +2745,7 @@ int mmu_try_to_unsync_pages(struct kvm *kvm, const struct kvm_memory_slot *slot,
 	 * track machinery is used to write-protect upper-level shadow pages,
 	 * i.e. this guards the role.level == 4K assertion below!
 	 */
-	if (kvm_slot_page_track_is_active(kvm, slot, gfn, KVM_PAGE_TRACK_WRITE))
+	if (kvm_slot_page_track_is_active(kvm, slot, gfn))
 		return -EPERM;
 
 	/*
@@ -4155,7 +4153,7 @@ static bool page_fault_handle_page_track(struct kvm_vcpu *vcpu,
 	 * guest is writing the page which is write tracked which can
 	 * not be fixed by page fault handler.
 	 */
-	if (kvm_slot_page_track_is_active(vcpu->kvm, fault->slot, fault->gfn, KVM_PAGE_TRACK_WRITE))
+	if (kvm_slot_page_track_is_active(vcpu->kvm, fault->slot, fault->gfn))
 		return true;
 
 	return false;
@@ -5387,8 +5385,8 @@ void kvm_mmu_after_set_cpuid(struct kvm_vcpu *vcpu)
 	 * physical address properties) in a single VM would require tracking
 	 * all relevant CPUID information in kvm_mmu_page_role. That is very
 	 * undesirable as it would increase the memory requirements for
-	 * gfn_track (see struct kvm_mmu_page_role comments).  For now that
-	 * problem is swept under the rug; KVM's CPUID API is horrific and
+	 * gfn_write_track (see struct kvm_mmu_page_role comments).  For now
+	 * that problem is swept under the rug; KVM's CPUID API is horrific and
 	 * it's all but impossible to solve it without introducing a new API.
 	 */
 	vcpu->arch.root_mmu.root_role.word = 0;
diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index 619ec8e5fd32..f8c89110f896 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -27,76 +27,50 @@ bool kvm_page_track_write_tracking_enabled(struct kvm *kvm)
 
 void kvm_page_track_free_memslot(struct kvm_memory_slot *slot)
 {
-	int i;
+	kvfree(slot->arch.gfn_write_track);
+	slot->arch.gfn_write_track = NULL;
+}
 
-	for (i = 0; i < KVM_PAGE_TRACK_MAX; i++) {
-		kvfree(slot->arch.gfn_track[i]);
-		slot->arch.gfn_track[i] = NULL;
-	}
+static int __kvm_page_track_write_tracking_alloc(struct kvm_memory_slot *slot,
+						 unsigned long npages)
+{
+	const size_t size = sizeof(*slot->arch.gfn_write_track);
+
+	if (!slot->arch.gfn_write_track)
+		slot->arch.gfn_write_track = __vcalloc(npages, size,
+						       GFP_KERNEL_ACCOUNT);
+
+	return slot->arch.gfn_write_track ? 0 : -ENOMEM;
 }
 
 int kvm_page_track_create_memslot(struct kvm *kvm,
 				  struct kvm_memory_slot *slot,
 				  unsigned long npages)
 {
-	int i;
-
-	for (i = 0; i < KVM_PAGE_TRACK_MAX; i++) {
-		if (i == KVM_PAGE_TRACK_WRITE &&
-		    !kvm_page_track_write_tracking_enabled(kvm))
-			continue;
-
-		slot->arch.gfn_track[i] =
-			__vcalloc(npages, sizeof(*slot->arch.gfn_track[i]),
-				  GFP_KERNEL_ACCOUNT);
-		if (!slot->arch.gfn_track[i])
-			goto track_free;
-	}
-
-	return 0;
-
-track_free:
-	kvm_page_track_free_memslot(slot);
-	return -ENOMEM;
-}
-
-static inline bool page_track_mode_is_valid(enum kvm_page_track_mode mode)
-{
-	if (mode < 0 || mode >= KVM_PAGE_TRACK_MAX)
-		return false;
-
-	return true;
-}
-
-int kvm_page_track_write_tracking_alloc(struct kvm_memory_slot *slot)
-{
-	unsigned short *gfn_track;
-
-	if (slot->arch.gfn_track[KVM_PAGE_TRACK_WRITE])
+	if (!kvm_page_track_write_tracking_enabled(kvm))
 		return 0;
 
-	gfn_track = __vcalloc(slot->npages, sizeof(*gfn_track),
-			      GFP_KERNEL_ACCOUNT);
-	if (gfn_track == NULL)
-		return -ENOMEM;
+	return __kvm_page_track_write_tracking_alloc(slot, npages);
+}
 
-	slot->arch.gfn_track[KVM_PAGE_TRACK_WRITE] = gfn_track;
-	return 0;
+int kvm_page_track_write_tracking_alloc(struct kvm_memory_slot *slot)
+{
+	return __kvm_page_track_write_tracking_alloc(slot, slot->npages);
 }
 
-static void update_gfn_track(struct kvm_memory_slot *slot, gfn_t gfn,
-			     enum kvm_page_track_mode mode, short count)
+static void update_gfn_write_track(struct kvm_memory_slot *slot, gfn_t gfn,
+				   short count)
 {
 	int index, val;
 
 	index = gfn_to_index(gfn, slot->base_gfn, PG_LEVEL_4K);
 
-	val = slot->arch.gfn_track[mode][index];
+	val = slot->arch.gfn_write_track[index];
 
 	if (WARN_ON(val + count < 0 || val + count > USHRT_MAX))
 		return;
 
-	slot->arch.gfn_track[mode][index] += count;
+	slot->arch.gfn_write_track[index] += count;
 }
 
 /*
@@ -109,21 +83,15 @@ static void update_gfn_track(struct kvm_memory_slot *slot, gfn_t gfn,
  * @kvm: the guest instance we are interested in.
  * @slot: the @gfn belongs to.
  * @gfn: the guest page.
- * @mode: tracking mode, currently only write track is supported.
  */
 void kvm_slot_page_track_add_page(struct kvm *kvm,
-				  struct kvm_memory_slot *slot, gfn_t gfn,
-				  enum kvm_page_track_mode mode)
+				  struct kvm_memory_slot *slot, gfn_t gfn)
 {
 
-	if (WARN_ON(!page_track_mode_is_valid(mode)))
+	if (WARN_ON(!kvm_page_track_write_tracking_enabled(kvm)))
 		return;
 
-	if (WARN_ON(mode == KVM_PAGE_TRACK_WRITE &&
-		    !kvm_page_track_write_tracking_enabled(kvm)))
-		return;
-
-	update_gfn_track(slot, gfn, mode, 1);
+	update_gfn_write_track(slot, gfn, 1);
 
 	/*
 	 * new track stops large page mapping for the
@@ -131,9 +99,8 @@ void kvm_slot_page_track_add_page(struct kvm *kvm,
 	 */
 	kvm_mmu_gfn_disallow_lpage(slot, gfn);
 
-	if (mode == KVM_PAGE_TRACK_WRITE)
-		if (kvm_mmu_slot_gfn_write_protect(kvm, slot, gfn, PG_LEVEL_4K))
-			kvm_flush_remote_tlbs(kvm);
+	if (kvm_mmu_slot_gfn_write_protect(kvm, slot, gfn, PG_LEVEL_4K))
+		kvm_flush_remote_tlbs(kvm);
 }
 EXPORT_SYMBOL_GPL(kvm_slot_page_track_add_page);
 
@@ -148,20 +115,14 @@ EXPORT_SYMBOL_GPL(kvm_slot_page_track_add_page);
  * @kvm: the guest instance we are interested in.
  * @slot: the @gfn belongs to.
  * @gfn: the guest page.
- * @mode: tracking mode, currently only write track is supported.
  */
 void kvm_slot_page_track_remove_page(struct kvm *kvm,
-				     struct kvm_memory_slot *slot, gfn_t gfn,
-				     enum kvm_page_track_mode mode)
+				     struct kvm_memory_slot *slot, gfn_t gfn)
 {
-	if (WARN_ON(!page_track_mode_is_valid(mode)))
+	if (WARN_ON(!kvm_page_track_write_tracking_enabled(kvm)))
 		return;
 
-	if (WARN_ON(mode == KVM_PAGE_TRACK_WRITE &&
-		    !kvm_page_track_write_tracking_enabled(kvm)))
-		return;
-
-	update_gfn_track(slot, gfn, mode, -1);
+	update_gfn_write_track(slot, gfn, -1);
 
 	/*
 	 * allow large page mapping for the tracked page
@@ -176,22 +137,18 @@ EXPORT_SYMBOL_GPL(kvm_slot_page_track_remove_page);
  */
 bool kvm_slot_page_track_is_active(struct kvm *kvm,
 				   const struct kvm_memory_slot *slot,
-				   gfn_t gfn, enum kvm_page_track_mode mode)
+				   gfn_t gfn)
 {
 	int index;
 
-	if (WARN_ON(!page_track_mode_is_valid(mode)))
-		return false;
-
 	if (!slot)
 		return false;
 
-	if (mode == KVM_PAGE_TRACK_WRITE &&
-	    !kvm_page_track_write_tracking_enabled(kvm))
+	if (!kvm_page_track_write_tracking_enabled(kvm))
 		return false;
 
 	index = gfn_to_index(gfn, slot->base_gfn, PG_LEVEL_4K);
-	return !!READ_ONCE(slot->arch.gfn_track[mode][index]);
+	return !!READ_ONCE(slot->arch.gfn_write_track[index]);
 }
 
 #ifdef CONFIG_KVM_EXTERNAL_WRITE_TRACKING
diff --git a/arch/x86/kvm/mmu/page_track.h b/arch/x86/kvm/mmu/page_track.h
index 931b26b8fc8f..789d0c479519 100644
--- a/arch/x86/kvm/mmu/page_track.h
+++ b/arch/x86/kvm/mmu/page_track.h
@@ -16,8 +16,7 @@ int kvm_page_track_create_memslot(struct kvm *kvm,
 				  unsigned long npages);
 
 bool kvm_slot_page_track_is_active(struct kvm *kvm,
-				   const struct kvm_memory_slot *slot,
-				   gfn_t gfn, enum kvm_page_track_mode mode);
+				   const struct kvm_memory_slot *slot, gfn_t gfn);
 
 #ifdef CONFIG_KVM_EXTERNAL_WRITE_TRACKING
 int kvm_page_track_init(struct kvm *kvm);
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 9f188b6c3edf..1e0f4ec55782 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -1578,7 +1578,7 @@ int intel_gvt_page_track_add(struct intel_vgpu *info, u64 gfn)
 	}
 
 	write_lock(&kvm->mmu_lock);
-	kvm_slot_page_track_add_page(kvm, slot, gfn, KVM_PAGE_TRACK_WRITE);
+	kvm_slot_page_track_add_page(kvm, slot, gfn);
 	write_unlock(&kvm->mmu_lock);
 
 	srcu_read_unlock(&kvm->srcu, idx);
@@ -1607,7 +1607,7 @@ int intel_gvt_page_track_remove(struct intel_vgpu *info, u64 gfn)
 	}
 
 	write_lock(&kvm->mmu_lock);
-	kvm_slot_page_track_remove_page(kvm, slot, gfn, KVM_PAGE_TRACK_WRITE);
+	kvm_slot_page_track_remove_page(kvm, slot, gfn);
 	write_unlock(&kvm->mmu_lock);
 	srcu_read_unlock(&kvm->srcu, idx);
 
-- 
2.40.0.rc1.284.g88254d51c5-goog


^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [PATCH v2 22/27] KVM: x86/mmu: Rename page-track APIs to reflect the new reality
  2023-03-11  0:22 [PATCH v2 00/27] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups Sean Christopherson
                   ` (20 preceding siblings ...)
  2023-03-11  0:22 ` [PATCH v2 21/27] KVM: x86/mmu: Drop infrastructure for multiple page-track modes Sean Christopherson
@ 2023-03-11  0:22 ` Sean Christopherson
  2023-03-11  0:22 ` [PATCH v2 23/27] KVM: x86/mmu: Assert that correct locks are held for page write-tracking Sean Christopherson
                   ` (5 subsequent siblings)
  27 siblings, 0 replies; 79+ messages in thread
From: Sean Christopherson @ 2023-03-11  0:22 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

Rename the page-track APIs to capture that they're all about tracking
writes, now that the facade of supporting multiple modes is gone.

Opportunistically replace "slot" with "gfn" in anticipation of removing
the @slot param from the external APIs.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_page_track.h |  8 ++++----
 arch/x86/kvm/mmu/mmu.c                |  8 ++++----
 arch/x86/kvm/mmu/page_track.c         | 21 +++++++++------------
 arch/x86/kvm/mmu/page_track.h         |  4 ++--
 drivers/gpu/drm/i915/gvt/kvmgt.c      |  4 ++--
 5 files changed, 21 insertions(+), 24 deletions(-)

diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index 42a4ae451d36..20055064793a 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -43,10 +43,10 @@ struct kvm_page_track_notifier_node {
 				    struct kvm_page_track_notifier_node *node);
 };
 
-void kvm_slot_page_track_add_page(struct kvm *kvm,
-				  struct kvm_memory_slot *slot, gfn_t gfn);
-void kvm_slot_page_track_remove_page(struct kvm *kvm,
-				     struct kvm_memory_slot *slot, gfn_t gfn);
+void kvm_write_track_add_gfn(struct kvm *kvm,
+			     struct kvm_memory_slot *slot, gfn_t gfn);
+void kvm_write_track_remove_gfn(struct kvm *kvm, struct kvm_memory_slot *slot,
+				gfn_t gfn);
 
 #ifdef CONFIG_KVM_EXTERNAL_WRITE_TRACKING
 enum pg_level kvm_page_track_max_mapping_level(struct kvm *kvm, gfn_t gfn,
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 7f21a1705438..3d1aad44c2ec 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -820,7 +820,7 @@ static void account_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
 
 	/* the non-leaf shadow pages are keeping readonly. */
 	if (sp->role.level > PG_LEVEL_4K)
-		return kvm_slot_page_track_add_page(kvm, slot, gfn);
+		return kvm_write_track_add_gfn(kvm, slot, gfn);
 
 	kvm_mmu_gfn_disallow_lpage(slot, gfn);
 
@@ -866,7 +866,7 @@ static void unaccount_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
 	slots = kvm_memslots_for_spte_role(kvm, sp->role);
 	slot = __gfn_to_memslot(slots, gfn);
 	if (sp->role.level > PG_LEVEL_4K)
-		return kvm_slot_page_track_remove_page(kvm, slot, gfn);
+		return kvm_write_track_remove_gfn(kvm, slot, gfn);
 
 	kvm_mmu_gfn_allow_lpage(slot, gfn);
 }
@@ -2745,7 +2745,7 @@ int mmu_try_to_unsync_pages(struct kvm *kvm, const struct kvm_memory_slot *slot,
 	 * track machinery is used to write-protect upper-level shadow pages,
 	 * i.e. this guards the role.level == 4K assertion below!
 	 */
-	if (kvm_slot_page_track_is_active(kvm, slot, gfn))
+	if (kvm_gfn_is_write_tracked(kvm, slot, gfn))
 		return -EPERM;
 
 	/*
@@ -4153,7 +4153,7 @@ static bool page_fault_handle_page_track(struct kvm_vcpu *vcpu,
 	 * guest is writing the page which is write tracked which can
 	 * not be fixed by page fault handler.
 	 */
-	if (kvm_slot_page_track_is_active(vcpu->kvm, fault->slot, fault->gfn))
+	if (kvm_gfn_is_write_tracked(vcpu->kvm, fault->slot, fault->gfn))
 		return true;
 
 	return false;
diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index f8c89110f896..1993db4578e5 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -84,10 +84,9 @@ static void update_gfn_write_track(struct kvm_memory_slot *slot, gfn_t gfn,
  * @slot: the @gfn belongs to.
  * @gfn: the guest page.
  */
-void kvm_slot_page_track_add_page(struct kvm *kvm,
-				  struct kvm_memory_slot *slot, gfn_t gfn)
+void kvm_write_track_add_gfn(struct kvm *kvm, struct kvm_memory_slot *slot,
+			     gfn_t gfn)
 {
-
 	if (WARN_ON(!kvm_page_track_write_tracking_enabled(kvm)))
 		return;
 
@@ -102,12 +101,11 @@ void kvm_slot_page_track_add_page(struct kvm *kvm,
 	if (kvm_mmu_slot_gfn_write_protect(kvm, slot, gfn, PG_LEVEL_4K))
 		kvm_flush_remote_tlbs(kvm);
 }
-EXPORT_SYMBOL_GPL(kvm_slot_page_track_add_page);
+EXPORT_SYMBOL_GPL(kvm_write_track_add_gfn);
 
 /*
  * remove the guest page from the tracking pool which stops the interception
- * of corresponding access on that page. It is the opposed operation of
- * kvm_slot_page_track_add_page().
+ * of corresponding access on that page.
  *
  * It should be called under the protection both of mmu-lock and kvm->srcu
  * or kvm->slots_lock.
@@ -116,8 +114,8 @@ EXPORT_SYMBOL_GPL(kvm_slot_page_track_add_page);
  * @slot: the @gfn belongs to.
  * @gfn: the guest page.
  */
-void kvm_slot_page_track_remove_page(struct kvm *kvm,
-				     struct kvm_memory_slot *slot, gfn_t gfn)
+void kvm_write_track_remove_gfn(struct kvm *kvm,
+				struct kvm_memory_slot *slot, gfn_t gfn)
 {
 	if (WARN_ON(!kvm_page_track_write_tracking_enabled(kvm)))
 		return;
@@ -130,14 +128,13 @@ void kvm_slot_page_track_remove_page(struct kvm *kvm,
 	 */
 	kvm_mmu_gfn_allow_lpage(slot, gfn);
 }
-EXPORT_SYMBOL_GPL(kvm_slot_page_track_remove_page);
+EXPORT_SYMBOL_GPL(kvm_write_track_remove_gfn);
 
 /*
  * check if the corresponding access on the specified guest page is tracked.
  */
-bool kvm_slot_page_track_is_active(struct kvm *kvm,
-				   const struct kvm_memory_slot *slot,
-				   gfn_t gfn)
+bool kvm_gfn_is_write_tracked(struct kvm *kvm,
+			      const struct kvm_memory_slot *slot, gfn_t gfn)
 {
 	int index;
 
diff --git a/arch/x86/kvm/mmu/page_track.h b/arch/x86/kvm/mmu/page_track.h
index 789d0c479519..50d3278e8c69 100644
--- a/arch/x86/kvm/mmu/page_track.h
+++ b/arch/x86/kvm/mmu/page_track.h
@@ -15,8 +15,8 @@ int kvm_page_track_create_memslot(struct kvm *kvm,
 				  struct kvm_memory_slot *slot,
 				  unsigned long npages);
 
-bool kvm_slot_page_track_is_active(struct kvm *kvm,
-				   const struct kvm_memory_slot *slot, gfn_t gfn);
+bool kvm_gfn_is_write_tracked(struct kvm *kvm,
+			      const struct kvm_memory_slot *slot, gfn_t gfn);
 
 #ifdef CONFIG_KVM_EXTERNAL_WRITE_TRACKING
 int kvm_page_track_init(struct kvm *kvm);
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 1e0f4ec55782..e5a18d92030b 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -1578,7 +1578,7 @@ int intel_gvt_page_track_add(struct intel_vgpu *info, u64 gfn)
 	}
 
 	write_lock(&kvm->mmu_lock);
-	kvm_slot_page_track_add_page(kvm, slot, gfn);
+	kvm_write_track_add_gfn(kvm, slot, gfn);
 	write_unlock(&kvm->mmu_lock);
 
 	srcu_read_unlock(&kvm->srcu, idx);
@@ -1607,7 +1607,7 @@ int intel_gvt_page_track_remove(struct intel_vgpu *info, u64 gfn)
 	}
 
 	write_lock(&kvm->mmu_lock);
-	kvm_slot_page_track_remove_page(kvm, slot, gfn);
+	kvm_write_track_remove_gfn(kvm, slot, gfn);
 	write_unlock(&kvm->mmu_lock);
 	srcu_read_unlock(&kvm->srcu, idx);
 
-- 
2.40.0.rc1.284.g88254d51c5-goog


^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [PATCH v2 23/27] KVM: x86/mmu: Assert that correct locks are held for page write-tracking
  2023-03-11  0:22 [PATCH v2 00/27] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups Sean Christopherson
                   ` (21 preceding siblings ...)
  2023-03-11  0:22 ` [PATCH v2 22/27] KVM: x86/mmu: Rename page-track APIs to reflect the new reality Sean Christopherson
@ 2023-03-11  0:22 ` Sean Christopherson
  2023-03-17  7:55   ` Yan Zhao
  2023-03-11  0:22 ` [PATCH v2 24/27] KVM: x86/mmu: Bug the VM if write-tracking is used but not enabled Sean Christopherson
                   ` (4 subsequent siblings)
  27 siblings, 1 reply; 79+ messages in thread
From: Sean Christopherson @ 2023-03-11  0:22 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

When adding/removing gfns to/from write-tracking, assert that mmu_lock
is held for write, and that either slots_lock or kvm->srcu is held.
mmu_lock must be held for write to protect gfn_write_track's refcount,
and SRCU or slots_lock must be held to protect the memslot itself.
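
For reference, a sketch of the caller pattern the assertions expect,
mirroring how KVMGT takes the locks around kvm_write_track_add_gfn()
elsewhere in this series (error handling omitted):

  idx = srcu_read_lock(&kvm->srcu);
  slot = gfn_to_memslot(kvm, gfn);

  write_lock(&kvm->mmu_lock);
  kvm_write_track_add_gfn(kvm, slot, gfn);
  write_unlock(&kvm->mmu_lock);

  srcu_read_unlock(&kvm->srcu, idx);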

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/page_track.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index 1993db4578e5..ffcd7ac66f9e 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -12,6 +12,7 @@
  */
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
+#include <linux/lockdep.h>
 #include <linux/kvm_host.h>
 #include <linux/rculist.h>
 
@@ -77,9 +78,6 @@ static void update_gfn_write_track(struct kvm_memory_slot *slot, gfn_t gfn,
  * add guest page to the tracking pool so that corresponding access on that
  * page will be intercepted.
  *
- * It should be called under the protection both of mmu-lock and kvm->srcu
- * or kvm->slots_lock.
- *
  * @kvm: the guest instance we are interested in.
  * @slot: the @gfn belongs to.
  * @gfn: the guest page.
@@ -87,6 +85,11 @@ static void update_gfn_write_track(struct kvm_memory_slot *slot, gfn_t gfn,
 void kvm_write_track_add_gfn(struct kvm *kvm, struct kvm_memory_slot *slot,
 			     gfn_t gfn)
 {
+	lockdep_assert_held_write(&kvm->mmu_lock);
+
+	lockdep_assert_once(lockdep_is_held(&kvm->slots_lock) ||
+			    srcu_read_lock_held(&kvm->srcu));
+
 	if (WARN_ON(!kvm_page_track_write_tracking_enabled(kvm)))
 		return;
 
@@ -107,9 +110,6 @@ EXPORT_SYMBOL_GPL(kvm_write_track_add_gfn);
  * remove the guest page from the tracking pool which stops the interception
  * of corresponding access on that page.
  *
- * It should be called under the protection both of mmu-lock and kvm->srcu
- * or kvm->slots_lock.
- *
  * @kvm: the guest instance we are interested in.
  * @slot: the @gfn belongs to.
  * @gfn: the guest page.
@@ -117,6 +117,11 @@ EXPORT_SYMBOL_GPL(kvm_write_track_add_gfn);
 void kvm_write_track_remove_gfn(struct kvm *kvm,
 				struct kvm_memory_slot *slot, gfn_t gfn)
 {
+	lockdep_assert_held_write(&kvm->mmu_lock);
+
+	lockdep_assert_once(lockdep_is_held(&kvm->slots_lock) ||
+			    srcu_read_lock_held(&kvm->srcu));
+
 	if (WARN_ON(!kvm_page_track_write_tracking_enabled(kvm)))
 		return;
 
-- 
2.40.0.rc1.284.g88254d51c5-goog


^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [PATCH v2 24/27] KVM: x86/mmu: Bug the VM if write-tracking is used but not enabled
  2023-03-11  0:22 [PATCH v2 00/27] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups Sean Christopherson
                   ` (22 preceding siblings ...)
  2023-03-11  0:22 ` [PATCH v2 23/27] KVM: x86/mmu: Assert that correct locks are held for page write-tracking Sean Christopherson
@ 2023-03-11  0:22 ` Sean Christopherson
  2023-03-11  0:22 ` [PATCH v2 25/27] KVM: x86/mmu: Drop @slot param from exported/external page-track APIs Sean Christopherson
                   ` (3 subsequent siblings)
  27 siblings, 0 replies; 79+ messages in thread
From: Sean Christopherson @ 2023-03-11  0:22 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

Bug the VM if something attempts to write-track a gfn, but write-tracking
isn't enabled.  The VM is doomed (and KVM has an egregious bug) if KVM or
KVMGT wants to shadow guest page tables but can't because write-tracking
isn't enabled.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/page_track.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index ffcd7ac66f9e..327e73be62d6 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -90,7 +90,7 @@ void kvm_write_track_add_gfn(struct kvm *kvm, struct kvm_memory_slot *slot,
 	lockdep_assert_once(lockdep_is_held(&kvm->slots_lock) ||
 			    srcu_read_lock_held(&kvm->srcu));
 
-	if (WARN_ON(!kvm_page_track_write_tracking_enabled(kvm)))
+	if (KVM_BUG_ON(!kvm_page_track_write_tracking_enabled(kvm), kvm))
 		return;
 
 	update_gfn_write_track(slot, gfn, 1);
@@ -122,7 +122,7 @@ void kvm_write_track_remove_gfn(struct kvm *kvm,
 	lockdep_assert_once(lockdep_is_held(&kvm->slots_lock) ||
 			    srcu_read_lock_held(&kvm->srcu));
 
-	if (WARN_ON(!kvm_page_track_write_tracking_enabled(kvm)))
+	if (KVM_BUG_ON(!kvm_page_track_write_tracking_enabled(kvm), kvm))
 		return;
 
 	update_gfn_write_track(slot, gfn, -1);
-- 
2.40.0.rc1.284.g88254d51c5-goog


^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [PATCH v2 25/27] KVM: x86/mmu: Drop @slot param from exported/external page-track APIs
  2023-03-11  0:22 [PATCH v2 00/27] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups Sean Christopherson
                   ` (23 preceding siblings ...)
  2023-03-11  0:22 ` [PATCH v2 24/27] KVM: x86/mmu: Bug the VM if write-tracking is used but not enabled Sean Christopherson
@ 2023-03-11  0:22 ` Sean Christopherson
  2023-03-17  8:28   ` Yan Zhao
  2023-03-11  0:22 ` [PATCH v2 26/27] KVM: x86/mmu: Handle KVM bookkeeping in page-track APIs, not callers Sean Christopherson
                   ` (2 subsequent siblings)
  27 siblings, 1 reply; 79+ messages in thread
From: Sean Christopherson @ 2023-03-11  0:22 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

Refactor KVM's exported/external page-track, a.k.a. write-track, APIs
to take only the gfn and do the required memslot lookup in KVM proper.
Forcing users of the APIs to get the memslot unnecessarily bleeds
KVM internals into KVMGT and complicates usage of the APIs.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_page_track.h |  8 +--
 arch/x86/kvm/mmu/mmu.c                |  4 +-
 arch/x86/kvm/mmu/page_track.c         | 86 ++++++++++++++++++++-------
 arch/x86/kvm/mmu/page_track.h         |  5 ++
 drivers/gpu/drm/i915/gvt/kvmgt.c      | 37 +++---------
 5 files changed, 82 insertions(+), 58 deletions(-)

diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index 20055064793a..415537ce45b4 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -43,11 +43,6 @@ struct kvm_page_track_notifier_node {
 				    struct kvm_page_track_notifier_node *node);
 };
 
-void kvm_write_track_add_gfn(struct kvm *kvm,
-			     struct kvm_memory_slot *slot, gfn_t gfn);
-void kvm_write_track_remove_gfn(struct kvm *kvm, struct kvm_memory_slot *slot,
-				gfn_t gfn);
-
 #ifdef CONFIG_KVM_EXTERNAL_WRITE_TRACKING
 enum pg_level kvm_page_track_max_mapping_level(struct kvm *kvm, gfn_t gfn,
 					       enum pg_level max_level);
@@ -58,6 +53,9 @@ kvm_page_track_register_notifier(struct kvm *kvm,
 void
 kvm_page_track_unregister_notifier(struct kvm *kvm,
 				   struct kvm_page_track_notifier_node *n);
+
+int kvm_write_track_add_gfn(struct kvm *kvm, gfn_t gfn);
+int kvm_write_track_remove_gfn(struct kvm *kvm, gfn_t gfn);
 #endif /* CONFIG_KVM_EXTERNAL_WRITE_TRACKING */
 
 #endif
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 3d1aad44c2ec..cf59b44de912 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -820,7 +820,7 @@ static void account_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
 
 	/* the non-leaf shadow pages are keeping readonly. */
 	if (sp->role.level > PG_LEVEL_4K)
-		return kvm_write_track_add_gfn(kvm, slot, gfn);
+		return __kvm_write_track_add_gfn(kvm, slot, gfn);
 
 	kvm_mmu_gfn_disallow_lpage(slot, gfn);
 
@@ -866,7 +866,7 @@ static void unaccount_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
 	slots = kvm_memslots_for_spte_role(kvm, sp->role);
 	slot = __gfn_to_memslot(slots, gfn);
 	if (sp->role.level > PG_LEVEL_4K)
-		return kvm_write_track_remove_gfn(kvm, slot, gfn);
+		return __kvm_write_track_remove_gfn(kvm, slot, gfn);
 
 	kvm_mmu_gfn_allow_lpage(slot, gfn);
 }
diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index 327e73be62d6..69b6431b394b 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -74,16 +74,8 @@ static void update_gfn_write_track(struct kvm_memory_slot *slot, gfn_t gfn,
 	slot->arch.gfn_write_track[index] += count;
 }
 
-/*
- * add guest page to the tracking pool so that corresponding access on that
- * page will be intercepted.
- *
- * @kvm: the guest instance we are interested in.
- * @slot: the @gfn belongs to.
- * @gfn: the guest page.
- */
-void kvm_write_track_add_gfn(struct kvm *kvm, struct kvm_memory_slot *slot,
-			     gfn_t gfn)
+void __kvm_write_track_add_gfn(struct kvm *kvm, struct kvm_memory_slot *slot,
+			       gfn_t gfn)
 {
 	lockdep_assert_held_write(&kvm->mmu_lock);
 
@@ -104,18 +96,9 @@ void kvm_write_track_add_gfn(struct kvm *kvm, struct kvm_memory_slot *slot,
 	if (kvm_mmu_slot_gfn_write_protect(kvm, slot, gfn, PG_LEVEL_4K))
 		kvm_flush_remote_tlbs(kvm);
 }
-EXPORT_SYMBOL_GPL(kvm_write_track_add_gfn);
 
-/*
- * remove the guest page from the tracking pool which stops the interception
- * of corresponding access on that page.
- *
- * @kvm: the guest instance we are interested in.
- * @slot: the @gfn belongs to.
- * @gfn: the guest page.
- */
-void kvm_write_track_remove_gfn(struct kvm *kvm,
-				struct kvm_memory_slot *slot, gfn_t gfn)
+void __kvm_write_track_remove_gfn(struct kvm *kvm,
+				  struct kvm_memory_slot *slot, gfn_t gfn)
 {
 	lockdep_assert_held_write(&kvm->mmu_lock);
 
@@ -133,7 +116,6 @@ void kvm_write_track_remove_gfn(struct kvm *kvm,
 	 */
 	kvm_mmu_gfn_allow_lpage(slot, gfn);
 }
-EXPORT_SYMBOL_GPL(kvm_write_track_remove_gfn);
 
 /*
  * check if the corresponding access on the specified guest page is tracked.
@@ -274,4 +256,64 @@ enum pg_level kvm_page_track_max_mapping_level(struct kvm *kvm, gfn_t gfn,
 	return max_level;
 }
 EXPORT_SYMBOL_GPL(kvm_page_track_max_mapping_level);
+
+/*
+ * add guest page to the tracking pool so that corresponding access on that
+ * page will be intercepted.
+ *
+ * @kvm: the guest instance we are interested in.
+ * @gfn: the guest page.
+ */
+int kvm_write_track_add_gfn(struct kvm *kvm, gfn_t gfn)
+{
+	struct kvm_memory_slot *slot;
+	int idx;
+
+	idx = srcu_read_lock(&kvm->srcu);
+
+	slot = gfn_to_memslot(kvm, gfn);
+	if (!slot) {
+		srcu_read_unlock(&kvm->srcu, idx);
+		return -EINVAL;
+	}
+
+	write_lock(&kvm->mmu_lock);
+	__kvm_write_track_add_gfn(kvm, slot, gfn);
+	write_unlock(&kvm->mmu_lock);
+
+	srcu_read_unlock(&kvm->srcu, idx);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(kvm_write_track_add_gfn);
+
+/*
+ * remove the guest page from the tracking pool which stops the interception
+ * of corresponding access on that page.
+ *
+ * @kvm: the guest instance we are interested in.
+ * @gfn: the guest page.
+ */
+int kvm_write_track_remove_gfn(struct kvm *kvm, gfn_t gfn)
+{
+	struct kvm_memory_slot *slot;
+	int idx;
+
+	idx = srcu_read_lock(&kvm->srcu);
+
+	slot = gfn_to_memslot(kvm, gfn);
+	if (!slot) {
+		srcu_read_unlock(&kvm->srcu, idx);
+		return -EINVAL;
+	}
+
+	write_lock(&kvm->mmu_lock);
+	__kvm_write_track_remove_gfn(kvm, slot, gfn);
+	write_unlock(&kvm->mmu_lock);
+
+	srcu_read_unlock(&kvm->srcu, idx);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(kvm_write_track_remove_gfn);
 #endif
diff --git a/arch/x86/kvm/mmu/page_track.h b/arch/x86/kvm/mmu/page_track.h
index 50d3278e8c69..62f98c6c5af3 100644
--- a/arch/x86/kvm/mmu/page_track.h
+++ b/arch/x86/kvm/mmu/page_track.h
@@ -15,6 +15,11 @@ int kvm_page_track_create_memslot(struct kvm *kvm,
 				  struct kvm_memory_slot *slot,
 				  unsigned long npages);
 
+void __kvm_write_track_add_gfn(struct kvm *kvm, struct kvm_memory_slot *slot,
+			       gfn_t gfn);
+void __kvm_write_track_remove_gfn(struct kvm *kvm,
+				  struct kvm_memory_slot *slot, gfn_t gfn);
+
 bool kvm_gfn_is_write_tracked(struct kvm *kvm,
 			      const struct kvm_memory_slot *slot, gfn_t gfn);
 
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index e5a18d92030b..898f1f1d308d 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -1560,9 +1560,7 @@ static struct mdev_driver intel_vgpu_mdev_driver = {
 
 int intel_gvt_page_track_add(struct intel_vgpu *info, u64 gfn)
 {
-	struct kvm *kvm = info->vfio_device.kvm;
-	struct kvm_memory_slot *slot;
-	int idx;
+	int r;
 
 	if (!test_bit(INTEL_VGPU_STATUS_ATTACHED, info->status))
 		return -ESRCH;
@@ -1570,18 +1568,9 @@ int intel_gvt_page_track_add(struct intel_vgpu *info, u64 gfn)
 	if (kvmgt_gfn_is_write_protected(info, gfn))
 		return 0;
 
-	idx = srcu_read_lock(&kvm->srcu);
-	slot = gfn_to_memslot(kvm, gfn);
-	if (!slot) {
-		srcu_read_unlock(&kvm->srcu, idx);
-		return -EINVAL;
-	}
-
-	write_lock(&kvm->mmu_lock);
-	kvm_write_track_add_gfn(kvm, slot, gfn);
-	write_unlock(&kvm->mmu_lock);
-
-	srcu_read_unlock(&kvm->srcu, idx);
+	r = kvm_write_track_add_gfn(info->vfio_device.kvm, gfn);
+	if (r)
+		return r;
 
 	kvmgt_protect_table_add(info, gfn);
 	return 0;
@@ -1589,9 +1578,7 @@ int intel_gvt_page_track_add(struct intel_vgpu *info, u64 gfn)
 
 int intel_gvt_page_track_remove(struct intel_vgpu *info, u64 gfn)
 {
-	struct kvm *kvm = info->vfio_device.kvm;
-	struct kvm_memory_slot *slot;
-	int idx;
+	int r;
 
 	if (!test_bit(INTEL_VGPU_STATUS_ATTACHED, info->status))
 		return -ESRCH;
@@ -1599,17 +1586,9 @@ int intel_gvt_page_track_remove(struct intel_vgpu *info, u64 gfn)
 	if (!kvmgt_gfn_is_write_protected(info, gfn))
 		return 0;
 
-	idx = srcu_read_lock(&kvm->srcu);
-	slot = gfn_to_memslot(kvm, gfn);
-	if (!slot) {
-		srcu_read_unlock(&kvm->srcu, idx);
-		return -EINVAL;
-	}
-
-	write_lock(&kvm->mmu_lock);
-	kvm_write_track_remove_gfn(kvm, slot, gfn);
-	write_unlock(&kvm->mmu_lock);
-	srcu_read_unlock(&kvm->srcu, idx);
+	r = kvm_write_track_remove_gfn(info->vfio_device.kvm, gfn);
+	if (r)
+		return r;
 
 	kvmgt_protect_table_del(info, gfn);
 	return 0;
-- 
2.40.0.rc1.284.g88254d51c5-goog


^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [PATCH v2 26/27] KVM: x86/mmu: Handle KVM bookkeeping in page-track APIs, not callers
  2023-03-11  0:22 [PATCH v2 00/27] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups Sean Christopherson
                   ` (24 preceding siblings ...)
  2023-03-11  0:22 ` [PATCH v2 25/27] KVM: x86/mmu: Drop @slot param from exported/external page-track APIs Sean Christopherson
@ 2023-03-11  0:22 ` Sean Christopherson
  2023-03-17  8:52   ` Yan Zhao
  2023-03-11  0:22 ` [PATCH v2 27/27] drm/i915/gvt: Drop final dependencies on KVM internal details Sean Christopherson
  2023-03-13  9:58 ` [PATCH v2 00/27] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups Yan Zhao
  27 siblings, 1 reply; 79+ messages in thread
From: Sean Christopherson @ 2023-03-11  0:22 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

Get/put references to KVM when a page-track notifier is (un)registered
instead of relying on the caller to do so.  Forcing the caller to do the
bookkeeping is unnecessary and adds one more thing for users to get
wrong, e.g. see commit 9ed1fdee9ee3 ("drm/i915/gvt: Get reference to KVM
iff attachment to VM is successful").

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_page_track.h | 10 ++++------
 arch/x86/kvm/mmu/page_track.c         | 18 ++++++++++++------
 drivers/gpu/drm/i915/gvt/kvmgt.c      | 17 +++++++----------
 3 files changed, 23 insertions(+), 22 deletions(-)

diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index 415537ce45b4..66a0d7c34311 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -47,12 +47,10 @@ struct kvm_page_track_notifier_node {
 enum pg_level kvm_page_track_max_mapping_level(struct kvm *kvm, gfn_t gfn,
 					       enum pg_level max_level);
 
-void
-kvm_page_track_register_notifier(struct kvm *kvm,
-				 struct kvm_page_track_notifier_node *n);
-void
-kvm_page_track_unregister_notifier(struct kvm *kvm,
-				   struct kvm_page_track_notifier_node *n);
+int kvm_page_track_register_notifier(struct kvm *kvm,
+				     struct kvm_page_track_notifier_node *n);
+void kvm_page_track_unregister_notifier(struct kvm *kvm,
+					struct kvm_page_track_notifier_node *n);
 
 int kvm_write_track_add_gfn(struct kvm *kvm, gfn_t gfn);
 int kvm_write_track_remove_gfn(struct kvm *kvm, gfn_t gfn);
diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index 69b6431b394b..6ca644d3c926 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -157,17 +157,22 @@ int kvm_page_track_init(struct kvm *kvm)
  * register the notifier so that event interception for the tracked guest
  * pages can be received.
  */
-void
-kvm_page_track_register_notifier(struct kvm *kvm,
-				 struct kvm_page_track_notifier_node *n)
+int kvm_page_track_register_notifier(struct kvm *kvm,
+				     struct kvm_page_track_notifier_node *n)
 {
 	struct kvm_page_track_notifier_head *head;
 
+	if (!kvm || kvm->mm != current->mm)
+		return -ESRCH;
+
+	kvm_get_kvm(kvm);
+
 	head = &kvm->arch.track_notifier_head;
 
 	write_lock(&kvm->mmu_lock);
 	hlist_add_head_rcu(&n->node, &head->track_notifier_list);
 	write_unlock(&kvm->mmu_lock);
+	return 0;
 }
 EXPORT_SYMBOL_GPL(kvm_page_track_register_notifier);
 
@@ -175,9 +180,8 @@ EXPORT_SYMBOL_GPL(kvm_page_track_register_notifier);
  * stop receiving the event interception. It is the opposed operation of
  * kvm_page_track_register_notifier().
  */
-void
-kvm_page_track_unregister_notifier(struct kvm *kvm,
-				   struct kvm_page_track_notifier_node *n)
+void kvm_page_track_unregister_notifier(struct kvm *kvm,
+					struct kvm_page_track_notifier_node *n)
 {
 	struct kvm_page_track_notifier_head *head;
 
@@ -187,6 +191,8 @@ kvm_page_track_unregister_notifier(struct kvm *kvm,
 	hlist_del_rcu(&n->node);
 	write_unlock(&kvm->mmu_lock);
 	synchronize_srcu(&head->track_srcu);
+
+	kvm_put_kvm(kvm);
 }
 EXPORT_SYMBOL_GPL(kvm_page_track_unregister_notifier);
 
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 898f1f1d308d..d16aced134b4 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -668,21 +668,19 @@ static bool __kvmgt_vgpu_exist(struct intel_vgpu *vgpu)
 static int intel_vgpu_open_device(struct vfio_device *vfio_dev)
 {
 	struct intel_vgpu *vgpu = vfio_dev_to_vgpu(vfio_dev);
-
-	if (!vgpu->vfio_device.kvm ||
-	    vgpu->vfio_device.kvm->mm != current->mm) {
-		gvt_vgpu_err("KVM is required to use Intel vGPU\n");
-		return -ESRCH;
-	}
+	int ret;
 
 	if (__kvmgt_vgpu_exist(vgpu))
 		return -EEXIST;
 
 	vgpu->track_node.track_write = kvmgt_page_track_write;
 	vgpu->track_node.track_remove_region = kvmgt_page_track_remove_region;
-	kvm_get_kvm(vgpu->vfio_device.kvm);
-	kvm_page_track_register_notifier(vgpu->vfio_device.kvm,
-					 &vgpu->track_node);
+	ret = kvm_page_track_register_notifier(vgpu->vfio_device.kvm,
+					       &vgpu->track_node);
+	if (ret) {
+		gvt_vgpu_err("KVM is required to use Intel vGPU\n");
+		return ret;
+	}
 
 	set_bit(INTEL_VGPU_STATUS_ATTACHED, vgpu->status);
 
@@ -717,7 +715,6 @@ static void intel_vgpu_close_device(struct vfio_device *vfio_dev)
 
 	kvm_page_track_unregister_notifier(vgpu->vfio_device.kvm,
 					   &vgpu->track_node);
-	kvm_put_kvm(vgpu->vfio_device.kvm);
 
 	kvmgt_protect_table_destroy(vgpu);
 	gvt_cache_destroy(vgpu);
-- 
2.40.0.rc1.284.g88254d51c5-goog


^ permalink raw reply related	[flat|nested] 79+ messages in thread

* [PATCH v2 27/27] drm/i915/gvt: Drop final dependencies on KVM internal details
  2023-03-11  0:22 [PATCH v2 00/27] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups Sean Christopherson
                   ` (25 preceding siblings ...)
  2023-03-11  0:22 ` [PATCH v2 26/27] KVM: x86/mmu: Handle KVM bookkeeping in page-track APIs, not callers Sean Christopherson
@ 2023-03-11  0:22 ` Sean Christopherson
  2023-03-17  8:58   ` Yan Zhao
  2023-03-13  9:58 ` [PATCH v2 00/27] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups Yan Zhao
  27 siblings, 1 reply; 79+ messages in thread
From: Sean Christopherson @ 2023-03-11  0:22 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

Open code gpa_to_gfn() in kvmgt_page_track_write() and drop KVMGT's
dependency on kvm_host.h, i.e. include only kvm_page_track.h.  KVMGT
assumes "gfn == gpa >> PAGE_SHIFT" all over the place, including a few
lines below in the same function with the same gpa, i.e. there's no
reason to use KVM's helper for this one case.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 drivers/gpu/drm/i915/gvt/gvt.h   | 3 ++-
 drivers/gpu/drm/i915/gvt/kvmgt.c | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gvt/gvt.h b/drivers/gpu/drm/i915/gvt/gvt.h
index 2d65800d8e93..53a0a42a50db 100644
--- a/drivers/gpu/drm/i915/gvt/gvt.h
+++ b/drivers/gpu/drm/i915/gvt/gvt.h
@@ -34,10 +34,11 @@
 #define _GVT_H_
 
 #include <uapi/linux/pci_regs.h>
-#include <linux/kvm_host.h>
 #include <linux/vfio.h>
 #include <linux/mdev.h>
 
+#include <asm/kvm_page_track.h>
+
 #include "i915_drv.h"
 #include "intel_gvt.h"
 
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index d16aced134b4..798d04481f03 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -1599,7 +1599,7 @@ static void kvmgt_page_track_write(gpa_t gpa, const u8 *val, int len,
 
 	mutex_lock(&info->vgpu_lock);
 
-	if (kvmgt_gfn_is_write_protected(info, gpa_to_gfn(gpa)))
+	if (kvmgt_gfn_is_write_protected(info, gpa >> PAGE_SHIFT))
 		intel_vgpu_page_track_handler(info, gpa,
 						     (void *)val, len);
 
-- 
2.40.0.rc1.284.g88254d51c5-goog


^ permalink raw reply related	[flat|nested] 79+ messages in thread
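
For reference, the open-coded shift matches what the dropped helper does;
gpa_to_gfn() in kvm_host.h is essentially (quoted from memory, so treat it
as an approximation):

	static inline gfn_t gpa_to_gfn(gpa_t gpa)
	{
		return (gfn_t)(gpa >> PAGE_SHIFT);
	}

so removing the call only drops the kvm_host.h dependency, with no change in
behavior.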

* Re: [PATCH v2 00/27] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups
  2023-03-11  0:22 [PATCH v2 00/27] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups Sean Christopherson
                   ` (26 preceding siblings ...)
  2023-03-11  0:22 ` [PATCH v2 27/27] drm/i915/gvt: Drop final dependencies on KVM internal details Sean Christopherson
@ 2023-03-13  9:58 ` Yan Zhao
  27 siblings, 0 replies; 79+ messages in thread
From: Yan Zhao @ 2023-03-13  9:58 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Zhenyu Wang, Zhi Wang, kvm, intel-gfx,
	linux-kernel, Ben Gardon, intel-gvt-dev

On Fri, Mar 10, 2023 at 04:22:31PM -0800, Sean Christopherson wrote:
> Fix a variety of found-by-inspection bugs in KVMGT, and overhaul KVM's
> page-track APIs to provide a leaner and cleaner interface.  The motivation
> for this series is to (significantly) reduce the number of KVM APIs that
> KVMGT uses, with a long-term goal of making all kvm_host.h headers
> KVM-internal.
> 
> As was the case in v1, the KVMGT changes are compile tested only.
> 
> Based on "git://git.kernel.org/pub/scm/virt/kvm/kvm.git next".
> 
> v2:
>  - Reuse vgpu_lock to protect gfn hash instead of introducing a new (and
>    buggy) mutext. [Yan]
>  - Remove a spurious return from kvm_page_track_init(). [Yan]
>  - Take @kvm directly in the inner __kvm_page_track_write(). [Yan]
>  - Delete the gfn sanity check that relies on kvm_is_visible_gfn() instead
>    of providing a dedicated interface. [Yan]
> 
> base-commit: 45dd9bc75d9adc9483f0c7d662ba6e73ed698a0b
> -- 
Thanks for the update!
It passed basic tests (gvt in a single vm) on my side.
Will do detailed review tomorrow.

Thanks
Yan

^ permalink raw reply	[flat|nested] 79+ messages in thread

* RE: [PATCH v2 02/27] KVM: x86/mmu: Factor out helper to get max mapping size of a memslot
  2023-03-11  0:22 ` [PATCH v2 02/27] KVM: x86/mmu: Factor out helper to get max mapping size of a memslot Sean Christopherson
@ 2023-03-13 15:37   ` Wang, Wei W
  0 siblings, 0 replies; 79+ messages in thread
From: Wang, Wei W @ 2023-03-13 15:37 UTC (permalink / raw)
  To: Christopherson,, Sean, Paolo Bonzini, Zhenyu Wang, Wang, Zhi A
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Zhao, Yan Y, Ben Gardon

On Saturday, March 11, 2023 8:23 AM, Sean Christopherson wrote:
> Extract the memslot-related logic of kvm_mmu_max_mapping_level() into a
> new helper so that KVMGT can determine whether or not mapping a 2MiB
> page into the guest is (dis)allowed per KVM's memslots.
> 
> No functional change intended.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/mmu/mmu.c          | 21 +++++++++++++++------
>  arch/x86/kvm/mmu/mmu_internal.h |  2 ++
>  2 files changed, 17 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index
> c8ebe542c565..4685c80e441b 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -3083,20 +3083,29 @@ static int host_pfn_mapping_level(struct kvm
> *kvm, gfn_t gfn,
>  	return level;
>  }
> 
> +int kvm_mmu_max_slot_mapping_level(const struct kvm_memory_slot *slot,
> +				   gfn_t gfn, int max_level)

It seems more consistent to name it "kvm_mmu_slot_max_mapping_level"
(we already have other kvm_mmu_slot_* functions defined)


> +{
> +	struct kvm_lpage_info *linfo;
> +
> +	for ( ; max_level > PG_LEVEL_4K; max_level--) {
> +		linfo = lpage_info_slot(gfn, slot, max_level);
> +		if (!linfo->disallow_lpage)
> +			break;
> +	}
> +	return max_level;
> +}
> +
>  int kvm_mmu_max_mapping_level(struct kvm *kvm,
>  			      const struct kvm_memory_slot *slot, gfn_t gfn,
>  			      int max_level)
>  {
> -	struct kvm_lpage_info *linfo;
>  	int host_level;
> 
>  	max_level = min(max_level, max_huge_page_level);

Would it be better to also move this min(,) into the helper?
E.g. if max_huge_page_level is already 4K, there's no need to check lpage_info in the helper.

^ permalink raw reply	[flat|nested] 79+ messages in thread

* RE: [PATCH v2 01/27] drm/i915/gvt: Verify pfn is "valid" before dereferencing "struct page"
  2023-03-11  0:22 ` [PATCH v2 01/27] drm/i915/gvt: Verify pfn is "valid" before dereferencing "struct page" Sean Christopherson
@ 2023-03-13 15:37   ` Wang, Wei W
  2023-03-15 18:13     ` [Intel-gfx] " Andrzej Hajda
  2023-03-17  4:20   ` Yan Zhao
  1 sibling, 1 reply; 79+ messages in thread
From: Wang, Wei W @ 2023-03-13 15:37 UTC (permalink / raw)
  To: Christopherson,, Sean, Paolo Bonzini, Zhenyu Wang, Wang, Zhi A
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Zhao, Yan Y, Ben Gardon

On Saturday, March 11, 2023 8:23 AM, Sean Christopherson wrote:
> diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c
> index 4ec85308379a..58b9b316ae46 100644
> --- a/drivers/gpu/drm/i915/gvt/gtt.c
> +++ b/drivers/gpu/drm/i915/gvt/gtt.c
> @@ -1183,6 +1183,10 @@ static int is_2MB_gtt_possible(struct intel_vgpu
> *vgpu,
>  	pfn = gfn_to_pfn(vgpu->vfio_device.kvm, ops->get_pfn(entry));
>  	if (is_error_noslot_pfn(pfn))
>  		return -EINVAL;
> +
> +	if (!pfn_valid(pfn))
> +		return -EINVAL;
> +

Merge the two errors in one "if" to have less LOC?
i.e.
if (is_error_noslot_pfn(pfn) || !pfn_valid(pfn))
    return -EINVAL;

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v2 04/27] drm/i915/gvt: Incorporate KVM memslot info into check for 2MiB GTT entry
  2023-03-11  0:22 ` [PATCH v2 04/27] drm/i915/gvt: Incorporate KVM memslot info into check for 2MiB GTT entry Sean Christopherson
@ 2023-03-14  3:09   ` Yan Zhao
  2023-03-14 17:13     ` Sean Christopherson
  0 siblings, 1 reply; 79+ messages in thread
From: Yan Zhao @ 2023-03-14  3:09 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Zhenyu Wang, Zhi Wang, kvm, intel-gvt-dev,
	intel-gfx, linux-kernel, Ben Gardon

On Fri, Mar 10, 2023 at 04:22:35PM -0800, Sean Christopherson wrote:
> Honor KVM's max allowed page size when determining whether or not a 2MiB
> GTT shadow page can be created for the guest.  Querying KVM's max allowed
> size is somewhat odd as there's no strict requirement that KVM's memslots
> and VFIO's mappings are configured with the same gfn=>hva mapping, but
> the check will be accurate if userspace wants to have a functional guest,
> and at the very least checking KVM's memslots guarantees that the entire
> 2MiB range has been exposed to the guest.
>
hi Sean,
I remember in our last discussion, the conclusion was that
we can safely just use VFIO ABI (which is intel_gvt_dma_map_guest_page()
introduced in patch 7) to check max mapping size. [1][2]

"Though checking kvm_page_track_max_mapping_level() is also fine, it makes DMA
mapping size unnecessarily smaller."
This is especially true when a guest page is write-tracked by KVM internally
(e.g. for nested VMs) but is not tracked by KVMGT as a PPGTT page table page.
For this reason, KVMGT can still map such pages as huge even though
kvm_page_track_max_mapping_level() returns 4K for them.


"I'm totally fine if KVMGT's ABI is that VFIO is the source of truth for mappings
and permissions, and that the only requirement for KVM memslots is that GTT page
tables need to be visible in KVM's memslots.  But if that's the ABI, then
intel_gvt_is_valid_gfn() should be probing VFIO, not KVM (commit cc753fbe1ac4
("drm/i915/gvt: validate gfn before set shadow page entry").

In other words, pick either VFIO or KVM.  Checking that X is valid according to
KVM and then mapping X through VFIO is confusing and makes assumptions about how
userspace configures KVM and VFIO.  It works because QEMU always configures KVM
and VFIO as expected, but IMO it's unnecessarily fragile and again confusing for
unaware readers because the code is technically flawed.
"

[1] https://lore.kernel.org/all/Y7Y+759IN2DH5h3h@yzhao56-desk.sh.intel.com/
[2] https://lore.kernel.org/all/Y7cLkLUMCy+XLRwm@google.com/

> Note, KVM may also restrict the mapping size for reasons that aren't
> relevant to KVMGT, e.g. for KVM's iTLB multi-hit workaround or if the gfn
> is write-tracked (KVM's write-tracking only handles writes from vCPUs).
> However, such scenarios are unlikely to occur with a well-behaved guest,
> and at worst will result in sub-optimal performance.

As confirmed in [3], there's no risk of iTLB multi-hit even for
not-well-behaved guests if the IOMMU page tables used for DMA mappings are
kept separate from the EPT/NPT tables (which is currently the default
condition).

[3] https://lore.kernel.org/all/Y7%2FFZpizEyIaL+Su@yzhao56-desk.sh.intel.com/

So, I'm fine with exporting this kvm_page_track_max_mapping_level()
interface, but I don't think KVMGT is a user of it. 
> 
> Fixes: b901b252b6cf ("drm/i915/gvt: Add 2M huge gtt support")
> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/include/asm/kvm_page_track.h |  2 ++
>  arch/x86/kvm/mmu/page_track.c         | 18 ++++++++++++++++++
>  drivers/gpu/drm/i915/gvt/gtt.c        | 10 +++++++++-
>  3 files changed, 29 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
> index eb186bc57f6a..3f72c7a172fc 100644
> --- a/arch/x86/include/asm/kvm_page_track.h
> +++ b/arch/x86/include/asm/kvm_page_track.h
> @@ -51,6 +51,8 @@ void kvm_page_track_cleanup(struct kvm *kvm);
>  
>  bool kvm_page_track_write_tracking_enabled(struct kvm *kvm);
>  int kvm_page_track_write_tracking_alloc(struct kvm_memory_slot *slot);
> +enum pg_level kvm_page_track_max_mapping_level(struct kvm *kvm, gfn_t gfn,
> +					       enum pg_level max_level);
>  
>  void kvm_page_track_free_memslot(struct kvm_memory_slot *slot);
>  int kvm_page_track_create_memslot(struct kvm *kvm,
> diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
> index 0a2ac438d647..e739dcc3375c 100644
> --- a/arch/x86/kvm/mmu/page_track.c
> +++ b/arch/x86/kvm/mmu/page_track.c
> @@ -301,3 +301,21 @@ void kvm_page_track_flush_slot(struct kvm *kvm, struct kvm_memory_slot *slot)
>  			n->track_flush_slot(kvm, slot, n);
>  	srcu_read_unlock(&head->track_srcu, idx);
>  }
> +
> +enum pg_level kvm_page_track_max_mapping_level(struct kvm *kvm, gfn_t gfn,
> +					       enum pg_level max_level)
> +{
> +	struct kvm_memory_slot *slot;
> +	int idx;
> +
> +	idx = srcu_read_lock(&kvm->srcu);
> +	slot = gfn_to_memslot(kvm, gfn);
> +	if (!slot || slot->flags & KVM_MEMSLOT_INVALID)
> +		max_level = PG_LEVEL_4K;
> +	else
> +		max_level = kvm_mmu_max_slot_mapping_level(slot, gfn, max_level);
> +	srcu_read_unlock(&kvm->srcu, idx);
> +
> +	return max_level;
> +}
> +EXPORT_SYMBOL_GPL(kvm_page_track_max_mapping_level);
> diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c
> index f30922c55a0c..d59c7ab9d224 100644
> --- a/drivers/gpu/drm/i915/gvt/gtt.c
> +++ b/drivers/gpu/drm/i915/gvt/gtt.c
> @@ -1157,14 +1157,22 @@ static int is_2MB_gtt_possible(struct intel_vgpu *vgpu,
>  	struct intel_gvt_gtt_entry *entry)
>  {
>  	const struct intel_gvt_gtt_pte_ops *ops = vgpu->gvt->gtt.pte_ops;
> +	unsigned long gfn = ops->get_pfn(entry);
>  	kvm_pfn_t pfn;
> +	int max_level;
>  
>  	if (!HAS_PAGE_SIZES(vgpu->gvt->gt->i915, I915_GTT_PAGE_SIZE_2M))
>  		return 0;
>  
>  	if (!test_bit(INTEL_VGPU_STATUS_ATTACHED, vgpu->status))
>  		return -EINVAL;
> -	pfn = gfn_to_pfn(vgpu->vfio_device.kvm, ops->get_pfn(entry));
> +
> +	max_level = kvm_page_track_max_mapping_level(vgpu->vfio_device.kvm,
> +						     gfn, PG_LEVEL_2M);
> +	if (max_level < PG_LEVEL_2M)
> +		return 0;
> +
> +	pfn = gfn_to_pfn(vgpu->vfio_device.kvm, gfn);
>  	if (is_error_noslot_pfn(pfn))
>  		return -EINVAL;
>  
> -- 
> 2.40.0.rc1.284.g88254d51c5-goog
> 

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v2 04/27] drm/i915/gvt: Incorporate KVM memslot info into check for 2MiB GTT entry
  2023-03-14  3:09   ` Yan Zhao
@ 2023-03-14 17:13     ` Sean Christopherson
  0 siblings, 0 replies; 79+ messages in thread
From: Sean Christopherson @ 2023-03-14 17:13 UTC (permalink / raw)
  To: Yan Zhao
  Cc: Paolo Bonzini, Zhenyu Wang, Zhi Wang, kvm, intel-gvt-dev,
	intel-gfx, linux-kernel, Ben Gardon

On Tue, Mar 14, 2023, Yan Zhao wrote:
> On Fri, Mar 10, 2023 at 04:22:35PM -0800, Sean Christopherson wrote:
> > Honor KVM's max allowed page size when determining whether or not a 2MiB
> > GTT shadow page can be created for the guest.  Querying KVM's max allowed
> > size is somewhat odd as there's no strict requirement that KVM's memslots
> > and VFIO's mappings are configured with the same gfn=>hva mapping, but
> > the check will be accurate if userspace wants to have a functional guest,
> > and at the very least checking KVM's memslots guarantees that the entire
> > 2MiB range has been exposed to the guest.
> >
> hi Sean,
> I remember in our last discussion, the conclusion was that
> we can safely just use VFIO ABI (which is intel_gvt_dma_map_guest_page()
> introduced in patch 7) to check max mapping size. [1][2]

Gah, my apologies.  I completely forgot about dropping KVM's mapping size check.
I was pretty sure I was forgetting something, but couldn't figure out what I was
forgetting.  I'll drop this in the next version.

Thanks!

^ permalink raw reply	[flat|nested] 79+ messages in thread
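
To make the suggestion above concrete, one way the 2MiB check could lean on
the VFIO mapping path instead of KVM's memslots is sketched below.  This is
a hypothetical illustration only: it reuses the intel_gvt_dma_map_guest_page()
helper named in the discussion, but the return-value convention and call-site
wiring are assumptions, not the code from patch 7 or from any later version
of this series.

	/*
	 * Hypothetical sketch: ask VFIO to pin and map the full 2MiB range;
	 * if that fails, fall back to 4KiB shadow entries.
	 */
	static int is_2MB_gtt_possible(struct intel_vgpu *vgpu,
				       struct intel_gvt_gtt_entry *entry,
				       dma_addr_t *dma_addr)
	{
		const struct intel_gvt_gtt_pte_ops *ops = vgpu->gvt->gtt.pte_ops;
		unsigned long gfn = ops->get_pfn(entry);

		if (!HAS_PAGE_SIZES(vgpu->gvt->gt->i915, I915_GTT_PAGE_SIZE_2M))
			return 0;

		/* Non-zero means VFIO couldn't provide a 2MiB mapping. */
		if (intel_gvt_dma_map_guest_page(vgpu, gfn, I915_GTT_PAGE_SIZE_2M,
						 dma_addr))
			return 0;

		return 1;
	}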

* Re: [PATCH v2 11/27] KVM: x86/mmu: Don't rely on page-track mechanism to flush on memslot change
  2023-03-11  0:22 ` [PATCH v2 11/27] KVM: x86/mmu: Don't rely on page-track mechanism to flush on memslot change Sean Christopherson
@ 2023-03-15  1:08   ` Yan Zhao
  2023-03-15 15:32     ` Sean Christopherson
  0 siblings, 1 reply; 79+ messages in thread
From: Yan Zhao @ 2023-03-15  1:08 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Zhenyu Wang, Zhi Wang, kvm, intel-gvt-dev,
	intel-gfx, linux-kernel, Ben Gardon

On Fri, Mar 10, 2023 at 04:22:42PM -0800, Sean Christopherson wrote:
...
> -static void kvm_mmu_invalidate_zap_pages_in_memslot(struct kvm *kvm,
> -			struct kvm_memory_slot *slot,
> -			struct kvm_page_track_notifier_node *node)
> -{
> -	kvm_mmu_zap_all_fast(kvm);
> -}
> -
>  int kvm_mmu_init_vm(struct kvm *kvm)
>  {
>  	struct kvm_page_track_notifier_node *node = &kvm->arch.mmu_sp_tracker;
> @@ -6110,7 +6103,6 @@ int kvm_mmu_init_vm(struct kvm *kvm)
>  	}
>  
>  	node->track_write = kvm_mmu_pte_write;
> -	node->track_flush_slot = kvm_mmu_invalidate_zap_pages_in_memslot;
>  	kvm_page_track_register_notifier(kvm, node);
>  
>  	kvm->arch.split_page_header_cache.kmem_cache = mmu_page_header_cache;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index f706621c35b8..29dd6c97d145 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -12662,6 +12662,8 @@ void kvm_arch_flush_shadow_all(struct kvm *kvm)
>  void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
>  				   struct kvm_memory_slot *slot)
>  {
> +	kvm_mmu_zap_all_fast(kvm);
Could we still call kvm_mmu_invalidate_zap_pages_in_memslot() here?
As I know, for TDX, its version of
kvm_mmu_invalidate_zap_pages_in_memslot() is like

static void kvm_mmu_invalidate_zap_pages_in_memslot(struct kvm *kvm,
                        struct kvm_memory_slot *slot,
                        struct kvm_page_track_notifier_node *node)
{
        if (kvm_gfn_shared_mask(kvm))
                kvm_mmu_zap_memslot(kvm, slot);
        else
                kvm_mmu_zap_all_fast(kvm);
}

Maybe this kind of judgment is better confined to mmu.c?

Thanks
Yan

> +
>  	kvm_page_track_flush_slot(kvm, slot);
>  }
>  

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v2 14/27] KVM: x86: Reject memslot MOVE operations if KVMGT is attached
  2023-03-11  0:22 ` [PATCH v2 14/27] KVM: x86: Reject memslot MOVE operations if KVMGT is attached Sean Christopherson
@ 2023-03-15  8:03   ` Yan Zhao
  2023-03-15 15:43     ` Sean Christopherson
  2023-03-17  7:29   ` Yan Zhao
  1 sibling, 1 reply; 79+ messages in thread
From: Yan Zhao @ 2023-03-15  8:03 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Zhenyu Wang, Zhi Wang, kvm, intel-gvt-dev,
	intel-gfx, linux-kernel, Ben Gardon

On Fri, Mar 10, 2023 at 04:22:45PM -0800, Sean Christopherson wrote:
> Disallow moving memslots if the VM has external page-track users, i.e. if
> KVMGT is being used to expose a virtual GPU to the guest, as KVM doesn't
> correctly handle moving memory regions.
> 
> Note, this is potential ABI breakage!  E.g. userspace could move regions
> that aren't shadowed by KVMGT without harming the guest.  However, the
> only known user of KVMGT is QEMU, and QEMU doesn't move generic memory
> regions.  KVM's own support for moving memory regions was also broken for
> multiple years (albeit for an edge case, but arguably moving RAM is
> itself an edge case), e.g. see commit edd4fa37baa6 ("KVM: x86: Allocate
> new rmap and large page tracking when moving memslot").
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
...
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 29dd6c97d145..47ac9291cd43 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -12484,6 +12484,13 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
>  				   struct kvm_memory_slot *new,
>  				   enum kvm_mr_change change)
>  {
> +	/*
> +	 * KVM doesn't support moving memslots when there are external page
> +	 * trackers attached to the VM, i.e. if KVMGT is in use.
> +	 */
> +	if (change == KVM_MR_MOVE && kvm_page_track_has_external_user(kvm))
> +		return -EINVAL;
Hmm, will page tracking work correctly on moving memslots when there are no
external users?

in case of KVM_MR_MOVE,
kvm_prepare_memory_region(kvm, old, new, change)
  |->kvm_arch_prepare_memory_region(kvm, old, new, change)
       |->kvm_alloc_memslot_metadata(kvm, new)
            |->memset(&slot->arch, 0, sizeof(slot->arch));
            |->kvm_page_track_create_memslot(kvm, slot, npages)
The new->arch.gfn_write_track will start out empty.


kvm_arch_commit_memory_region(kvm, old, new, change);
  |->kvm_arch_free_memslot(kvm, old);
       |->kvm_page_track_free_memslot(slot);
The old->arch.gfn_write_track is freed afterwards.

So, in theory, the new GFNs are not write tracked though the old ones are.

Is that acceptable for the internal page-track user?

>  	if (change == KVM_MR_CREATE || change == KVM_MR_MOVE) {
>  		if ((new->base_gfn + new->npages - 1) > kvm_mmu_max_gfn())
>  			return -EINVAL;
> -- 
> 2.40.0.rc1.284.g88254d51c5-goog
> 

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v2 19/27] KVM: x86/mmu: Move KVM-only page-track declarations to internal header
  2023-03-11  0:22 ` [PATCH v2 19/27] KVM: x86/mmu: Move KVM-only page-track declarations to internal header Sean Christopherson
@ 2023-03-15  8:44   ` Yan Zhao
  2023-03-15 15:13     ` Sean Christopherson
  0 siblings, 1 reply; 79+ messages in thread
From: Yan Zhao @ 2023-03-15  8:44 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Zhenyu Wang, Zhi Wang, kvm, intel-gvt-dev,
	intel-gfx, linux-kernel, Ben Gardon

On Fri, Mar 10, 2023 at 04:22:50PM -0800, Sean Christopherson wrote:
> Bury the declaration of the page-track helpers that are intended only for
> internal KVM use in a "private" header.  In addition to guarding against
> unwanted usage of the internal-only helpers, dropping their definitions
> avoids exposing other structures that should be KVM-internal, e.g. for
> memslots.  This is a baby step toward making kvm_host.h a KVM-internal
> header in the very distant future.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/include/asm/kvm_page_track.h | 26 ++++-----------------
>  arch/x86/kvm/mmu/mmu.c                |  3 ++-
>  arch/x86/kvm/mmu/page_track.c         |  8 +------
>  arch/x86/kvm/mmu/page_track.h         | 33 +++++++++++++++++++++++++++
>  arch/x86/kvm/x86.c                    |  1 +
>  5 files changed, 42 insertions(+), 29 deletions(-)
>  create mode 100644 arch/x86/kvm/mmu/page_track.h
> 
> diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
> index e5eb98ca4fce..deece45936a5 100644
> --- a/arch/x86/include/asm/kvm_page_track.h
> +++ b/arch/x86/include/asm/kvm_page_track.h

A curious question:
are arch/x86/include/asm/kvm_*.h all expected to be externally accessible?

Thanks
Yan


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v2 20/27] KVM: x86/mmu: Use page-track notifiers iff there are external users
  2023-03-11  0:22 ` [PATCH v2 20/27] KVM: x86/mmu: Use page-track notifiers iff there are external users Sean Christopherson
@ 2023-03-15  9:34   ` Yan Zhao
  2023-03-15 16:21     ` Sean Christopherson
  2023-03-15 10:36   ` Yan Zhao
  1 sibling, 1 reply; 79+ messages in thread
From: Yan Zhao @ 2023-03-15  9:34 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Zhenyu Wang, Zhi Wang, kvm, intel-gvt-dev,
	intel-gfx, linux-kernel, Ben Gardon

Nit: there is a typo in the commit header: "iff" -> "if"

> -void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
> -			  int bytes)
> +void __kvm_page_track_write(struct kvm *kvm, gpa_t gpa, const u8 *new, int bytes)
Line length is 81 characters. A little longer than 80 :)

> +static inline bool kvm_page_track_has_external_user(struct kvm *kvm) { return false; }
This line is also too long.


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v2 20/27] KVM: x86/mmu: Use page-track notifiers iff there are external users
  2023-03-11  0:22 ` [PATCH v2 20/27] KVM: x86/mmu: Use page-track notifiers iff there are external users Sean Christopherson
  2023-03-15  9:34   ` Yan Zhao
@ 2023-03-15 10:36   ` Yan Zhao
  2023-03-15 16:54     ` Sean Christopherson
  1 sibling, 1 reply; 79+ messages in thread
From: Yan Zhao @ 2023-03-15 10:36 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Zhenyu Wang, Zhi Wang, kvm, intel-gvt-dev,
	intel-gfx, linux-kernel, Ben Gardon

On Fri, Mar 10, 2023 at 04:22:51PM -0800, Sean Christopherson wrote:
> Disable the page-track notifier code at compile time if there are no
> external users, i.e. if CONFIG_KVM_EXTERNAL_WRITE_TRACKING=n.  KVM itself
> now hooks emulated writes directly instead of relying on the page-track
> mechanism.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/include/asm/kvm_host.h       |  2 ++
>  arch/x86/include/asm/kvm_page_track.h |  2 ++
>  arch/x86/kvm/mmu/page_track.c         |  9 ++++-----
>  arch/x86/kvm/mmu/page_track.h         | 29 +++++++++++++++++++++++----
>  4 files changed, 33 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 1a4225237564..a3423711e403 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1265,7 +1265,9 @@ struct kvm_arch {
>  	 * create an NX huge page (without hanging the guest).
>  	 */
>  	struct list_head possible_nx_huge_pages;
> +#ifdef CONFIG_KVM_EXTERNAL_WRITE_TRACKING
>  	struct kvm_page_track_notifier_head track_notifier_head;
> +#endif
>  	/*
>  	 * Protects marking pages unsync during page faults, as TDP MMU page
>  	 * faults only take mmu_lock for read.  For simplicity, the unsync
> diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
> index deece45936a5..53c2adb25a07 100644
> --- a/arch/x86/include/asm/kvm_page_track.h
> +++ b/arch/x86/include/asm/kvm_page_track.h
The "#ifdef CONFIG_KVM_EXTERNAL_WRITE_TRACKING" can be moved to the
front of this file?
All the structures are only exposed for external users now.

> @@ -55,6 +55,7 @@ void kvm_slot_page_track_remove_page(struct kvm *kvm,
>  				     struct kvm_memory_slot *slot, gfn_t gfn,
>  				     enum kvm_page_track_mode mode);
>  
> +#ifdef CONFIG_KVM_EXTERNAL_WRITE_TRACKING
>  enum pg_level kvm_page_track_max_mapping_level(struct kvm *kvm, gfn_t gfn,
>  					       enum pg_level max_level);
>  
> @@ -64,5 +65,6 @@ kvm_page_track_register_notifier(struct kvm *kvm,
>  void
>  kvm_page_track_unregister_notifier(struct kvm *kvm,
>  				   struct kvm_page_track_notifier_node *n);
> +#endif /* CONFIG_KVM_EXTERNAL_WRITE_TRACKING */
>  
>  #endif

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v2 19/27] KVM: x86/mmu: Move KVM-only page-track declarations to internal header
  2023-03-15  8:44   ` Yan Zhao
@ 2023-03-15 15:13     ` Sean Christopherson
  2023-03-16  9:19       ` Yan Zhao
  0 siblings, 1 reply; 79+ messages in thread
From: Sean Christopherson @ 2023-03-15 15:13 UTC (permalink / raw)
  To: Yan Zhao
  Cc: Paolo Bonzini, Zhenyu Wang, Zhi Wang, kvm, intel-gvt-dev,
	intel-gfx, linux-kernel, Ben Gardon

On Wed, Mar 15, 2023, Yan Zhao wrote:
> On Fri, Mar 10, 2023 at 04:22:50PM -0800, Sean Christopherson wrote:
> > Bury the declaration of the page-track helpers that are intended only for
> > internal KVM use in a "private" header.  In addition to guarding against
> > unwanted usage of the internal-only helpers, dropping their definitions
> > avoids exposing other structures that should be KVM-internal, e.g. for
> > memslots.  This is a baby step toward making kvm_host.h a KVM-internal
> > header in the very distant future.
> > 
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> >  arch/x86/include/asm/kvm_page_track.h | 26 ++++-----------------
> >  arch/x86/kvm/mmu/mmu.c                |  3 ++-
> >  arch/x86/kvm/mmu/page_track.c         |  8 +------
> >  arch/x86/kvm/mmu/page_track.h         | 33 +++++++++++++++++++++++++++
> >  arch/x86/kvm/x86.c                    |  1 +
> >  5 files changed, 42 insertions(+), 29 deletions(-)
> >  create mode 100644 arch/x86/kvm/mmu/page_track.h
> > 
> > diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
> > index e5eb98ca4fce..deece45936a5 100644
> > --- a/arch/x86/include/asm/kvm_page_track.h
> > +++ b/arch/x86/include/asm/kvm_page_track.h
> 
> A curious question:
> are arch/x86/include/asm/kvm_*.h all expected to be externally accessible?

Depends on what you mean by "expected".  Currently, yes, everything in there is
globally visible.  But the vast majority of structs, defines, functions, etc. aren't
intended for external non-KVM consumption, things ended up being globally visible
largely through carelessness and/or a lack of a forcing function.

E.g. there is absolutely no reason anything outside of KVM should need
arch/x86/include/asm/kvm-x86-ops.h, but it landed in asm/ because, at the time it
was added, nothing would be harmed by making kvm-x86-ops.h "public" and we didn't
scrutinize the patches well enough.

My primary motivation for this series is to (eventually) get to a state where only
select symbols/defines/etc. are exposed by KVM to the outside world, and everything
else is internal only.  The end goal of tightly restricting KVM's global API is to
allow concurrently loading multiple instances of kvm.ko so that userspace can
upgrade/rollback KVM without needing to move VMs off the host, i.e. by performing
intrahost migration between different instances of KVM on the same host.  To do
that safely, anything that is visible outside of KVM needs to be compatible across
different instances of KVM, e.g. if kvm_vcpu is "public" then a KVM upgrade/rollback
wouldn't be able to touch "struct kvm_vcpu" in any way.  We'll definitely want to be
able to modify things like the vCPU structures, thus the push to restrict the API.

But even if we never realize that end goal, IMO drastically reducing KVM's "public"
API surface is worthy goal in and of itself.

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v2 11/27] KVM: x86/mmu: Don't rely on page-track mechanism to flush on memslot change
  2023-03-15  1:08   ` Yan Zhao
@ 2023-03-15 15:32     ` Sean Christopherson
  0 siblings, 0 replies; 79+ messages in thread
From: Sean Christopherson @ 2023-03-15 15:32 UTC (permalink / raw)
  To: Yan Zhao
  Cc: Paolo Bonzini, Zhenyu Wang, Zhi Wang, kvm, intel-gvt-dev,
	intel-gfx, linux-kernel, Ben Gardon

On Wed, Mar 15, 2023, Yan Zhao wrote:
> On Fri, Mar 10, 2023 at 04:22:42PM -0800, Sean Christopherson wrote:
> ...
> > -static void kvm_mmu_invalidate_zap_pages_in_memslot(struct kvm *kvm,
> > -			struct kvm_memory_slot *slot,
> > -			struct kvm_page_track_notifier_node *node)
> > -{
> > -	kvm_mmu_zap_all_fast(kvm);
> > -}
> > -
> >  int kvm_mmu_init_vm(struct kvm *kvm)
> >  {
> >  	struct kvm_page_track_notifier_node *node = &kvm->arch.mmu_sp_tracker;
> > @@ -6110,7 +6103,6 @@ int kvm_mmu_init_vm(struct kvm *kvm)
> >  	}
> >  
> >  	node->track_write = kvm_mmu_pte_write;
> > -	node->track_flush_slot = kvm_mmu_invalidate_zap_pages_in_memslot;
> >  	kvm_page_track_register_notifier(kvm, node);
> >  
> >  	kvm->arch.split_page_header_cache.kmem_cache = mmu_page_header_cache;
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index f706621c35b8..29dd6c97d145 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -12662,6 +12662,8 @@ void kvm_arch_flush_shadow_all(struct kvm *kvm)
> >  void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
> >  				   struct kvm_memory_slot *slot)
> >  {
> > +	kvm_mmu_zap_all_fast(kvm);
> Could we still call kvm_mmu_invalidate_zap_pages_in_memslot() here?
> As I know, for TDX, its version of
> kvm_mmu_invalidate_zap_pages_in_memslot() is like
> 
> static void kvm_mmu_invalidate_zap_pages_in_memslot(struct kvm *kvm,
>                         struct kvm_memory_slot *slot,
>                         struct kvm_page_track_notifier_node *node)
> {
>         if (kvm_gfn_shared_mask(kvm))
>                 kvm_mmu_zap_memslot(kvm, slot);
>         else
>                 kvm_mmu_zap_all_fast(kvm);
> }
> 
> Maybe this kind of judgment is better confined to mmu.c?

Hmm, yeah, I agree.  The only reason I exposed kvm_mmu_zap_all_fast() is because
kvm_mmu_zap_all() is already exposed for kvm_arch_flush_shadow_all() and it felt
weird/wrong to split those.  But that's the only usage of kvm_mmu_zap_all(), so
a better approach to maintain consistency would be to move
kvm_arch_flush_shadow_{all,memslot}() into mmu.c.  I'll do that in the next version.

^ permalink raw reply	[flat|nested] 79+ messages in thread
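
For context, a sketch of the shape Sean describes, i.e. moving the two arch
flush hooks into mmu.c so the zap-policy decision stays MMU-internal (this is
an assumed sketch of the follow-up, not the actual next version of the patch):

	/* arch/x86/kvm/mmu/mmu.c */
	void kvm_arch_flush_shadow_all(struct kvm *kvm)
	{
		kvm_mmu_zap_all(kvm);
	}

	void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
					   struct kvm_memory_slot *slot)
	{
		kvm_mmu_zap_all_fast(kvm);

		kvm_page_track_flush_slot(kvm, slot);
	}

With both hooks living in mmu.c, a TDX-style "zap only this memslot" special
case could sit next to the rest of the zapping logic instead of leaking into
x86.c.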

* Re: [PATCH v2 14/27] KVM: x86: Reject memslot MOVE operations if KVMGT is attached
  2023-03-15  8:03   ` Yan Zhao
@ 2023-03-15 15:43     ` Sean Christopherson
  2023-03-16  9:27       ` Yan Zhao
  0 siblings, 1 reply; 79+ messages in thread
From: Sean Christopherson @ 2023-03-15 15:43 UTC (permalink / raw)
  To: Yan Zhao
  Cc: Paolo Bonzini, Zhenyu Wang, Zhi Wang, kvm, intel-gvt-dev,
	intel-gfx, linux-kernel, Ben Gardon

On Wed, Mar 15, 2023, Yan Zhao wrote:
> On Fri, Mar 10, 2023 at 04:22:45PM -0800, Sean Christopherson wrote:
> > Disallow moving memslots if the VM has external page-track users, i.e. if
> > KVMGT is being used to expose a virtual GPU to the guest, as KVM doesn't
> > correctly handle moving memory regions.
> > 
> > Note, this is potential ABI breakage!  E.g. userspace could move regions
> > that aren't shadowed by KVMGT without harming the guest.  However, the
> > only known user of KVMGT is QEMU, and QEMU doesn't move generic memory
> > regions.  KVM's own support for moving memory regions was also broken for
> > multiple years (albeit for an edge case, but arguably moving RAM is
> > itself an edge case), e.g. see commit edd4fa37baa6 ("KVM: x86: Allocate
> > new rmap and large page tracking when moving memslot").
> > 
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> ...
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 29dd6c97d145..47ac9291cd43 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -12484,6 +12484,13 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
> >  				   struct kvm_memory_slot *new,
> >  				   enum kvm_mr_change change)
> >  {
> > +	/*
> > +	 * KVM doesn't support moving memslots when there are external page
> > +	 * trackers attached to the VM, i.e. if KVMGT is in use.
> > +	 */
> > +	if (change == KVM_MR_MOVE && kvm_page_track_has_external_user(kvm))
> > +		return -EINVAL;
> Hmm, will page tracking work correctly on moving memslots when there are no
> external users?
> 
> in case of KVM_MR_MOVE,
> kvm_prepare_memory_region(kvm, old, new, change)
>   |->kvm_arch_prepare_memory_region(kvm, old, new, change)
>        |->kvm_alloc_memslot_metadata(kvm, new)
>             |->memset(&slot->arch, 0, sizeof(slot->arch));
>             |->kvm_page_track_create_memslot(kvm, slot, npages)
> The new->arch.gfn_write_track will start out empty.
> 
> 
> kvm_arch_commit_memory_region(kvm, old, new, change);
>   |->kvm_arch_free_memslot(kvm, old);
>        |->kvm_page_track_free_memslot(slot);
> The old->arch.gfn_write_track is freed afterwards.
> 
> So, in theory, the new GFNs are not write tracked though the old ones are.
> 
> Is that acceptable for the internal page-track user?

It works because KVM zaps all SPTEs when a memslot is moved, i.e. the fact that
KVM loses the write-tracking counts is benign.  I suspect no VMM actually does
KVM_MR_MOVE in conjunction with shadow paging, but the ongoing maintenance
cost of supporting KVM_MR_MOVE is quite low at this point, so trying to rip it
out isn't worth the pain of having to deal with potential ABI breakage.

Though in hindsight I wish I had tried disallowing moving memslots instead of
fixing the various bugs a few years back. :-(

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v2 20/27] KVM: x86/mmu: Use page-track notifiers iff there are external users
  2023-03-15  9:34   ` Yan Zhao
@ 2023-03-15 16:21     ` Sean Christopherson
  2023-03-16  9:29       ` Yan Zhao
  0 siblings, 1 reply; 79+ messages in thread
From: Sean Christopherson @ 2023-03-15 16:21 UTC (permalink / raw)
  To: Yan Zhao
  Cc: Paolo Bonzini, Zhenyu Wang, Zhi Wang, kvm, intel-gvt-dev,
	intel-gfx, linux-kernel, Ben Gardon

On Wed, Mar 15, 2023, Yan Zhao wrote:
> Nit: there is a typo in the commit header: "iff" -> "if"
> 
> > -void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
> > -			  int bytes)
> > +void __kvm_page_track_write(struct kvm *kvm, gpa_t gpa, const u8 *new, int bytes)
> Line length is 81 characters. A little longer than 80 :)
> 
> > +static inline bool kvm_page_track_has_external_user(struct kvm *kvm) { return false; }
> This line is also too long.

The 80 character limit is a "soft" limit these days, e.g. checkpatch only complains
if a line is 100+.  In KVM x86, the preferred style is to treat the 80 char limit
as "firm", for lack of a better word.  E.g. let a line run over if it's just a
char or two and there's no other wrapping in the declaration, but don't create long
lines just because checkpatch no longer yells.

There's obviously a fair bit of subjectivity, but the guideline has worked well
so far (hopefully I didn't just jinx us).

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v2 20/27] KVM: x86/mmu: Use page-track notifiers iff there are external users
  2023-03-15 10:36   ` Yan Zhao
@ 2023-03-15 16:54     ` Sean Christopherson
  2023-05-04 19:54       ` Sean Christopherson
  0 siblings, 1 reply; 79+ messages in thread
From: Sean Christopherson @ 2023-03-15 16:54 UTC (permalink / raw)
  To: Yan Zhao
  Cc: Paolo Bonzini, Zhenyu Wang, Zhi Wang, kvm, intel-gvt-dev,
	intel-gfx, linux-kernel, Ben Gardon

On Wed, Mar 15, 2023, Yan Zhao wrote:
> On Fri, Mar 10, 2023 at 04:22:51PM -0800, Sean Christopherson wrote:
> > Disable the page-track notifier code at compile time if there are no
> > external users, i.e. if CONFIG_KVM_EXTERNAL_WRITE_TRACKING=n.  KVM itself
> > now hooks emulated writes directly instead of relying on the page-track
> > mechanism.
> > 
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> >  arch/x86/include/asm/kvm_host.h       |  2 ++
> >  arch/x86/include/asm/kvm_page_track.h |  2 ++
> >  arch/x86/kvm/mmu/page_track.c         |  9 ++++-----
> >  arch/x86/kvm/mmu/page_track.h         | 29 +++++++++++++++++++++++----
> >  4 files changed, 33 insertions(+), 9 deletions(-)
> > 
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 1a4225237564..a3423711e403 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -1265,7 +1265,9 @@ struct kvm_arch {
> >  	 * create an NX huge page (without hanging the guest).
> >  	 */
> >  	struct list_head possible_nx_huge_pages;
> > +#ifdef CONFIG_KVM_EXTERNAL_WRITE_TRACKING
> >  	struct kvm_page_track_notifier_head track_notifier_head;
> > +#endif
> >  	/*
> >  	 * Protects marking pages unsync during page faults, as TDP MMU page
> >  	 * faults only take mmu_lock for read.  For simplicity, the unsync
> > diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
> > index deece45936a5..53c2adb25a07 100644
> > --- a/arch/x86/include/asm/kvm_page_track.h
> > +++ b/arch/x86/include/asm/kvm_page_track.h
> The "#ifdef CONFIG_KVM_EXTERNAL_WRITE_TRACKING" can be moved to the
> front of this file?
> All the structures are only exposed for external users now.

Huh.  I've no idea why I didn't do that.  IIRC, the entire reason past me wrapped
track_notifier_head in an #ifdef was to allow this change in kvm_page_track.h.

I'll do this in the next version unless I discover an edge case I'm overlooking.

Thanks yet again!

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [Intel-gfx] [PATCH v2 01/27] drm/i915/gvt: Verify pfn is "valid" before dereferencing "struct page"
  2023-03-13 15:37   ` Wang, Wei W
@ 2023-03-15 18:13     ` Andrzej Hajda
  2023-03-15 19:23       ` Sean Christopherson
  0 siblings, 1 reply; 79+ messages in thread
From: Andrzej Hajda @ 2023-03-15 18:13 UTC (permalink / raw)
  To: Wang, Wei W, Christopherson,,
	Sean, Paolo Bonzini, Zhenyu Wang, Wang, Zhi A
  Cc: Zhao, Yan Y, kvm, intel-gfx, linux-kernel, Ben Gardon, intel-gvt-dev

On 13.03.2023 16:37, Wang, Wei W wrote:
> On Saturday, March 11, 2023 8:23 AM, Sean Christopherson wrote:
>> diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c
>> index 4ec85308379a..58b9b316ae46 100644
>> --- a/drivers/gpu/drm/i915/gvt/gtt.c
>> +++ b/drivers/gpu/drm/i915/gvt/gtt.c
>> @@ -1183,6 +1183,10 @@ static int is_2MB_gtt_possible(struct intel_vgpu
>> *vgpu,
>>   	pfn = gfn_to_pfn(vgpu->vfio_device.kvm, ops->get_pfn(entry));
>>   	if (is_error_noslot_pfn(pfn))
>>   		return -EINVAL;
>> +
>> +	if (!pfn_valid(pfn))
>> +		return -EINVAL;
>> +
> 
> Merge the two errors in one "if" to have less LOC?
> i.e.
> if (is_error_noslot_pfn(pfn) || !pfn_valid(pfn))
>      return -EINVAL;

you can just replace "if (is_error_noslot_pfn(pfn))" with "if 
(!pfn_valid(pfn))"; it covers both cases.

Regards
Andrzej

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [Intel-gfx] [PATCH v2 01/27] drm/i915/gvt: Verify pfn is "valid" before dereferencing "struct page"
  2023-03-15 18:13     ` [Intel-gfx] " Andrzej Hajda
@ 2023-03-15 19:23       ` Sean Christopherson
  0 siblings, 0 replies; 79+ messages in thread
From: Sean Christopherson @ 2023-03-15 19:23 UTC (permalink / raw)
  To: Andrzej Hajda
  Cc: Wei Wang, Paolo Bonzini, Zhenyu Wang, Zhi Wang, Yan Zhao, kvm,
	intel-gfx, linux-kernel, Ben Gardon, intel-gvt-dev

On Wed, Mar 15, 2023, Andrzej Hajda wrote:
> On 13.03.2023 16:37, Wang, Wei W wrote:
> > On Saturday, March 11, 2023 8:23 AM, Sean Christopherson wrote:
> > > diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c
> > > index 4ec85308379a..58b9b316ae46 100644
> > > --- a/drivers/gpu/drm/i915/gvt/gtt.c
> > > +++ b/drivers/gpu/drm/i915/gvt/gtt.c
> > > @@ -1183,6 +1183,10 @@ static int is_2MB_gtt_possible(struct intel_vgpu
> > > *vgpu,
> > >   	pfn = gfn_to_pfn(vgpu->vfio_device.kvm, ops->get_pfn(entry));
> > >   	if (is_error_noslot_pfn(pfn))
> > >   		return -EINVAL;
> > > +
> > > +	if (!pfn_valid(pfn))
> > > +		return -EINVAL;
> > > +
> > 
> > Merge the two errors in one "if" to have less LOC?
> > i.e.
> > if (is_error_noslot_pfn(pfn) || !pfn_valid(pfn))
> >      return -EINVAL;
> 
> you can just replace "if (is_error_noslot_pfn(pfn))" with "if
> (!pfn_valid(pfn))", it covers both cases.

Technically, yes, but the two checks are for very different things.  Practically
speaking, there can never be false negatives without KVM breaking horribly as
overlap between struct page pfns and KVM's error/noslot pfns would prevent mapping
legal memory into a KVM guest.  But I'd rather not hide the "did KVM find a valid
mapping" in the "is this pfn backed by struct page" check, especially since this
code goes away entirely by the end of the series.
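To make the distinction concrete, the interim code (sketched below with comments
added, not the final form) keeps the two checks side by side, each guarding a
different failure mode:

	pfn = gfn_to_pfn(vgpu->vfio_device.kvm, ops->get_pfn(entry));
	if (is_error_noslot_pfn(pfn))	/* did KVM find a valid mapping? */
		return -EINVAL;

	if (!pfn_valid(pfn))		/* is the pfn backed by "struct page"? */
		return -EINVAL;

	return PageTransHuge(pfn_to_page(pfn));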

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v2 19/27] KVM: x86/mmu: Move KVM-only page-track declarations to internal header
  2023-03-15 15:13     ` Sean Christopherson
@ 2023-03-16  9:19       ` Yan Zhao
  0 siblings, 0 replies; 79+ messages in thread
From: Yan Zhao @ 2023-03-16  9:19 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Zhenyu Wang, Zhi Wang, kvm, intel-gvt-dev,
	intel-gfx, linux-kernel, Ben Gardon

On Wed, Mar 15, 2023 at 08:13:37AM -0700, Sean Christopherson wrote:
> > A curious question:
> > are arch/x86/include/asm/kvm_*.h all expected to be external accessible?
> 
> Depends on what you mean by "expected".  Currently, yes, everything in there is
> globally visible.  But the vast majority of structs, defines, functions, etc. aren't
> intended for external non-KVM consumption, things ended up being globally visible
> largely through carelessness and/or a lack of a forcing function.
> 
> E.g. there is absolutely no reason anything outside of KVM should need
> arch/x86/include/asm/kvm-x86-ops.h, but it landed in asm/ because, at the time it
> was added, nothing would be harmed by making kvm-x86-ops.h "public" and we didn't
> scrutinize the patches well enough.
> 
> My primary motivation for this series is to (eventually) get to a state where only
> select symbols/defines/etc. are exposed by KVM to the outside world, and everything
> else is internal only.  The end goal of tightly restricting KVM's global API is to
> allow concurrently loading multiple instances of kvm.ko so that userspace can
> upgrade/rollback KVM without needing to move VMs off the host, i.e. by performing
> intrahost migration between different instances of KVM on the same host.  To do
> that safely, anything that is visible outside of KVM needs to be compatible across
> different instances of KVM, e.g. if kvm_vcpu is "public" then a KVM upgrade/rollback
> wouldn't be able to touch "struct kvm_vcpu" in any way.  We'll definitely want to be
> able to modify things like the vCPU structures, thus the push to restrict the API.
> 
> But even if we never realize that end goal, IMO drastically reducing KVM's "public"
> API surface is a worthy goal in and of itself.
Got it. Thanks for the explanation!

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v2 14/27] KVM: x86: Reject memslot MOVE operations if KVMGT is attached
  2023-03-15 15:43     ` Sean Christopherson
@ 2023-03-16  9:27       ` Yan Zhao
  0 siblings, 0 replies; 79+ messages in thread
From: Yan Zhao @ 2023-03-16  9:27 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, intel-gfx, linux-kernel, Zhenyu Wang, Ben Gardon,
	Paolo Bonzini, intel-gvt-dev, Zhi Wang

On Wed, Mar 15, 2023 at 08:43:54AM -0700, Sean Christopherson wrote:
> > So, in theory, the new GFNs are not write tracked though the old ones are.
> > 
> > Is that acceptable for the internal page-track user?
> 
> It works because KVM zaps all SPTEs when a memslot is moved, i.e. the fact that
Oh, yes!
And KVM will not shadow SPTEs for an invalid memslot, so there's no
problem.
Thanks~

> KVM loses the write-tracking counts is benign.  I suspect no VMM actually
> does KVM_MR_MOVE in conjunction with shadow paging, but the ongoing maintenance
> cost of supporting KVM_MR_MOVE is quite low at this point, so trying to rip it
> out isn't worth the pain of having to deal with potential ABI breakage.
> 
> Though in hindsight I wish I had tried disallowing moving memslots instead of
> fixing the various bugs a few years back. :-(

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v2 20/27] KVM: x86/mmu: Use page-track notifiers iff there are external users
  2023-03-15 16:21     ` Sean Christopherson
@ 2023-03-16  9:29       ` Yan Zhao
  0 siblings, 0 replies; 79+ messages in thread
From: Yan Zhao @ 2023-03-16  9:29 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Zhenyu Wang, Zhi Wang, kvm, intel-gvt-dev,
	intel-gfx, linux-kernel, Ben Gardon

On Wed, Mar 15, 2023 at 09:21:34AM -0700, Sean Christopherson wrote:
> On Wed, Mar 15, 2023, Yan Zhao wrote:
> > Nit: there is a typo in the commit header: "iff" -> "if"
> > 
> > > -void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
> > > -			  int bytes)
> > > +void __kvm_page_track_write(struct kvm *kvm, gpa_t gpa, const u8 *new, int bytes)
> > Line length is 81 characters. A little longer than 80 :)
> > 
> > > +static inline bool kvm_page_track_has_external_user(struct kvm *kvm) { return false; }
> > This line is also too long.
> 
> The 80 character limit is a "soft" limit these days, e.g. checkpatch only complains
> if a line is 100+.  In KVM x86, the preferred style is to treat the 80 char limit
> as "firm", for lack of a better word.  E.g. let a line run over if it's just a
> char or two and there's no other wrapping in the declaration, but don't create long
> lines just because checkpatch no longer yells.
> 
Got it. It's helpful to me!

> There's obviously a fair bit of subjectivity, but the guideline has worked well
> so far (hopefully I didn't just jinx us).



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v2 01/27] drm/i915/gvt: Verify pfn is "valid" before dereferencing "struct page"
  2023-03-11  0:22 ` [PATCH v2 01/27] drm/i915/gvt: Verify pfn is "valid" before dereferencing "struct page" Sean Christopherson
  2023-03-13 15:37   ` Wang, Wei W
@ 2023-03-17  4:20   ` Yan Zhao
  1 sibling, 0 replies; 79+ messages in thread
From: Yan Zhao @ 2023-03-17  4:20 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Zhenyu Wang, Zhi Wang, kvm, intel-gvt-dev,
	intel-gfx, linux-kernel, Ben Gardon

Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>

On Fri, Mar 10, 2023 at 04:22:32PM -0800, Sean Christopherson wrote:
> Check that the pfn found by gfn_to_pfn() is actually backed by "struct
> page" memory prior to retrieving and dereferencing the page.  KVM
> supports backing guest memory with VM_PFNMAP, VM_IO, etc., and so
> there is no guarantee the pfn returned by gfn_to_pfn() has an associated
> "struct page".
> 
> Fixes: b901b252b6cf ("drm/i915/gvt: Add 2M huge gtt support")
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  drivers/gpu/drm/i915/gvt/gtt.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c
> index 4ec85308379a..58b9b316ae46 100644
> --- a/drivers/gpu/drm/i915/gvt/gtt.c
> +++ b/drivers/gpu/drm/i915/gvt/gtt.c
> @@ -1183,6 +1183,10 @@ static int is_2MB_gtt_possible(struct intel_vgpu *vgpu,
>  	pfn = gfn_to_pfn(vgpu->vfio_device.kvm, ops->get_pfn(entry));
>  	if (is_error_noslot_pfn(pfn))
>  		return -EINVAL;
> +
> +	if (!pfn_valid(pfn))
> +		return -EINVAL;
> +
>  	return PageTransHuge(pfn_to_page(pfn));
>  }
>  
> -- 
> 2.40.0.rc1.284.g88254d51c5-goog
> 

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v2 03/27] drm/i915/gvt: remove interface intel_gvt_is_valid_gfn
  2023-03-11  0:22 ` [PATCH v2 03/27] drm/i915/gvt: remove interface intel_gvt_is_valid_gfn Sean Christopherson
@ 2023-03-17  4:26   ` Yan Zhao
  0 siblings, 0 replies; 79+ messages in thread
From: Yan Zhao @ 2023-03-17  4:26 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Zhenyu Wang, Zhi Wang, kvm, intel-gvt-dev,
	intel-gfx, linux-kernel, Ben Gardon

Tested-by: Yan Zhao <yan.y.zhao@intel.com>

On Fri, Mar 10, 2023 at 04:22:34PM -0800, Sean Christopherson wrote:
> From: Yan Zhao <yan.y.zhao@intel.com>
> 
> Currently intel_gvt_is_valid_gfn() is called in two places:
> (1) shadowing guest GGTT entry
> (2) shadowing guest PPGTT leaf entry,
> which was introduced in commit cc753fbe1ac4
> ("drm/i915/gvt: validate gfn before set shadow page entry").
> 
> However, now it's not necessary to call this interface any more, because
> a. GGTT partial write issue has been fixed by
>    commit bc0686ff5fad
>    ("drm/i915/gvt: support inconsecutive partial gtt entry write")
>    commit 510fe10b6180
>    ("drm/i915/gvt: fix a bug of partially write ggtt enties")
> b. PPGTT resides in normal guest RAM and we only treat 8-byte writes
>    as valid page table writes. Any invalid GPA found is regarded as
>    an error, either due to guest misbehavior/attack or bug in host
>    shadow code.
> >    So, rather than do GFN pre-checking and replace invalid GFNs with
>    scratch GFN and continue silently, just remove the pre-checking and
>    abort PPGTT shadowing on error detected.
> c. GFN validity check is still performed in
>    intel_gvt_dma_map_guest_page() --> gvt_pin_guest_page().
>    It's more desirable to call VFIO interface to do both validity check
>    and mapping.
>    Calling intel_gvt_is_valid_gfn() to do GFN validity check from KVM side
>    while later mapping the GFN through VFIO interface is unnecessarily
>    fragile and confusing for unaware readers.
> 
> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> [sean: remove now-unused local variables]
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  drivers/gpu/drm/i915/gvt/gtt.c | 36 +---------------------------------
>  1 file changed, 1 insertion(+), 35 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c
> index 58b9b316ae46..f30922c55a0c 100644
> --- a/drivers/gpu/drm/i915/gvt/gtt.c
> +++ b/drivers/gpu/drm/i915/gvt/gtt.c
> @@ -49,22 +49,6 @@
>  static bool enable_out_of_sync = false;
>  static int preallocated_oos_pages = 8192;
>  
> -static bool intel_gvt_is_valid_gfn(struct intel_vgpu *vgpu, unsigned long gfn)
> -{
> -	struct kvm *kvm = vgpu->vfio_device.kvm;
> -	int idx;
> -	bool ret;
> -
> -	if (!test_bit(INTEL_VGPU_STATUS_ATTACHED, vgpu->status))
> -		return false;
> -
> -	idx = srcu_read_lock(&kvm->srcu);
> -	ret = kvm_is_visible_gfn(kvm, gfn);
> -	srcu_read_unlock(&kvm->srcu, idx);
> -
> -	return ret;
> -}
> -
>  /*
>   * validate a gm address and related range size,
>   * translate it to host gm address
> @@ -1333,11 +1317,9 @@ static int ppgtt_populate_shadow_entry(struct intel_vgpu *vgpu,
>  static int ppgtt_populate_spt(struct intel_vgpu_ppgtt_spt *spt)
>  {
>  	struct intel_vgpu *vgpu = spt->vgpu;
> -	struct intel_gvt *gvt = vgpu->gvt;
> -	const struct intel_gvt_gtt_pte_ops *ops = gvt->gtt.pte_ops;
>  	struct intel_vgpu_ppgtt_spt *s;
>  	struct intel_gvt_gtt_entry se, ge;
> -	unsigned long gfn, i;
> +	unsigned long i;
>  	int ret;
>  
>  	trace_spt_change(spt->vgpu->id, "born", spt,
> @@ -1354,13 +1336,6 @@ static int ppgtt_populate_spt(struct intel_vgpu_ppgtt_spt *spt)
>  			ppgtt_generate_shadow_entry(&se, s, &ge);
>  			ppgtt_set_shadow_entry(spt, &se, i);
>  		} else {
> -			gfn = ops->get_pfn(&ge);
> -			if (!intel_gvt_is_valid_gfn(vgpu, gfn)) {
> -				ops->set_pfn(&se, gvt->gtt.scratch_mfn);
> -				ppgtt_set_shadow_entry(spt, &se, i);
> -				continue;
> -			}
> -
>  			ret = ppgtt_populate_shadow_entry(vgpu, spt, i, &ge);
>  			if (ret)
>  				goto fail;
> @@ -2335,14 +2310,6 @@ static int emulate_ggtt_mmio_write(struct intel_vgpu *vgpu, unsigned int off,
>  		m.val64 = e.val64;
>  		m.type = e.type;
>  
> -		/* one PTE update may be issued in multiple writes and the
> -		 * first write may not construct a valid gfn
> -		 */
> -		if (!intel_gvt_is_valid_gfn(vgpu, gfn)) {
> -			ops->set_pfn(&m, gvt->gtt.scratch_mfn);
> -			goto out;
> -		}
> -
>  		ret = intel_gvt_dma_map_guest_page(vgpu, gfn, PAGE_SIZE,
>  						   &dma_addr);
>  		if (ret) {
> @@ -2359,7 +2326,6 @@ static int emulate_ggtt_mmio_write(struct intel_vgpu *vgpu, unsigned int off,
>  		ops->clear_present(&m);
>  	}
>  
> -out:
>  	ggtt_set_guest_entry(ggtt_mm, &e, g_gtt_index);
>  
>  	ggtt_get_host_entry(ggtt_mm, &e, g_gtt_index);
> -- 
> 2.40.0.rc1.284.g88254d51c5-goog
> 

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v2 05/27] drm/i915/gvt: Verify VFIO-pinned page is THP when shadowing 2M gtt entry
  2023-03-11  0:22 ` [PATCH v2 05/27] drm/i915/gvt: Verify VFIO-pinned page is THP when shadowing 2M gtt entry Sean Christopherson
@ 2023-03-17  5:33   ` Yan Zhao
  2023-05-04 20:41     ` Sean Christopherson
  0 siblings, 1 reply; 79+ messages in thread
From: Yan Zhao @ 2023-03-17  5:33 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Zhenyu Wang, Zhi Wang, kvm, intel-gvt-dev,
	intel-gfx, linux-kernel, Ben Gardon

On Fri, Mar 10, 2023 at 04:22:36PM -0800, Sean Christopherson wrote:
> When shadowing a GTT entry with a 2M page, explicitly verify that the
> first page pinned by VFIO is a transparent hugepage instead of assuming
> that page observed by is_2MB_gtt_possible() is the same page pinned by
> vfio_pin_pages().  E.g. if userspace is doing something funky with the
> guest's memslots, or if the page is demoted between is_2MB_gtt_possible()
> and vfio_pin_pages().
> 
> This is more of a performance optimization than a bug fix as the check
> for contiguous struct pages should guard against incorrect mapping (even
> though assuming struct pages are virtually contiguous is wrong).
> 
> The real motivation for explicitly checking for a transparent hugepage
> after pinning is that it will reduce the risk of introducing a bug in a
> future fix for a page refcount leak (KVMGT doesn't put the reference
> acquired by gfn_to_pfn()), and eventually will allow KVMGT to stop using
> KVM's gfn_to_pfn() altogether.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  drivers/gpu/drm/i915/gvt/kvmgt.c | 18 ++++++++++++++++--
>  1 file changed, 16 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
> index 8ae7039b3683..90997cc385b4 100644
> --- a/drivers/gpu/drm/i915/gvt/kvmgt.c
> +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
> @@ -159,11 +159,25 @@ static int gvt_pin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn,
>  			goto err;
>  		}
>  
> -		if (npage == 0)
> -			base_page = cur_page;
> +		if (npage == 0) {
> +			/*
> +			 * Bail immediately to avoid unnecessary pinning when
> +			 * trying to shadow a 2M page and the host page isn't
> +			 * a transparent hugepage.
> +			 *
> +			 * TODO: support other type hugepages, e.g. HugeTLB.
> +			 */
> +			if (size == I915_GTT_PAGE_SIZE_2M &&
> +			    !PageTransHuge(cur_page))
Maybe the PageTransHuge(cur_page) check and the early bail-out are not necessary.
If a page is not transparent huge, but there are 512 contiguous 4K
pages, I think it's still fine to map them in the IOMMU as 2M.
See vfio_pin_map_dma(), which does something similar.
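A rough sketch of what I mean (illustrative only; it simply drops the THP
requirement and relies on the existing contiguity check):

		if (npage == 0) {
			/*
			 * Rely purely on the "pages are contiguous" check
			 * below; don't require a transparent hugepage.
			 */
			base_page = cur_page;
		} else if (base_page + npage != cur_page) {
			gvt_vgpu_err("The pages are not continuous\n");
			ret = -EINVAL;
			npage++;
			goto err;
		}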

> +				ret = -EIO;
> +			else
> +				base_page = cur_page;
> +		}
>  		else if (base_page + npage != cur_page) {
>  			gvt_vgpu_err("The pages are not continuous\n");
>  			ret = -EINVAL;
> +		}
> +		if (ret < 0) {
>  			npage++;
>  			goto err;
>  		}
> -- 
> 2.40.0.rc1.284.g88254d51c5-goog
> 

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v2 07/27] drm/i915/gvt: Don't rely on KVM's gfn_to_pfn() to query possible 2M GTT
  2023-03-11  0:22 ` [PATCH v2 07/27] drm/i915/gvt: Don't rely on KVM's gfn_to_pfn() to query possible 2M GTT Sean Christopherson
@ 2023-03-17  5:37   ` Yan Zhao
  0 siblings, 0 replies; 79+ messages in thread
From: Yan Zhao @ 2023-03-17  5:37 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Zhenyu Wang, Zhi Wang, kvm, intel-gvt-dev,
	intel-gfx, linux-kernel, Ben Gardon

On Fri, Mar 10, 2023 at 04:22:38PM -0800, Sean Christopherson wrote:
>  /*
> - * Check if can do 2M page
> + * Try to map a 2M gtt entry.
>   * @vgpu: target vgpu
>   * @entry: target pfn's gtt entry
>   *
> - * Return 1 if 2MB huge gtt shadowing is possible, 0 if miscondition,
> - * negative if found err.
> + * Return 1 if 2MB huge gtt shadow was creation, 0 if the entry needs to be
> + * split, negative if found err.
>   */
> -static int is_2MB_gtt_possible(struct intel_vgpu *vgpu,
> -	struct intel_gvt_gtt_entry *entry)
> +static int try_map_2MB_gtt_entry(struct intel_vgpu *vgpu,
> +	struct intel_gvt_gtt_entry *entry, dma_addr_t *dma_addr)
>  {
>  	const struct intel_gvt_gtt_pte_ops *ops = vgpu->gvt->gtt.pte_ops;
>  	unsigned long gfn = ops->get_pfn(entry);
> -	kvm_pfn_t pfn;
>  	int max_level;
> -	int ret;
>  
>  	if (!HAS_PAGE_SIZES(vgpu->gvt->gt->i915, I915_GTT_PAGE_SIZE_2M))
>  		return 0;
> @@ -1173,16 +1171,7 @@ static int is_2MB_gtt_possible(struct intel_vgpu *vgpu,
>  	if (max_level < PG_LEVEL_2M)
>  		return 0;
>  
> -	pfn = gfn_to_pfn(vgpu->vfio_device.kvm, gfn);
> -	if (is_error_noslot_pfn(pfn))
> -		return -EINVAL;
> -
> -	if (!pfn_valid(pfn))
> -		return -EINVAL;
> -
> -	ret = PageTransHuge(pfn_to_page(pfn));
> -	kvm_release_pfn_clean(pfn);
> -	return ret;
> +	return intel_gvt_dma_map_guest_page(vgpu, gfn, I915_GTT_PAGE_SIZE_2M, dma_addr);
intel_gvt_dma_map_guest_page() returns 0 on success, which is not
consistent with the expected return value of this function, i.e.
"
Return 1 if 2MB huge gtt shadow was creation, 0 if the entry needs to be
split, negative if found err.
"

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v2 06/27] drm/i915/gvt: Put the page reference obtained by KVM's gfn_to_pfn()
  2023-03-11  0:22 ` [PATCH v2 06/27] drm/i915/gvt: Put the page reference obtained by KVM's gfn_to_pfn() Sean Christopherson
@ 2023-03-17  6:18   ` Yan Zhao
  0 siblings, 0 replies; 79+ messages in thread
From: Yan Zhao @ 2023-03-17  6:18 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Zhenyu Wang, Zhi Wang, kvm, intel-gvt-dev,
	intel-gfx, linux-kernel, Ben Gardon

Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>

On Fri, Mar 10, 2023 at 04:22:37PM -0800, Sean Christopherson wrote:
> Put the struct page reference acquired by gfn_to_pfn(), KVM's API is that
> the caller is ultimately responsible for dropping any reference.
> 
> Note, kvm_release_pfn_clean() ensures the pfn is actually a refcounted
> struct page before trying to put any references.
> 
> Fixes: b901b252b6cf ("drm/i915/gvt: Add 2M huge gtt support")
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  drivers/gpu/drm/i915/gvt/gtt.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c
> index d59c7ab9d224..15848b041a0d 100644
> --- a/drivers/gpu/drm/i915/gvt/gtt.c
> +++ b/drivers/gpu/drm/i915/gvt/gtt.c
> @@ -1160,6 +1160,7 @@ static int is_2MB_gtt_possible(struct intel_vgpu *vgpu,
>  	unsigned long gfn = ops->get_pfn(entry);
>  	kvm_pfn_t pfn;
>  	int max_level;
> +	int ret;
>  
>  	if (!HAS_PAGE_SIZES(vgpu->gvt->gt->i915, I915_GTT_PAGE_SIZE_2M))
>  		return 0;
> @@ -1179,7 +1180,9 @@ static int is_2MB_gtt_possible(struct intel_vgpu *vgpu,
>  	if (!pfn_valid(pfn))
>  		return -EINVAL;
>  
> -	return PageTransHuge(pfn_to_page(pfn));
> +	ret = PageTransHuge(pfn_to_page(pfn));
> +	kvm_release_pfn_clean(pfn);
> +	return ret;
>  }
>  
>  static int split_2MB_gtt_entry(struct intel_vgpu *vgpu,
> -- 
> 2.40.0.rc1.284.g88254d51c5-goog
> 

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v2 08/27] drm/i915/gvt: Use an "unsigned long" to iterate over memslot gfns
  2023-03-11  0:22 ` [PATCH v2 08/27] drm/i915/gvt: Use an "unsigned long" to iterate over memslot gfns Sean Christopherson
@ 2023-03-17  6:19   ` Yan Zhao
  0 siblings, 0 replies; 79+ messages in thread
From: Yan Zhao @ 2023-03-17  6:19 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Zhenyu Wang, Zhi Wang, kvm, intel-gvt-dev,
	intel-gfx, linux-kernel, Ben Gardon

Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>

On Fri, Mar 10, 2023 at 04:22:39PM -0800, Sean Christopherson wrote:
> Use an "unsigned long" instead of an "int" when iterating over the gfns
> in a memslot.  The number of pages in the memslot is tracked as an
> "unsigned long", e.g. KVMGT could theoretically break if a KVM memslot
> larger than 16TiB were deleted (2^32 * 4KiB).
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  drivers/gpu/drm/i915/gvt/kvmgt.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
> index 90997cc385b4..68be66395598 100644
> --- a/drivers/gpu/drm/i915/gvt/kvmgt.c
> +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
> @@ -1634,7 +1634,7 @@ static void kvmgt_page_track_flush_slot(struct kvm *kvm,
>  		struct kvm_memory_slot *slot,
>  		struct kvm_page_track_notifier_node *node)
>  {
> -	int i;
> +	unsigned long i;
>  	gfn_t gfn;
>  	struct intel_vgpu *info =
>  		container_of(node, struct intel_vgpu, track_node);
> -- 
> 2.40.0.rc1.284.g88254d51c5-goog
> 

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v2 09/27] drm/i915/gvt: Drop unused helper intel_vgpu_reset_gtt()
  2023-03-11  0:22 ` [PATCH v2 09/27] drm/i915/gvt: Drop unused helper intel_vgpu_reset_gtt() Sean Christopherson
@ 2023-03-17  6:20   ` Yan Zhao
  0 siblings, 0 replies; 79+ messages in thread
From: Yan Zhao @ 2023-03-17  6:20 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Zhenyu Wang, Zhi Wang, kvm, intel-gvt-dev,
	intel-gfx, linux-kernel, Ben Gardon

Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>

On Fri, Mar 10, 2023 at 04:22:40PM -0800, Sean Christopherson wrote:
> Drop intel_vgpu_reset_gtt() as it no longer has any callers.  In addition
> to eliminating dead code, this eliminates the last possible scenario where
> __kvmgt_protect_table_find() can be reached without holding vgpu_lock.
> Requiring vgpu_lock to be held when calling __kvmgt_protect_table_find()
> will allow protecting the gfn hash with vgpu_lock without too much fuss.
> 
> No functional change intended.
> 
> Fixes: ba25d977571e ("drm/i915/gvt: Do not destroy ppgtt_mm during vGPU D3->D0.")
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  drivers/gpu/drm/i915/gvt/gtt.c | 18 ------------------
>  drivers/gpu/drm/i915/gvt/gtt.h |  1 -
>  2 files changed, 19 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c
> index e60bcce241f8..293bb2292021 100644
> --- a/drivers/gpu/drm/i915/gvt/gtt.c
> +++ b/drivers/gpu/drm/i915/gvt/gtt.c
> @@ -2845,24 +2845,6 @@ void intel_vgpu_reset_ggtt(struct intel_vgpu *vgpu, bool invalidate_old)
>  	ggtt_invalidate(gvt->gt);
>  }
>  
> -/**
> - * intel_vgpu_reset_gtt - reset the all GTT related status
> - * @vgpu: a vGPU
> - *
> - * This function is called from vfio core to reset reset all
> - * GTT related status, including GGTT, PPGTT, scratch page.
> - *
> - */
> -void intel_vgpu_reset_gtt(struct intel_vgpu *vgpu)
> -{
> -	/* Shadow pages are only created when there is no page
> -	 * table tracking data, so remove page tracking data after
> -	 * removing the shadow pages.
> -	 */
> -	intel_vgpu_destroy_all_ppgtt_mm(vgpu);
> -	intel_vgpu_reset_ggtt(vgpu, true);
> -}
> -
>  /**
>   * intel_gvt_restore_ggtt - restore all vGPU's ggtt entries
>   * @gvt: intel gvt device
> diff --git a/drivers/gpu/drm/i915/gvt/gtt.h b/drivers/gpu/drm/i915/gvt/gtt.h
> index a3b0f59ec8bd..4cb183e06e95 100644
> --- a/drivers/gpu/drm/i915/gvt/gtt.h
> +++ b/drivers/gpu/drm/i915/gvt/gtt.h
> @@ -224,7 +224,6 @@ void intel_vgpu_reset_ggtt(struct intel_vgpu *vgpu, bool invalidate_old);
>  void intel_vgpu_invalidate_ppgtt(struct intel_vgpu *vgpu);
>  
>  int intel_gvt_init_gtt(struct intel_gvt *gvt);
> -void intel_vgpu_reset_gtt(struct intel_vgpu *vgpu);
>  void intel_gvt_clean_gtt(struct intel_gvt *gvt);
>  
>  struct intel_vgpu_mm *intel_gvt_find_ppgtt_mm(struct intel_vgpu *vgpu,
> -- 
> 2.40.0.rc1.284.g88254d51c5-goog
> 

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v2 10/27] drm/i915/gvt: Protect gfn hash table with vgpu_lock
  2023-03-11  0:22 ` [PATCH v2 10/27] drm/i915/gvt: Protect gfn hash table with vgpu_lock Sean Christopherson
@ 2023-03-17  6:21   ` Yan Zhao
  0 siblings, 0 replies; 79+ messages in thread
From: Yan Zhao @ 2023-03-17  6:21 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Zhenyu Wang, Zhi Wang, kvm, intel-gvt-dev,
	intel-gfx, linux-kernel, Ben Gardon

Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>

On Fri, Mar 10, 2023 at 04:22:41PM -0800, Sean Christopherson wrote:
> Use vgpu_lock instead of KVM's mmu_lock to protect accesses to the hash
> table used to track which gfns are write-protected when shadowing the
> guest's GTT, and hoist the acquisition of vgpu_lock from
> intel_vgpu_page_track_handler() out to its sole caller,
> kvmgt_page_track_write().
> 
> This fixes a bug where kvmgt_page_track_write(), which doesn't hold
> kvm->mmu_lock, could race with intel_gvt_page_track_remove() and trigger
> a use-after-free.
> 
> Fixing kvmgt_page_track_write() by taking kvm->mmu_lock is not an option
> as mmu_lock is a r/w spinlock, and intel_vgpu_page_track_handler() might
> sleep when acquiring vgpu->cache_lock deep down the callstack:
> 
>   intel_vgpu_page_track_handler()
>   |
>   |->  page_track->handler / ppgtt_write_protection_handler()
>        |
>        |-> ppgtt_handle_guest_write_page_table_bytes()
>            |
>            |->  ppgtt_handle_guest_write_page_table()
>                 |
>                 |-> ppgtt_handle_guest_entry_removal()
>                     |
>                     |-> ppgtt_invalidate_pte()
>                         |
>                         |-> intel_gvt_dma_unmap_guest_page()
>                             |
>                             |-> mutex_lock(&vgpu->cache_lock);
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  drivers/gpu/drm/i915/gvt/kvmgt.c      | 55 +++++++++++++++------------
>  drivers/gpu/drm/i915/gvt/page_track.c | 10 +----
>  2 files changed, 33 insertions(+), 32 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
> index 68be66395598..9824d075562e 100644
> --- a/drivers/gpu/drm/i915/gvt/kvmgt.c
> +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
> @@ -366,6 +366,8 @@ __kvmgt_protect_table_find(struct intel_vgpu *info, gfn_t gfn)
>  {
>  	struct kvmgt_pgfn *p, *res = NULL;
>  
> +	lockdep_assert_held(&info->vgpu_lock);
> +
>  	hash_for_each_possible(info->ptable, p, hnode, gfn) {
>  		if (gfn == p->gfn) {
>  			res = p;
> @@ -1567,6 +1569,9 @@ int intel_gvt_page_track_add(struct intel_vgpu *info, u64 gfn)
>  	if (!test_bit(INTEL_VGPU_STATUS_ATTACHED, info->status))
>  		return -ESRCH;
>  
> +	if (kvmgt_gfn_is_write_protected(info, gfn))
> +		return 0;
> +
>  	idx = srcu_read_lock(&kvm->srcu);
>  	slot = gfn_to_memslot(kvm, gfn);
>  	if (!slot) {
> @@ -1575,16 +1580,12 @@ int intel_gvt_page_track_add(struct intel_vgpu *info, u64 gfn)
>  	}
>  
>  	write_lock(&kvm->mmu_lock);
> -
> -	if (kvmgt_gfn_is_write_protected(info, gfn))
> -		goto out;
> -
>  	kvm_slot_page_track_add_page(kvm, slot, gfn, KVM_PAGE_TRACK_WRITE);
> +	write_unlock(&kvm->mmu_lock);
> +
> +	srcu_read_unlock(&kvm->srcu, idx);
> +
>  	kvmgt_protect_table_add(info, gfn);
> -
> -out:
> -	write_unlock(&kvm->mmu_lock);
> -	srcu_read_unlock(&kvm->srcu, idx);
>  	return 0;
>  }
>  
> @@ -1597,24 +1598,22 @@ int intel_gvt_page_track_remove(struct intel_vgpu *info, u64 gfn)
>  	if (!test_bit(INTEL_VGPU_STATUS_ATTACHED, info->status))
>  		return -ESRCH;
>  
> -	idx = srcu_read_lock(&kvm->srcu);
> -	slot = gfn_to_memslot(kvm, gfn);
> -	if (!slot) {
> -		srcu_read_unlock(&kvm->srcu, idx);
> -		return -EINVAL;
> -	}
> -
> -	write_lock(&kvm->mmu_lock);
> -
>  	if (!kvmgt_gfn_is_write_protected(info, gfn))
> -		goto out;
> +		return 0;
>  
> +	idx = srcu_read_lock(&kvm->srcu);
> +	slot = gfn_to_memslot(kvm, gfn);
> +	if (!slot) {
> +		srcu_read_unlock(&kvm->srcu, idx);
> +		return -EINVAL;
> +	}
> +
> +	write_lock(&kvm->mmu_lock);
>  	kvm_slot_page_track_remove_page(kvm, slot, gfn, KVM_PAGE_TRACK_WRITE);
> +	write_unlock(&kvm->mmu_lock);
> +	srcu_read_unlock(&kvm->srcu, idx);
> +
>  	kvmgt_protect_table_del(info, gfn);
> -
> -out:
> -	write_unlock(&kvm->mmu_lock);
> -	srcu_read_unlock(&kvm->srcu, idx);
>  	return 0;
>  }
>  
> @@ -1625,9 +1624,13 @@ static void kvmgt_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa,
>  	struct intel_vgpu *info =
>  		container_of(node, struct intel_vgpu, track_node);
>  
> +	mutex_lock(&info->vgpu_lock);
> +
>  	if (kvmgt_gfn_is_write_protected(info, gpa_to_gfn(gpa)))
>  		intel_vgpu_page_track_handler(info, gpa,
>  						     (void *)val, len);
> +
> +	mutex_unlock(&info->vgpu_lock);
>  }
>  
>  static void kvmgt_page_track_flush_slot(struct kvm *kvm,
> @@ -1639,16 +1642,20 @@ static void kvmgt_page_track_flush_slot(struct kvm *kvm,
>  	struct intel_vgpu *info =
>  		container_of(node, struct intel_vgpu, track_node);
>  
> -	write_lock(&kvm->mmu_lock);
> +	mutex_lock(&info->vgpu_lock);
> +
>  	for (i = 0; i < slot->npages; i++) {
>  		gfn = slot->base_gfn + i;
>  		if (kvmgt_gfn_is_write_protected(info, gfn)) {
> +			write_lock(&kvm->mmu_lock);
>  			kvm_slot_page_track_remove_page(kvm, slot, gfn,
>  						KVM_PAGE_TRACK_WRITE);
> +			write_unlock(&kvm->mmu_lock);
> +
>  			kvmgt_protect_table_del(info, gfn);
>  		}
>  	}
> -	write_unlock(&kvm->mmu_lock);
> +	mutex_unlock(&info->vgpu_lock);
>  }
>  
>  void intel_vgpu_detach_regions(struct intel_vgpu *vgpu)
> diff --git a/drivers/gpu/drm/i915/gvt/page_track.c b/drivers/gpu/drm/i915/gvt/page_track.c
> index df34e73cba41..60a65435556d 100644
> --- a/drivers/gpu/drm/i915/gvt/page_track.c
> +++ b/drivers/gpu/drm/i915/gvt/page_track.c
> @@ -162,13 +162,9 @@ int intel_vgpu_page_track_handler(struct intel_vgpu *vgpu, u64 gpa,
>  	struct intel_vgpu_page_track *page_track;
>  	int ret = 0;
>  
> -	mutex_lock(&vgpu->vgpu_lock);
> -
>  	page_track = intel_vgpu_find_page_track(vgpu, gpa >> PAGE_SHIFT);
> -	if (!page_track) {
> -		ret = -ENXIO;
> -		goto out;
> -	}
> +	if (!page_track)
> +		return -ENXIO;
>  
>  	if (unlikely(vgpu->failsafe)) {
>  		/* Remove write protection to prevent furture traps. */
> @@ -179,7 +175,5 @@ int intel_vgpu_page_track_handler(struct intel_vgpu *vgpu, u64 gpa,
>  			gvt_err("guest page write error, gpa %llx\n", gpa);
>  	}
>  
> -out:
> -	mutex_unlock(&vgpu->vgpu_lock);
>  	return ret;
>  }
> -- 
> 2.40.0.rc1.284.g88254d51c5-goog
> 

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v2 12/27] KVM: x86/mmu: Don't bounce through page-track mechanism for guest PTEs
  2023-03-11  0:22 ` [PATCH v2 12/27] KVM: x86/mmu: Don't bounce through page-track mechanism for guest PTEs Sean Christopherson
@ 2023-03-17  6:37   ` Yan Zhao
  0 siblings, 0 replies; 79+ messages in thread
From: Yan Zhao @ 2023-03-17  6:37 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Zhenyu Wang, Zhi Wang, kvm, intel-gvt-dev,
	intel-gfx, linux-kernel, Ben Gardon

Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>

On Fri, Mar 10, 2023 at 04:22:43PM -0800, Sean Christopherson wrote:
> Don't use the generic page-track mechanism to handle writes to guest PTEs
> in KVM's MMU.  KVM's MMU needs access to information that should not be
> exposed to external page-track users, e.g. KVM needs (for some definitions
> of "need") the vCPU to query the current paging mode, whereas external
> users, i.e. KVMGT, have no ties to the current vCPU and so should never
> need the vCPU.
> 
> Moving away from the page-track mechanism will allow dropping use of the
> page-track mechanism for KVM's own MMU, and will also allow simplifying
> and cleaning up the page-track APIs.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/include/asm/kvm_host.h |  1 -
>  arch/x86/kvm/mmu.h              |  2 ++
>  arch/x86/kvm/mmu/mmu.c          | 13 ++-----------
>  arch/x86/kvm/mmu/page_track.c   |  2 ++
>  4 files changed, 6 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 17281d6825c9..1a4225237564 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1265,7 +1265,6 @@ struct kvm_arch {
>  	 * create an NX huge page (without hanging the guest).
>  	 */
>  	struct list_head possible_nx_huge_pages;
> -	struct kvm_page_track_notifier_node mmu_sp_tracker;
>  	struct kvm_page_track_notifier_head track_notifier_head;
>  	/*
>  	 * Protects marking pages unsync during page faults, as TDP MMU page
> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> index 168c46fd8dd1..b8bde42f6037 100644
> --- a/arch/x86/kvm/mmu.h
> +++ b/arch/x86/kvm/mmu.h
> @@ -119,6 +119,8 @@ void kvm_mmu_unload(struct kvm_vcpu *vcpu);
>  void kvm_mmu_free_obsolete_roots(struct kvm_vcpu *vcpu);
>  void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu);
>  void kvm_mmu_sync_prev_roots(struct kvm_vcpu *vcpu);
> +void kvm_mmu_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
> +			 int bytes);
>  
>  static inline int kvm_mmu_reload(struct kvm_vcpu *vcpu)
>  {
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 409dabec69df..4f2f83d8322e 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -5603,9 +5603,8 @@ static u64 *get_written_sptes(struct kvm_mmu_page *sp, gpa_t gpa, int *nspte)
>  	return spte;
>  }
>  
> -static void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
> -			      const u8 *new, int bytes,
> -			      struct kvm_page_track_notifier_node *node)
> +void kvm_mmu_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
> +			 int bytes)
>  {
>  	gfn_t gfn = gpa >> PAGE_SHIFT;
>  	struct kvm_mmu_page *sp;
> @@ -6088,7 +6087,6 @@ static bool kvm_has_zapped_obsolete_pages(struct kvm *kvm)
>  
>  int kvm_mmu_init_vm(struct kvm *kvm)
>  {
> -	struct kvm_page_track_notifier_node *node = &kvm->arch.mmu_sp_tracker;
>  	int r;
>  
>  	INIT_LIST_HEAD(&kvm->arch.active_mmu_pages);
> @@ -6102,9 +6100,6 @@ int kvm_mmu_init_vm(struct kvm *kvm)
>  			return r;
>  	}
>  
> -	node->track_write = kvm_mmu_pte_write;
> -	kvm_page_track_register_notifier(kvm, node);
> -
>  	kvm->arch.split_page_header_cache.kmem_cache = mmu_page_header_cache;
>  	kvm->arch.split_page_header_cache.gfp_zero = __GFP_ZERO;
>  
> @@ -6125,10 +6120,6 @@ static void mmu_free_vm_memory_caches(struct kvm *kvm)
>  
>  void kvm_mmu_uninit_vm(struct kvm *kvm)
>  {
> -	struct kvm_page_track_notifier_node *node = &kvm->arch.mmu_sp_tracker;
> -
> -	kvm_page_track_unregister_notifier(kvm, node);
> -
>  	if (tdp_mmu_enabled)
>  		kvm_mmu_uninit_tdp_mmu(kvm);
>  
> diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
> index e739dcc3375c..f39f190ad4ae 100644
> --- a/arch/x86/kvm/mmu/page_track.c
> +++ b/arch/x86/kvm/mmu/page_track.c
> @@ -274,6 +274,8 @@ void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
>  		if (n->track_write)
>  			n->track_write(vcpu, gpa, new, bytes, n);
>  	srcu_read_unlock(&head->track_srcu, idx);
> +
> +	kvm_mmu_track_write(vcpu, gpa, new, bytes);
>  }
>  
>  /*
> -- 
> 2.40.0.rc1.284.g88254d51c5-goog
> 

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v2 13/27] KVM: drm/i915/gvt: Drop @vcpu from KVM's ->track_write() hook
  2023-03-11  0:22 ` [PATCH v2 13/27] KVM: drm/i915/gvt: Drop @vcpu from KVM's ->track_write() hook Sean Christopherson
@ 2023-03-17  7:28   ` Yan Zhao
  0 siblings, 0 replies; 79+ messages in thread
From: Yan Zhao @ 2023-03-17  7:28 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Zhenyu Wang, Zhi Wang, kvm, intel-gvt-dev,
	intel-gfx, linux-kernel, Ben Gardon

Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>

On Fri, Mar 10, 2023 at 04:22:44PM -0800, Sean Christopherson wrote:
> Drop @vcpu from KVM's ->track_write() hook provided for external users of
> the page-track APIs now that KVM itself doesn't use the page-track
> mechanism.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/include/asm/kvm_page_track.h |  5 ++---
>  arch/x86/kvm/mmu/page_track.c         |  2 +-
>  drivers/gpu/drm/i915/gvt/kvmgt.c      | 10 ++++------
>  3 files changed, 7 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
> index 3f72c7a172fc..0d65ae203fd6 100644
> --- a/arch/x86/include/asm/kvm_page_track.h
> +++ b/arch/x86/include/asm/kvm_page_track.h
> @@ -26,14 +26,13 @@ struct kvm_page_track_notifier_node {
>  	 * It is called when guest is writing the write-tracked page
>  	 * and write emulation is finished at that time.
>  	 *
> -	 * @vcpu: the vcpu where the write access happened.
>  	 * @gpa: the physical address written by guest.
>  	 * @new: the data was written to the address.
>  	 * @bytes: the written length.
>  	 * @node: this node
>  	 */
> -	void (*track_write)(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
> -			    int bytes, struct kvm_page_track_notifier_node *node);
> +	void (*track_write)(gpa_t gpa, const u8 *new, int bytes,
> +			    struct kvm_page_track_notifier_node *node);
>  	/*
>  	 * It is called when memory slot is being moved or removed
>  	 * users can drop write-protection for the pages in that memory slot
> diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
> index f39f190ad4ae..39a0863af8b4 100644
> --- a/arch/x86/kvm/mmu/page_track.c
> +++ b/arch/x86/kvm/mmu/page_track.c
> @@ -272,7 +272,7 @@ void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
>  	hlist_for_each_entry_srcu(n, &head->track_notifier_list, node,
>  				srcu_read_lock_held(&head->track_srcu))
>  		if (n->track_write)
> -			n->track_write(vcpu, gpa, new, bytes, n);
> +			n->track_write(gpa, new, bytes, n);
>  	srcu_read_unlock(&head->track_srcu, idx);
>  
>  	kvm_mmu_track_write(vcpu, gpa, new, bytes);
> diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
> index 9824d075562e..292750dc819f 100644
> --- a/drivers/gpu/drm/i915/gvt/kvmgt.c
> +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
> @@ -106,9 +106,8 @@ struct gvt_dma {
>  #define vfio_dev_to_vgpu(vfio_dev) \
>  	container_of((vfio_dev), struct intel_vgpu, vfio_device)
>  
> -static void kvmgt_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa,
> -		const u8 *val, int len,
> -		struct kvm_page_track_notifier_node *node);
> +static void kvmgt_page_track_write(gpa_t gpa, const u8 *val, int len,
> +				   struct kvm_page_track_notifier_node *node);
>  static void kvmgt_page_track_flush_slot(struct kvm *kvm,
>  		struct kvm_memory_slot *slot,
>  		struct kvm_page_track_notifier_node *node);
> @@ -1617,9 +1616,8 @@ int intel_gvt_page_track_remove(struct intel_vgpu *info, u64 gfn)
>  	return 0;
>  }
>  
> -static void kvmgt_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa,
> -		const u8 *val, int len,
> -		struct kvm_page_track_notifier_node *node)
> +static void kvmgt_page_track_write(gpa_t gpa, const u8 *val, int len,
> +				   struct kvm_page_track_notifier_node *node)
>  {
>  	struct intel_vgpu *info =
>  		container_of(node, struct intel_vgpu, track_node);
> -- 
> 2.40.0.rc1.284.g88254d51c5-goog
> 

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v2 14/27] KVM: x86: Reject memslot MOVE operations if KVMGT is attached
  2023-03-11  0:22 ` [PATCH v2 14/27] KVM: x86: Reject memslot MOVE operations if KVMGT is attached Sean Christopherson
  2023-03-15  8:03   ` Yan Zhao
@ 2023-03-17  7:29   ` Yan Zhao
  1 sibling, 0 replies; 79+ messages in thread
From: Yan Zhao @ 2023-03-17  7:29 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Zhenyu Wang, Zhi Wang, kvm, intel-gvt-dev,
	intel-gfx, linux-kernel, Ben Gardon

Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>

On Fri, Mar 10, 2023 at 04:22:45PM -0800, Sean Christopherson wrote:
> Disallow moving memslots if the VM has external page-track users, i.e. if
> KVMGT is being used to expose a virtual GPU to the guest, as KVM doesn't
> correctly handle moving memory regions.
> 
> Note, this is potential ABI breakage!  E.g. userspace could move regions
> that aren't shadowed by KVMGT without harming the guest.  However, the
> only known user of KVMGT is QEMU, and QEMU doesn't move generic memory
> regions.  KVM's own support for moving memory regions was also broken for
> multiple years (albeit for an edge case, but arguably moving RAM is
> itself an edge case), e.g. see commit edd4fa37baa6 ("KVM: x86: Allocate
> new rmap and large page tracking when moving memslot").
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/include/asm/kvm_page_track.h | 3 +++
>  arch/x86/kvm/mmu/page_track.c         | 5 +++++
>  arch/x86/kvm/x86.c                    | 7 +++++++
>  3 files changed, 15 insertions(+)
> 
> diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
> index 0d65ae203fd6..6a287bcbe8a9 100644
> --- a/arch/x86/include/asm/kvm_page_track.h
> +++ b/arch/x86/include/asm/kvm_page_track.h
> @@ -77,4 +77,7 @@ kvm_page_track_unregister_notifier(struct kvm *kvm,
>  void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
>  			  int bytes);
>  void kvm_page_track_flush_slot(struct kvm *kvm, struct kvm_memory_slot *slot);
> +
> +bool kvm_page_track_has_external_user(struct kvm *kvm);
> +
>  #endif
> diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
> index 39a0863af8b4..1cfc0a0ccc23 100644
> --- a/arch/x86/kvm/mmu/page_track.c
> +++ b/arch/x86/kvm/mmu/page_track.c
> @@ -321,3 +321,8 @@ enum pg_level kvm_page_track_max_mapping_level(struct kvm *kvm, gfn_t gfn,
>  	return max_level;
>  }
>  EXPORT_SYMBOL_GPL(kvm_page_track_max_mapping_level);
> +
> +bool kvm_page_track_has_external_user(struct kvm *kvm)
> +{
> +	return hlist_empty(&kvm->arch.track_notifier_head.track_notifier_list);
> +}
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 29dd6c97d145..47ac9291cd43 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -12484,6 +12484,13 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
>  				   struct kvm_memory_slot *new,
>  				   enum kvm_mr_change change)
>  {
> +	/*
> +	 * KVM doesn't support moving memslots when there are external page
> +	 * trackers attached to the VM, i.e. if KVMGT is in use.
> +	 */
> +	if (change == KVM_MR_MOVE && kvm_page_track_has_external_user(kvm))
> +		return -EINVAL;
> +
>  	if (change == KVM_MR_CREATE || change == KVM_MR_MOVE) {
>  		if ((new->base_gfn + new->npages - 1) > kvm_mmu_max_gfn())
>  			return -EINVAL;
> -- 
> 2.40.0.rc1.284.g88254d51c5-goog
> 

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v2 15/27] drm/i915/gvt: Don't bother removing write-protection on to-be-deleted slot
  2023-03-11  0:22 ` [PATCH v2 15/27] drm/i915/gvt: Don't bother removing write-protection on to-be-deleted slot Sean Christopherson
@ 2023-03-17  7:30   ` Yan Zhao
  0 siblings, 0 replies; 79+ messages in thread
From: Yan Zhao @ 2023-03-17  7:30 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Zhenyu Wang, Zhi Wang, kvm, intel-gvt-dev,
	intel-gfx, linux-kernel, Ben Gardon

Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>

On Fri, Mar 10, 2023 at 04:22:46PM -0800, Sean Christopherson wrote:
> When handling a slot "flush", don't call back into KVM to drop write
> protection for gfns in the slot.  Now that KVM rejects attempts to move
> memory slots while KVMGT is attached, the only time a slot is "flushed"
> is when it's being removed, i.e. the memslot and all its write-tracking
> metadata is about to be deleted.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  drivers/gpu/drm/i915/gvt/kvmgt.c | 8 +-------
>  1 file changed, 1 insertion(+), 7 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
> index 292750dc819f..577712ea4893 100644
> --- a/drivers/gpu/drm/i915/gvt/kvmgt.c
> +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
> @@ -1644,14 +1644,8 @@ static void kvmgt_page_track_flush_slot(struct kvm *kvm,
>  
>  	for (i = 0; i < slot->npages; i++) {
>  		gfn = slot->base_gfn + i;
> -		if (kvmgt_gfn_is_write_protected(info, gfn)) {
> -			write_lock(&kvm->mmu_lock);
> -			kvm_slot_page_track_remove_page(kvm, slot, gfn,
> -						KVM_PAGE_TRACK_WRITE);
> -			write_unlock(&kvm->mmu_lock);
> -
> +		if (kvmgt_gfn_is_write_protected(info, gfn))
>  			kvmgt_protect_table_del(info, gfn);
> -		}
>  	}
>  	mutex_unlock(&info->vgpu_lock);
>  }
> -- 
> 2.40.0.rc1.284.g88254d51c5-goog
> 

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v2 16/27] KVM: x86: Add a new page-track hook to handle memslot deletion
  2023-03-11  0:22 ` [PATCH v2 16/27] KVM: x86: Add a new page-track hook to handle memslot deletion Sean Christopherson
@ 2023-03-17  7:43   ` Yan Zhao
  2023-03-17 16:20     ` Sean Christopherson
  0 siblings, 1 reply; 79+ messages in thread
From: Yan Zhao @ 2023-03-17  7:43 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Zhenyu Wang, Zhi Wang, kvm, intel-gvt-dev,
	intel-gfx, linux-kernel, Ben Gardon

On Fri, Mar 10, 2023 at 04:22:47PM -0800, Sean Christopherson wrote:
> From: Yan Zhao <yan.y.zhao@intel.com>
> 
> Add a new page-track hook, track_remove_region(), that is called when a
> memslot DELETE operation is about to be committed.  The "remove" hook
> will be used by KVMGT and will effectively replace the existing
> track_flush_slot() altogether now that KVM itself doesn't rely on the
> "flush" hook either.
> 
> The "flush" hook is flawed as it's invoked before the memslot operation
> is guaranteed to succeed, i.e. KVM might ultimately keep the existing
> memslot without notifying external page track users, a.k.a. KVMGT.  In
> practice, this can't currently happen on x86, but there are no guarantees
> that won't change in the future, not to mention that "flush" does a very
> poor job of describing what is happening.
> 
> Pass in the gfn+nr_pages instead of the slot itself so external users,
> i.e. KVMGT, don't need to be exposed to KVM internals (memslots).  This will
> help set the stage for additional cleanups to the page-track APIs.
> 
> Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> Co-developed-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
...

> +void kvm_page_track_delete_slot(struct kvm *kvm, struct kvm_memory_slot *slot)
> +{
> +	struct kvm_page_track_notifier_head *head;
> +	struct kvm_page_track_notifier_node *n;
> +	int idx;
> +
> +	head = &kvm->arch.track_notifier_head;
> +
> +	if (hlist_empty(&head->track_notifier_list))
> +		return;
> +
> +	idx = srcu_read_lock(&head->track_srcu);
> +	hlist_for_each_entry_srcu(n, &head->track_notifier_list, node,
> +				srcu_read_lock_held(&head->track_srcu))
Sorry, not sure why the alignment here is not right.
Patchwork just sent me a mail complaining about it.
Would you mind helping to fix it in the next version?
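FWIW, I guess what Patchwork expects is the continuation line aligned to the
column right after the opening parenthesis (indented with tabs plus spaces),
something like:

	hlist_for_each_entry_srcu(n, &head->track_notifier_list, node,
				  srcu_read_lock_held(&head->track_srcu))
		if (n->track_remove_region)
			n->track_remove_region(slot->base_gfn, slot->npages, n);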

Thanks a lot!

> +		if (n->track_remove_region)
> +			n->track_remove_region(slot->base_gfn, slot->npages, n);
> +	srcu_read_unlock(&head->track_srcu, idx);
> +}
> +

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v2 17/27] drm/i915/gvt: switch from ->track_flush_slot() to ->track_remove_region()
  2023-03-11  0:22 ` [PATCH v2 17/27] drm/i915/gvt: switch from ->track_flush_slot() to ->track_remove_region() Sean Christopherson
@ 2023-03-17  7:45   ` Yan Zhao
  0 siblings, 0 replies; 79+ messages in thread
From: Yan Zhao @ 2023-03-17  7:45 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Zhenyu Wang, Zhi Wang, kvm, intel-gvt-dev,
	intel-gfx, linux-kernel, Ben Gardon

Tested-by: Yan Zhao <yan.y.zhao@intel.com>

On Fri, Mar 10, 2023 at 04:22:48PM -0800, Sean Christopherson wrote:
> From: Yan Zhao <yan.y.zhao@intel.com>
> 
> Switch from the poorly named and flawed ->track_flush_slot() to the newly
> introduced ->track_remove_region().  From KVMGT's perspective, the two
> hooks are functionally equivalent, the only difference being that
> ->track_remove_region() is called only when KVM is 100% certain the
> memory region will be removed, i.e. is invoked slightly later in KVM's
> memslot modification flow.
> 
> Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> [sean: handle name change, massage changelog, rebase]
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  drivers/gpu/drm/i915/gvt/kvmgt.c | 21 +++++++++------------
>  1 file changed, 9 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
> index 577712ea4893..9f188b6c3edf 100644
> --- a/drivers/gpu/drm/i915/gvt/kvmgt.c
> +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
> @@ -108,9 +108,8 @@ struct gvt_dma {
>  
>  static void kvmgt_page_track_write(gpa_t gpa, const u8 *val, int len,
>  				   struct kvm_page_track_notifier_node *node);
> -static void kvmgt_page_track_flush_slot(struct kvm *kvm,
> -		struct kvm_memory_slot *slot,
> -		struct kvm_page_track_notifier_node *node);
> +static void kvmgt_page_track_remove_region(gfn_t gfn, unsigned long nr_pages,
> +					   struct kvm_page_track_notifier_node *node);
>  
>  static ssize_t intel_vgpu_show_description(struct mdev_type *mtype, char *buf)
>  {
> @@ -680,7 +679,7 @@ static int intel_vgpu_open_device(struct vfio_device *vfio_dev)
>  		return -EEXIST;
>  
>  	vgpu->track_node.track_write = kvmgt_page_track_write;
> -	vgpu->track_node.track_flush_slot = kvmgt_page_track_flush_slot;
> +	vgpu->track_node.track_remove_region = kvmgt_page_track_remove_region;
>  	kvm_get_kvm(vgpu->vfio_device.kvm);
>  	kvm_page_track_register_notifier(vgpu->vfio_device.kvm,
>  					 &vgpu->track_node);
> @@ -1631,22 +1630,20 @@ static void kvmgt_page_track_write(gpa_t gpa, const u8 *val, int len,
>  	mutex_unlock(&info->vgpu_lock);
>  }
>  
> -static void kvmgt_page_track_flush_slot(struct kvm *kvm,
> -		struct kvm_memory_slot *slot,
> -		struct kvm_page_track_notifier_node *node)
> +static void kvmgt_page_track_remove_region(gfn_t gfn, unsigned long nr_pages,
> +					   struct kvm_page_track_notifier_node *node)
>  {
>  	unsigned long i;
> -	gfn_t gfn;
>  	struct intel_vgpu *info =
>  		container_of(node, struct intel_vgpu, track_node);
>  
>  	mutex_lock(&info->vgpu_lock);
>  
> -	for (i = 0; i < slot->npages; i++) {
> -		gfn = slot->base_gfn + i;
> -		if (kvmgt_gfn_is_write_protected(info, gfn))
> -			kvmgt_protect_table_del(info, gfn);
> +	for (i = 0; i < nr_pages; i++) {
> +		if (kvmgt_gfn_is_write_protected(info, gfn + i))
> +			kvmgt_protect_table_del(info, gfn + i);
>  	}
> +
>  	mutex_unlock(&info->vgpu_lock);
>  }
>  
> -- 
> 2.40.0.rc1.284.g88254d51c5-goog
> 

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v2 23/27] KVM: x86/mmu: Assert that correct locks are held for page write-tracking
  2023-03-11  0:22 ` [PATCH v2 23/27] KVM: x86/mmu: Assert that correct locks are held for page write-tracking Sean Christopherson
@ 2023-03-17  7:55   ` Yan Zhao
  0 siblings, 0 replies; 79+ messages in thread
From: Yan Zhao @ 2023-03-17  7:55 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Zhenyu Wang, Zhi Wang, kvm, intel-gvt-dev,
	intel-gfx, linux-kernel, Ben Gardon

Tested-by: Yan Zhao <yan.y.zhao@intel.com>

On Fri, Mar 10, 2023 at 04:22:54PM -0800, Sean Christopherson wrote:
> When adding/removing gfns to/from write-tracking, assert that mmu_lock
> is held for write, and that either slots_lock or kvm->srcu is held.
> mmu_lock must be held for write to protect gfn_write_track's refcount,
> and SRCU or slots_lock must be held to protect the memslot itself.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/mmu/page_track.c | 17 +++++++++++------
>  1 file changed, 11 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
> index 1993db4578e5..ffcd7ac66f9e 100644
> --- a/arch/x86/kvm/mmu/page_track.c
> +++ b/arch/x86/kvm/mmu/page_track.c
> @@ -12,6 +12,7 @@
>   */
>  #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>  
> +#include <linux/lockdep.h>
>  #include <linux/kvm_host.h>
>  #include <linux/rculist.h>
>  
> @@ -77,9 +78,6 @@ static void update_gfn_write_track(struct kvm_memory_slot *slot, gfn_t gfn,
>   * add guest page to the tracking pool so that corresponding access on that
>   * page will be intercepted.
>   *
> - * It should be called under the protection both of mmu-lock and kvm->srcu
> - * or kvm->slots_lock.
> - *
>   * @kvm: the guest instance we are interested in.
>   * @slot: the @gfn belongs to.
>   * @gfn: the guest page.
> @@ -87,6 +85,11 @@ static void update_gfn_write_track(struct kvm_memory_slot *slot, gfn_t gfn,
>  void kvm_write_track_add_gfn(struct kvm *kvm, struct kvm_memory_slot *slot,
>  			     gfn_t gfn)
>  {
> +	lockdep_assert_held_write(&kvm->mmu_lock);
> +
> +	lockdep_assert_once(lockdep_is_held(&kvm->slots_lock) ||
> +			    srcu_read_lock_held(&kvm->srcu));
> +
>  	if (WARN_ON(!kvm_page_track_write_tracking_enabled(kvm)))
>  		return;
>  
> @@ -107,9 +110,6 @@ EXPORT_SYMBOL_GPL(kvm_write_track_add_gfn);
>   * remove the guest page from the tracking pool which stops the interception
>   * of corresponding access on that page.
>   *
> - * It should be called under the protection both of mmu-lock and kvm->srcu
> - * or kvm->slots_lock.
> - *
>   * @kvm: the guest instance we are interested in.
>   * @slot: the @gfn belongs to.
>   * @gfn: the guest page.
> @@ -117,6 +117,11 @@ EXPORT_SYMBOL_GPL(kvm_write_track_add_gfn);
>  void kvm_write_track_remove_gfn(struct kvm *kvm,
>  				struct kvm_memory_slot *slot, gfn_t gfn)
>  {
> +	lockdep_assert_held_write(&kvm->mmu_lock);
> +
> +	lockdep_assert_once(lockdep_is_held(&kvm->slots_lock) ||
> +			    srcu_read_lock_held(&kvm->srcu));
> +
>  	if (WARN_ON(!kvm_page_track_write_tracking_enabled(kvm)))
>  		return;
>  
> -- 
> 2.40.0.rc1.284.g88254d51c5-goog
> 

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v2 25/27] KVM: x86/mmu: Drop @slot param from exported/external page-track APIs
  2023-03-11  0:22 ` [PATCH v2 25/27] KVM: x86/mmu: Drop @slot param from exported/external page-track APIs Sean Christopherson
@ 2023-03-17  8:28   ` Yan Zhao
  2023-03-23  8:50     ` Yan Zhao
  0 siblings, 1 reply; 79+ messages in thread
From: Yan Zhao @ 2023-03-17  8:28 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Zhenyu Wang, Zhi Wang, kvm, intel-gfx,
	linux-kernel, Ben Gardon, intel-gvt-dev

On Fri, Mar 10, 2023 at 04:22:56PM -0800, Sean Christopherson wrote:
...
> +int kvm_write_track_add_gfn(struct kvm *kvm, gfn_t gfn)
> +{
> +	struct kvm_memory_slot *slot;
> +	int idx;
> +
> +	idx = srcu_read_lock(&kvm->srcu);
> +
> +	slot = gfn_to_memslot(kvm, gfn);
> +	if (!slot) {
> +		srcu_read_unlock(&kvm->srcu, idx);
> +		return -EINVAL;
> +	}
> +
Also fail if slot->flags & KVM_MEMSLOT_INVALID is true?
There is a window during which external users can see an invalid slot
when a slot is about to be deleted/moved.
(It happens before MOVE is rejected in kvm_arch_prepare_memory_region()).
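Something like this is what I have in mind, just as a sketch:

	slot = gfn_to_memslot(kvm, gfn);
	if (!slot || slot->flags & KVM_MEMSLOT_INVALID) {
		srcu_read_unlock(&kvm->srcu, idx);
		return -EINVAL;
	}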

> +	write_lock(&kvm->mmu_lock);
> +	__kvm_write_track_add_gfn(kvm, slot, gfn);
> +	write_unlock(&kvm->mmu_lock);
> +
> +	srcu_read_unlock(&kvm->srcu, idx);
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(kvm_write_track_add_gfn);
> +
> +/*
> + * remove the guest page from the tracking pool which stops the interception
> + * of corresponding access on that page.
> + *
> + * @kvm: the guest instance we are interested in.
> + * @gfn: the guest page.
> + */
> +int kvm_write_track_remove_gfn(struct kvm *kvm, gfn_t gfn)
> +{
> +	struct kvm_memory_slot *slot;
> +	int idx;
> +
> +	idx = srcu_read_lock(&kvm->srcu);
> +
> +	slot = gfn_to_memslot(kvm, gfn);
> +	if (!slot) {
> +		srcu_read_unlock(&kvm->srcu, idx);
> +		return -EINVAL;
> +	}
> +
Ditto.

> +	write_lock(&kvm->mmu_lock);
> +	__kvm_write_track_remove_gfn(kvm, slot, gfn);
> +	write_unlock(&kvm->mmu_lock);
> +
> +	srcu_read_unlock(&kvm->srcu, idx);
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(kvm_write_track_remove_gfn);


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v2 26/27] KVM: x86/mmu: Handle KVM bookkeeping in page-track APIs, not callers
  2023-03-11  0:22 ` [PATCH v2 26/27] KVM: x86/mmu: Handle KVM bookkeeping in page-track APIs, not callers Sean Christopherson
@ 2023-03-17  8:52   ` Yan Zhao
  0 siblings, 0 replies; 79+ messages in thread
From: Yan Zhao @ 2023-03-17  8:52 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Zhenyu Wang, Zhi Wang, kvm, intel-gvt-dev,
	intel-gfx, linux-kernel, Ben Gardon

Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>

On Fri, Mar 10, 2023 at 04:22:57PM -0800, Sean Christopherson wrote:
> Get/put references to KVM when a page-track notifier is (un)registered
> instead of relying on the caller to do so.  Forcing the caller to do the
> bookkeeping is unnecessary and adds one more thing for users to get
> wrong, e.g. see commit 9ed1fdee9ee3 ("drm/i915/gvt: Get reference to KVM
> iff attachment to VM is successful").
Just realized that "iff" stands for "if and only if" :) 


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v2 27/27] drm/i915/gvt: Drop final dependencies on KVM internal details
  2023-03-11  0:22 ` [PATCH v2 27/27] drm/i915/gvt: Drop final dependencies on KVM internal details Sean Christopherson
@ 2023-03-17  8:58   ` Yan Zhao
  0 siblings, 0 replies; 79+ messages in thread
From: Yan Zhao @ 2023-03-17  8:58 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Zhenyu Wang, Zhi Wang, kvm, intel-gvt-dev,
	intel-gfx, linux-kernel, Ben Gardon

Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>

On Fri, Mar 10, 2023 at 04:22:58PM -0800, Sean Christopherson wrote:
> Open code gpa_to_gfn() in kvmgt_page_track_write() and drop KVMGT's
> dependency on kvm_host.h, i.e. include only kvm_page_track.h.  KVMGT
> assumes "gfn == gpa >> PAGE_SHIFT" all over the place, including a few
> lines below in the same function with the same gpa, i.e. there's no
> reason to use KVM's helper for this one case.
> 
> No functional change intended.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  drivers/gpu/drm/i915/gvt/gvt.h   | 3 ++-
>  drivers/gpu/drm/i915/gvt/kvmgt.c | 2 +-
>  2 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gvt/gvt.h b/drivers/gpu/drm/i915/gvt/gvt.h
> index 2d65800d8e93..53a0a42a50db 100644
> --- a/drivers/gpu/drm/i915/gvt/gvt.h
> +++ b/drivers/gpu/drm/i915/gvt/gvt.h
> @@ -34,10 +34,11 @@
>  #define _GVT_H_
>  
>  #include <uapi/linux/pci_regs.h>
> -#include <linux/kvm_host.h>
>  #include <linux/vfio.h>
>  #include <linux/mdev.h>
>  
> +#include <asm/kvm_page_track.h>
> +
>  #include "i915_drv.h"
>  #include "intel_gvt.h"
>  
> diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
> index d16aced134b4..798d04481f03 100644
> --- a/drivers/gpu/drm/i915/gvt/kvmgt.c
> +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
> @@ -1599,7 +1599,7 @@ static void kvmgt_page_track_write(gpa_t gpa, const u8 *val, int len,
>  
>  	mutex_lock(&info->vgpu_lock);
>  
> -	if (kvmgt_gfn_is_write_protected(info, gpa_to_gfn(gpa)))
> +	if (kvmgt_gfn_is_write_protected(info, gpa >> PAGE_SHIFT))
>  		intel_vgpu_page_track_handler(info, gpa,
>  						     (void *)val, len);
>  
> -- 
> 2.40.0.rc1.284.g88254d51c5-goog
> 

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v2 16/27] KVM: x86: Add a new page-track hook to handle memslot deletion
  2023-03-17  7:43   ` Yan Zhao
@ 2023-03-17 16:20     ` Sean Christopherson
  0 siblings, 0 replies; 79+ messages in thread
From: Sean Christopherson @ 2023-03-17 16:20 UTC (permalink / raw)
  To: Yan Zhao
  Cc: Paolo Bonzini, Zhenyu Wang, Zhi Wang, kvm, intel-gvt-dev,
	intel-gfx, linux-kernel, Ben Gardon

On Fri, Mar 17, 2023, Yan Zhao wrote:
> On Fri, Mar 10, 2023 at 04:22:47PM -0800, Sean Christopherson wrote:
> > From: Yan Zhao <yan.y.zhao@intel.com>
> > 
> > Add a new page-track hook, track_remove_region(), that is called when a
> > memslot DELETE operation is about to be committed.  The "remove" hook
> > will be used by KVMGT and will effectively replace the existing
> > track_flush_slot() altogether now that KVM itself doesn't rely on the
> > "flush" hook either.
> > 
> > The "flush" hook is flawed as it's invoked before the memslot operation
> > is guaranteed to succeed, i.e. KVM might ultimately keep the existing
> > memslot without notifying external page track users, a.k.a. KVMGT.  In
> > practice, this can't currently happen on x86, but there are no guarantees
> > that won't change in the future, not to mention that "flush" does a very
> > poor job of describing what is happening.
> > 
> > Pass in the gfn+nr_pages instead of the slot itself so external users,
> > i.e. KVMGT, don't need to be exposed to KVM internals (memslots).  This will
> > help set the stage for additional cleanups to the page-track APIs.
> > 
> > Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
> > Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> > Co-developed-by: Sean Christopherson <seanjc@google.com>
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> ...
> 
> > +void kvm_page_track_delete_slot(struct kvm *kvm, struct kvm_memory_slot *slot)
> > +{
> > +	struct kvm_page_track_notifier_head *head;
> > +	struct kvm_page_track_notifier_node *n;
> > +	int idx;
> > +
> > +	head = &kvm->arch.track_notifier_head;
> > +
> > +	if (hlist_empty(&head->track_notifier_list))
> > +		return;
> > +
> > +	idx = srcu_read_lock(&head->track_srcu);
> > +	hlist_for_each_entry_srcu(n, &head->track_notifier_list, node,
> > +				srcu_read_lock_held(&head->track_srcu))
> Sorry, not sure why the alignment here is not right.
> Patchwork just sent me a mail to complain about it.
> Would you mind helping fix it in the next version?

Ah, it's off by two spaces, should be 

	hlist_for_each_entry_srcu(n, &head->track_notifier_list, node,
				  srcu_read_lock_held(&head->track_srcu))

I'll get it fixed in the next version.

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v2 25/27] KVM: x86/mmu: Drop @slot param from exported/external page-track APIs
  2023-03-17  8:28   ` Yan Zhao
@ 2023-03-23  8:50     ` Yan Zhao
  2023-05-03 23:16       ` Sean Christopherson
  0 siblings, 1 reply; 79+ messages in thread
From: Yan Zhao @ 2023-03-23  8:50 UTC (permalink / raw)
  To: Sean Christopherson, kvm, intel-gfx, linux-kernel, Zhenyu Wang,
	Ben Gardon, Paolo Bonzini, intel-gvt-dev, Zhi Wang

On Fri, Mar 17, 2023 at 04:28:56PM +0800, Yan Zhao wrote:
> On Fri, Mar 10, 2023 at 04:22:56PM -0800, Sean Christopherson wrote:
> ...
> > +int kvm_write_track_add_gfn(struct kvm *kvm, gfn_t gfn)
> > +{
> > +	struct kvm_memory_slot *slot;
> > +	int idx;
> > +
> > +	idx = srcu_read_lock(&kvm->srcu);
> > +
> > +	slot = gfn_to_memslot(kvm, gfn);
> > +	if (!slot) {
> > +		srcu_read_unlock(&kvm->srcu, idx);
> > +		return -EINVAL;
> > +	}
> > +
> Also fail if slot->flags & KVM_MEMSLOT_INVALID is true?
> There should exist a window for external users to see an invalid slot
> when a slot is about to get deleted/moved.
> (It happens before MOVE is rejected in kvm_arch_prepare_memory_region()).

Or using
        if (!kvm_is_visible_memslot(slot)) {
		srcu_read_unlock(&kvm->srcu, idx);
		return -EINVAL;
	}

> 
> > +	write_lock(&kvm->mmu_lock);
> > +	__kvm_write_track_add_gfn(kvm, slot, gfn);
> > +	write_unlock(&kvm->mmu_lock);
> > +
> > +	srcu_read_unlock(&kvm->srcu, idx);
> > +
> > +	return 0;
> > +}
> > +EXPORT_SYMBOL_GPL(kvm_write_track_add_gfn);
> > +
> > +/*
> > + * remove the guest page from the tracking pool which stops the interception
> > + * of corresponding access on that page.
> > + *
> > + * @kvm: the guest instance we are interested in.
> > + * @gfn: the guest page.
> > + */
> > +int kvm_write_track_remove_gfn(struct kvm *kvm, gfn_t gfn)
> > +{
> > +	struct kvm_memory_slot *slot;
> > +	int idx;
> > +
> > +	idx = srcu_read_lock(&kvm->srcu);
> > +
> > +	slot = gfn_to_memslot(kvm, gfn);
> > +	if (!slot) {
> > +		srcu_read_unlock(&kvm->srcu, idx);
> > +		return -EINVAL;
> > +	}
> > +
> Ditto.
> 
> > +	write_lock(&kvm->mmu_lock);
> > +	__kvm_write_track_remove_gfn(kvm, slot, gfn);
> > +	write_unlock(&kvm->mmu_lock);
> > +
> > +	srcu_read_unlock(&kvm->srcu, idx);
> > +
> > +	return 0;
> > +}
> > +EXPORT_SYMBOL_GPL(kvm_write_track_remove_gfn);
> 

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v2 25/27] KVM: x86/mmu: Drop @slot param from exported/external page-track APIs
  2023-03-23  8:50     ` Yan Zhao
@ 2023-05-03 23:16       ` Sean Christopherson
  2023-05-04  2:17         ` Yan Zhao
  0 siblings, 1 reply; 79+ messages in thread
From: Sean Christopherson @ 2023-05-03 23:16 UTC (permalink / raw)
  To: Yan Zhao
  Cc: kvm, intel-gfx, linux-kernel, Zhenyu Wang, Ben Gardon,
	Paolo Bonzini, intel-gvt-dev, Zhi Wang

Finally getting back to this series...

On Thu, Mar 23, 2023, Yan Zhao wrote:
> On Fri, Mar 17, 2023 at 04:28:56PM +0800, Yan Zhao wrote:
> > On Fri, Mar 10, 2023 at 04:22:56PM -0800, Sean Christopherson wrote:
> > ...
> > > +int kvm_write_track_add_gfn(struct kvm *kvm, gfn_t gfn)
> > > +{
> > > +	struct kvm_memory_slot *slot;
> > > +	int idx;
> > > +
> > > +	idx = srcu_read_lock(&kvm->srcu);
> > > +
> > > +	slot = gfn_to_memslot(kvm, gfn);
> > > +	if (!slot) {
> > > +		srcu_read_unlock(&kvm->srcu, idx);
> > > +		return -EINVAL;
> > > +	}
> > > +
> > Also fail if slot->flags & KVM_MEMSLOT_INVALID is true?
> > There should exist a window for external users to see an invalid slot
> > when a slot is about to get deleted/moved.
> > (It happens before MOVE is rejected in kvm_arch_prepare_memory_region()).
> 
> Or using
>         if (!kvm_is_visible_memslot(slot)) {
> 		srcu_read_unlock(&kvm->srcu, idx);
> 		return -EINVAL;
> 	}

Hrm.  If the DELETE/MOVE succeeds, then the funky accounting is ok (by the end
of the series) as the tracking disappears on DELETE, KVMGT will reject MOVE, and
KVM proper zaps SPTEs and resets accounting on MOVE (account_shadowed() runs under
mmu_lock and thus ensures all previous SPTEs are zapped before the "flush" from
kvm_arch_flush_shadow_memslot() can run).

If kvm_prepare_memory_region() fails though...

Ah, KVM itself is safe because of the aforementioned kvm_arch_flush_shadow_memslot().
Any accounting done on a temporarily invalid memslot will be unwound when the SPTEs
are zapped.  So for KVM, ignoring invalid memslots is correct _and necessary_.
We could clean that up by having account_shadowed() use the @slot from the fault,
which would close the window where the fault starts with a valid memslot but then
sees an invalid memslot when accounting a new shadow page.  But I don't think there
is a bug there.

Right, and DELETE can't actually fail in the current code base, and we've established
that MOVE can't possibly work.  So even if this is problematic in theory, there are
no _unknown_ bugs, and the known bugs are fixed by the end of the series.

And at the end of the series, KVMGT drops its tracking only when the DELETE is
committed.  So I _think_ allowing external trackers to add and remove gfns for
write-tracking in an invalid slot is actually desirable/correct.  I'm pretty sure
removal should be allowed as that can lead to dangling write-protection in a
rollback scenario.   And I can't think of anything that will break (in the kernel)
if write-tracking a gfn in an invalid slot is allowed, so I don't see any harm in
allowing the extremely theoretical case of KVMGT shadowing a gfn in a to-be-deleted
memslot _and_ the deletion being rolled back.

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v2 25/27] KVM: x86/mmu: Drop @slot param from exported/external page-track APIs
  2023-05-03 23:16       ` Sean Christopherson
@ 2023-05-04  2:17         ` Yan Zhao
  2023-05-08  1:15           ` Yan Zhao
  0 siblings, 1 reply; 79+ messages in thread
From: Yan Zhao @ 2023-05-04  2:17 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, intel-gfx, linux-kernel, Zhenyu Wang, Ben Gardon,
	Paolo Bonzini, intel-gvt-dev, Zhi Wang

On Wed, May 03, 2023 at 04:16:10PM -0700, Sean Christopherson wrote:
> Finally getting back to this series...
> 
> On Thu, Mar 23, 2023, Yan Zhao wrote:
> > On Fri, Mar 17, 2023 at 04:28:56PM +0800, Yan Zhao wrote:
> > > On Fri, Mar 10, 2023 at 04:22:56PM -0800, Sean Christopherson wrote:
> > > ...
> > > > +int kvm_write_track_add_gfn(struct kvm *kvm, gfn_t gfn)
> > > > +{
> > > > +	struct kvm_memory_slot *slot;
> > > > +	int idx;
> > > > +
> > > > +	idx = srcu_read_lock(&kvm->srcu);
> > > > +
> > > > +	slot = gfn_to_memslot(kvm, gfn);
> > > > +	if (!slot) {
> > > > +		srcu_read_unlock(&kvm->srcu, idx);
> > > > +		return -EINVAL;
> > > > +	}
> > > > +
> > > Also fail if slot->flags & KVM_MEMSLOT_INVALID is true?
> > > There should exist a window for external users to see an invalid slot
> > > when a slot is about to get deleted/moved.
> > > (It happens before MOVE is rejected in kvm_arch_prepare_memory_region()).
> > 
> > Or using
> >         if (!kvm_is_visible_memslot(slot)) {
> > 		srcu_read_unlock(&kvm->srcu, idx);
> > 		return -EINVAL;
> > 	}
> 
> Hrm.  If the DELETE/MOVE succeeds, then the funky accounting is ok (by the end
> of the series) as the tracking disappears on DELETE, KVMGT will reject MOVE, and
> KVM proper zaps SPTEs and resets accounting on MOVE (account_shadowed() runs under
> mmu_lock and thus ensures all previous SPTEs are zapped before the "flush" from
> kvm_arch_flush_shadow_memslot() can run).
> 
> If kvm_prepare_memory_region() fails though...
> 
> Ah, KVM itself is safe because of the aforementioned kvm_arch_flush_shadow_memslot().
> Any accounting done on a temporarily invalid memslot will be unwound when the SPTEs
> are zapped.  So for KVM, ignoring invalid memslots is correct _and necessary_.
> We could clean that up by having account_shadowed() use the @slot from the fault,
> which would close the window where the fault starts with a valid memslot but then
> sees an invalid memslot when accounting a new shadow page.  But I don't think there
> is a bug there.
> 
> Right, and DELETE can't actually fail in the current code base, and we've established
> that MOVE can't possibly work.  So even if this is problematic in theory, there are
> no _unknown_ bugs, and the known bugs are fixed by the end of the series.
> 
> And at the end of the series, KVMGT drops its tracking only when the DELETE is
> committed.  So I _think_ allowing external trackers to add and remove gfns for
> write-tracking in an invalid slot is actually desirable/correct.  I'm pretty sure
> removal should be allowed as that can lead to dangling write-protection in a
> rollback scenario.   And I can't think of anything that will break (in the kernel)
> if write-tracking a gfn in an invalid slot is allowed, so I don't see any harm in
> allowing the extremely theoretical case of KVMGT shadowing a gfn in a to-be-deleted
> memslot _and_ the deletion being rolled back.
Yes, you are right!
I previously thought that
invalid_slot->arch.gfn_write_track and old->arch.gfn_write_track are
pointing to different places. But I'm wrong.
Yes, allowing an INVALID slot here is more desirable for the deletion rollback case.


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v2 20/27] KVM: x86/mmu: Use page-track notifiers iff there are external users
  2023-03-15 16:54     ` Sean Christopherson
@ 2023-05-04 19:54       ` Sean Christopherson
  2023-05-06  1:08         ` Yan Zhao
  0 siblings, 1 reply; 79+ messages in thread
From: Sean Christopherson @ 2023-05-04 19:54 UTC (permalink / raw)
  To: Yan Zhao
  Cc: Paolo Bonzini, Zhenyu Wang, Zhi Wang, kvm, intel-gvt-dev,
	intel-gfx, linux-kernel, Ben Gardon

On Wed, Mar 15, 2023, Sean Christopherson wrote:
> On Wed, Mar 15, 2023, Yan Zhao wrote:
> > On Fri, Mar 10, 2023 at 04:22:51PM -0800, Sean Christopherson wrote:
> > > Disable the page-track notifier code at compile time if there are no
> > > external users, i.e. if CONFIG_KVM_EXTERNAL_WRITE_TRACKING=n.  KVM itself
> > > now hooks emulated writes directly instead of relying on the page-track
> > > mechanism.
> > > 
> > > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > > ---
> > >  arch/x86/include/asm/kvm_host.h       |  2 ++
> > >  arch/x86/include/asm/kvm_page_track.h |  2 ++
> > >  arch/x86/kvm/mmu/page_track.c         |  9 ++++-----
> > >  arch/x86/kvm/mmu/page_track.h         | 29 +++++++++++++++++++++++----
> > >  4 files changed, 33 insertions(+), 9 deletions(-)
> > > 
> > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > index 1a4225237564..a3423711e403 100644
> > > --- a/arch/x86/include/asm/kvm_host.h
> > > +++ b/arch/x86/include/asm/kvm_host.h
> > > @@ -1265,7 +1265,9 @@ struct kvm_arch {
> > >  	 * create an NX huge page (without hanging the guest).
> > >  	 */
> > >  	struct list_head possible_nx_huge_pages;
> > > +#ifdef CONFIG_KVM_EXTERNAL_WRITE_TRACKING
> > >  	struct kvm_page_track_notifier_head track_notifier_head;
> > > +#endif
> > >  	/*
> > >  	 * Protects marking pages unsync during page faults, as TDP MMU page
> > >  	 * faults only take mmu_lock for read.  For simplicity, the unsync
> > > diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
> > > index deece45936a5..53c2adb25a07 100644
> > > --- a/arch/x86/include/asm/kvm_page_track.h
> > > +++ b/arch/x86/include/asm/kvm_page_track.h
> > The "#ifdef CONFIG_KVM_EXTERNAL_WRITE_TRACKING" can be moved to the
> > front of this file?
> > All the structures are only exposed for external users now.
> 
> Huh.  I've no idea why I didn't do that.  IIRC, the entire reason past me wrapped
> track_notifier_head in an #ifdef was to allow this change in kvm_page_track.h.
> 
> I'll do this in the next version unless I discover an edge case I'm overlooking.

Ah, deja vu.  I tried this first time around, and got yelled at by the kernel test
robot.  Unsurprisingly, my second attempt yielded the same result :-)

  HDRTEST drivers/gpu/drm/i915/gvt/gvt.h
In file included from <command-line>:
gpu/drivers/gpu/drm/i915/gvt/gvt.h:236:45: error: field ‘track_node’ has incomplete type
  236 |         struct kvm_page_track_notifier_node track_node;
      |                                             ^~~~~~~~~~

The problem is direct header inclusion.  Nothing in the kernel includes gvt.h
when CONFIG_DRM_I915_GVT=n, but the header include guard tests include headers
directly on the command line.  I think I'll define a "stub" specifically to play
nice with this sort of testing.  Guarding the guts of gvt.h with CONFIG_DRM_I915_GVT
would just propagate the problem, and guarding the node definition in "struct
intel_vgpu" would be confusing since the guard would be dead code for all intents
and purposes.

The obvious alternative would be to leave kvm_page_track_notifier_node outside of
the #ifdef, but I really want to bury kvm_page_track_notifier_head for KVM's sake,
and having "head" buried but not "node" would also be weird and confusing.

diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index 33f087437209..3d040741044b 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -51,6 +51,12 @@ void kvm_page_track_unregister_notifier(struct kvm *kvm,
 
 int kvm_write_track_add_gfn(struct kvm *kvm, gfn_t gfn);
 int kvm_write_track_remove_gfn(struct kvm *kvm, gfn_t gfn);
+#else
+/*
+ * Allow defining a node in a structure even if page tracking is disabled, e.g.
+ * to play nice with testing headers via direct inclusion from the command line.
+ */
+struct kvm_page_track_notifier_node {};
 #endif /* CONFIG_KVM_EXTERNAL_WRITE_TRACKING */
 
 #endif


^ permalink raw reply related	[flat|nested] 79+ messages in thread

* Re: [PATCH v2 05/27] drm/i915/gvt: Verify VFIO-pinned page is THP when shadowing 2M gtt entry
  2023-03-17  5:33   ` Yan Zhao
@ 2023-05-04 20:41     ` Sean Christopherson
  2023-05-06  6:35       ` Yan Zhao
  0 siblings, 1 reply; 79+ messages in thread
From: Sean Christopherson @ 2023-05-04 20:41 UTC (permalink / raw)
  To: Yan Zhao
  Cc: Paolo Bonzini, Zhenyu Wang, Zhi Wang, kvm, intel-gvt-dev,
	intel-gfx, linux-kernel, Ben Gardon

On Fri, Mar 17, 2023, Yan Zhao wrote:
> On Fri, Mar 10, 2023 at 04:22:36PM -0800, Sean Christopherson wrote:
> > When shadowing a GTT entry with a 2M page, explicitly verify that the
> > first page pinned by VFIO is a transparent hugepage instead of assuming
> > that page observed by is_2MB_gtt_possible() is the same page pinned by
> > vfio_pin_pages().  E.g. if userspace is doing something funky with the
> > guest's memslots, or if the page is demoted between is_2MB_gtt_possible()
> > and vfio_pin_pages().
> > 
> > This is more of a performance optimization than a bug fix as the check
> > for contiguous struct pages should guard against incorrect mapping (even
> > though assuming struct pages are virtually contiguous is wrong).
> > 
> > The real motivation for explicitly checking for a transparent hugepage
> > after pinning is that it will reduce the risk of introducing a bug in a
> > future fix for a page refcount leak (KVMGT doesn't put the reference
> > acquired by gfn_to_pfn()), and eventually will allow KVMGT to stop using
> > KVM's gfn_to_pfn() altogether.
> > 
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> >  drivers/gpu/drm/i915/gvt/kvmgt.c | 18 ++++++++++++++++--
> >  1 file changed, 16 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
> > index 8ae7039b3683..90997cc385b4 100644
> > --- a/drivers/gpu/drm/i915/gvt/kvmgt.c
> > +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
> > @@ -159,11 +159,25 @@ static int gvt_pin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn,
> >  			goto err;
> >  		}
> >  
> > -		if (npage == 0)
> > -			base_page = cur_page;
> > +		if (npage == 0) {
> > +			/*
> > +			 * Bail immediately to avoid unnecessary pinning when
> > +			 * trying to shadow a 2M page and the host page isn't
> > +			 * a transparent hugepage.
> > +			 *
> > +			 * TODO: support other type hugepages, e.g. HugeTLB.
> > +			 */
> > +			if (size == I915_GTT_PAGE_SIZE_2M &&
> > +			    !PageTransHuge(cur_page))
> Maybe the checking of PageTransHuge(cur_page) and bailing out is not necessary.
> If a page is not transparent huge, but there are 512 contiguous 4K
> pages, I think it's still good to map them in IOMMU in 2M.
> See vfio_pin_map_dma() who does similar things.

I agree that bailing isn't strictly necessary, and processing "blindly" should
Just Work for HugeTLB and other hugepage types.  I was going to argue that it
would be safer to add this and then drop it at the end, but I think that's a
specious argument.  If not checking the page type is unsafe, then the existing
code is buggy, and this changelog literally states that the check for contiguous
pages guards against any such problems.

I do think there's a (very, very theoretical) issue though.  For "CONFIG_SPARSEMEM=y
&& CONFIG_SPARSEMEM_VMEMMAP=n", struct pages aren't virtually contiguous with respect
to their pfns, i.e. it's possible (again, very theoretically) that two struct pages
could be virtually contiguous but physically discontiguous.  I suspect I'm being
ridiculously paranoid, but for the efficient cases where pages are guaranteed to
be contiguous, the extra page_to_pfn() checks should be optimized away by the
compiler, i.e. there's no meaningful downside to the paranoia.

TL;DR: My plan is to drop this patch and instead harden the continuity check.
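E.g. a rough, untested sketch, reusing the base_page/cur_page/npage locals and
the existing err label in gvt_pin_guest_page() (and assuming its ret local):

	if (npage == 0)
		base_page = cur_page;
	else if (page_to_pfn(cur_page) != page_to_pfn(base_page) + npage) {
		ret = -EINVAL;
		goto err;
	}

That way the check is done on pfns and doesn't care whether the struct pages
happen to be virtually contiguous.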

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v2 20/27] KVM: x86/mmu: Use page-track notifiers iff there are external users
  2023-05-04 19:54       ` Sean Christopherson
@ 2023-05-06  1:08         ` Yan Zhao
  0 siblings, 0 replies; 79+ messages in thread
From: Yan Zhao @ 2023-05-06  1:08 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, intel-gfx, linux-kernel, Zhenyu Wang, Ben Gardon,
	Paolo Bonzini, intel-gvt-dev, Zhi Wang

On Thu, May 04, 2023 at 12:54:40PM -0700, Sean Christopherson wrote:
> On Wed, Mar 15, 2023, Sean Christopherson wrote:
> > On Wed, Mar 15, 2023, Yan Zhao wrote:
> > > On Fri, Mar 10, 2023 at 04:22:51PM -0800, Sean Christopherson wrote:
> > > > Disable the page-track notifier code at compile time if there are no
> > > > external users, i.e. if CONFIG_KVM_EXTERNAL_WRITE_TRACKING=n.  KVM itself
> > > > now hooks emulated writes directly instead of relying on the page-track
> > > > mechanism.
> > > > 
> > > > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > > > ---
> > > >  arch/x86/include/asm/kvm_host.h       |  2 ++
> > > >  arch/x86/include/asm/kvm_page_track.h |  2 ++
> > > >  arch/x86/kvm/mmu/page_track.c         |  9 ++++-----
> > > >  arch/x86/kvm/mmu/page_track.h         | 29 +++++++++++++++++++++++----
> > > >  4 files changed, 33 insertions(+), 9 deletions(-)
> > > > 
> > > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > > index 1a4225237564..a3423711e403 100644
> > > > --- a/arch/x86/include/asm/kvm_host.h
> > > > +++ b/arch/x86/include/asm/kvm_host.h
> > > > @@ -1265,7 +1265,9 @@ struct kvm_arch {
> > > >  	 * create an NX huge page (without hanging the guest).
> > > >  	 */
> > > >  	struct list_head possible_nx_huge_pages;
> > > > +#ifdef CONFIG_KVM_EXTERNAL_WRITE_TRACKING
> > > >  	struct kvm_page_track_notifier_head track_notifier_head;
> > > > +#endif
> > > >  	/*
> > > >  	 * Protects marking pages unsync during page faults, as TDP MMU page
> > > >  	 * faults only take mmu_lock for read.  For simplicity, the unsync
> > > > diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
> > > > index deece45936a5..53c2adb25a07 100644
> > > > --- a/arch/x86/include/asm/kvm_page_track.h
> > > > +++ b/arch/x86/include/asm/kvm_page_track.h
> > > The "#ifdef CONFIG_KVM_EXTERNAL_WRITE_TRACKING" can be moved to the
> > > front of this file?
> > > All the structures are only exposed for external users now.
> > 
> > Huh.  I've no idea why I didn't do that.  IIRC, the entire reason past me wrapped
> > track_notifier_head in an #ifdef was to allow this change in kvm_page_track.h.
> > 
> > I'll do this in the next version unless I discover an edge case I'm overlooking.
> 
> Ah, deja vu.  I tried this first time around, and got yelled at by the kernel test
> robot.  Unsurprisingly, my second attempt yielded the same result :-)
> 
>   HDRTEST drivers/gpu/drm/i915/gvt/gvt.h
> In file included from <command-line>:
> gpu/drivers/gpu/drm/i915/gvt/gvt.h:236:45: error: field ‘track_node’ has incomplete type
>   236 |         struct kvm_page_track_notifier_node track_node;
>       |                                             ^~~~~~~~~~
> 
> The problem is direct header inclusion.  Nothing in the kernel includes gvt.h
> when CONFIG_DRM_I915_GVT=n, but the header include guard tests include headers
> directly on the command line.  I think I'll define a "stub" specifically to play
> nice with this sort of testing.  Guarding the guts of gvt.h with CONFIG_DRM_I915_GVT
> would just propagate the problem, and guarding the node definition in "struct
> intel_vgpu" would be confusing since the guard would be dead code for all intents
> and purposes.
> 
> The obvious alternative would be to leave kvm_page_track_notifier_node outside of
> the #ifdef, but I really want to bury kvm_page_track_notifier_head for KVM's sake,
> and having "head" buried but not "node" would also be weird and confusing.
> 
> diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
> index 33f087437209..3d040741044b 100644
> --- a/arch/x86/include/asm/kvm_page_track.h
> +++ b/arch/x86/include/asm/kvm_page_track.h
> @@ -51,6 +51,12 @@ void kvm_page_track_unregister_notifier(struct kvm *kvm,
>  
>  int kvm_write_track_add_gfn(struct kvm *kvm, gfn_t gfn);
>  int kvm_write_track_remove_gfn(struct kvm *kvm, gfn_t gfn);
> +#else
> +/*
> + * Allow defining a node in a structure even if page tracking is disabled, e.g.
> + * to play nice with testing headers via direct inclusion from the command line.
> + */
> +struct kvm_page_track_notifier_node {};
>  #endif /* CONFIG_KVM_EXTERNAL_WRITE_TRACKING */
>  
>  #endif
>
Or check CONFIG_KVM_EXTERNAL_WRITE_TRACKING in gvt.h ?
e.g.

diff --git a/drivers/gpu/drm/i915/gvt/gvt.h b/drivers/gpu/drm/i915/gvt/gvt.h
index 53a0a42a50db..005cdc4fb66a 100644
--- a/drivers/gpu/drm/i915/gvt/gvt.h
+++ b/drivers/gpu/drm/i915/gvt/gvt.h
@@ -233,7 +233,9 @@ struct intel_vgpu {
        unsigned long nr_cache_entries;
        struct mutex cache_lock;

+#ifdef CONFIG_KVM_EXTERNAL_WRITE_TRACKING
        struct kvm_page_track_notifier_node track_node;
+#endif
 #define NR_BKT (1 << 18)
        struct hlist_head ptable[NR_BKT];
 #undef NR_BKT

The justification is that gvt.h can be included without kvmgt, e.g. xengt
previously.
But given there's currently no such case, I'm fine with either way :)


^ permalink raw reply related	[flat|nested] 79+ messages in thread

* Re: [PATCH v2 05/27] drm/i915/gvt: Verify VFIO-pinned page is THP when shadowing 2M gtt entry
  2023-05-04 20:41     ` Sean Christopherson
@ 2023-05-06  6:35       ` Yan Zhao
  2023-05-06 10:57         ` Yan Zhao
  0 siblings, 1 reply; 79+ messages in thread
From: Yan Zhao @ 2023-05-06  6:35 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, intel-gfx, linux-kernel, Zhenyu Wang, Ben Gardon,
	Paolo Bonzini, intel-gvt-dev, Zhi Wang

> > Maybe the checking of PageTransHuge(cur_page) and bailing out is not necessary.
> > If a page is not transparent huge, but there are 512 contiguous 4K
> > pages, I think it's still good to map them in IOMMU in 2M.
> > See vfio_pin_map_dma() who does similar things.
> 
> I agree that bailing isn't strictly necessary, and processing "blindly" should
> Just Work for HugeTLB and other hugepage types.  I was going to argue that it
> would be safer to add this and then drop it at the end, but I think that's a
> specious argument.  If not checking the page type is unsafe, then the existing
> code is buggy, and this changelog literally states that the check for contiguous
> pages guards against any such problems.
> 
> I do think there's a (very, very theoretical) issue though.  For "CONFIG_SPARSEMEM=y
> && CONFIG_SPARSEMEM_VMEMMAP=n", struct pages aren't virtually contiguous with respect
> to their pfns, i.e. it's possible (again, very theoretically) that two struct pages
> could be virtually contiguous but physically discontiguous.  I suspect I'm being
> ridiculously paranoid, but for the efficient cases where pages are guaranteed to
> be contiguous, the extra page_to_pfn() checks should be optimized away by the
> compiler, i.e. there's no meaningful downside to the paranoia.
To make sure I understand it correctly:
There are 3 conditions:
(1) Two struct pages aren't virtually contiguous, but their PFNs are contiguous.
(2) Two struct pages are virtually contiguous but their PFNs aren't contiguous.
    (Looks like this will not happen?)
(3) Two struct pages are virtually contiguous, and their PFNs are contiguous, too.
    But they have different backends, e.g.
    PFN 1 and PFN 2 are contiguous, while PFN 1 belongs to RAM, and PFN 2
    belongs to DEVMEM.

I think you mean condition (3) is problematic, am I right?
> 
> TL;DR: My plan is to drop this patch and instead harden the continuity check.

So you want to check page zone?

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v2 05/27] drm/i915/gvt: Verify VFIO-pinned page is THP when shadowing 2M gtt entry
  2023-05-06  6:35       ` Yan Zhao
@ 2023-05-06 10:57         ` Yan Zhao
  2023-05-08 14:05           ` Sean Christopherson
  0 siblings, 1 reply; 79+ messages in thread
From: Yan Zhao @ 2023-05-06 10:57 UTC (permalink / raw)
  To: Sean Christopherson, kvm, intel-gfx, linux-kernel, Zhenyu Wang,
	Ben Gardon, Paolo Bonzini, intel-gvt-dev, Zhi Wang

On Sat, May 06, 2023 at 02:35:41PM +0800, Yan Zhao wrote:
> > > Maybe the checking of PageTransHuge(cur_page) and bailing out is not necessary.
> > > If a page is not transparent huge, but there are 512 contiguous 4K
> > > pages, I think it's still good to map them in IOMMU in 2M.
> > > See vfio_pin_map_dma() who does similar things.
> > 
> > I agree that bailing isn't strictly necessary, and processing "blindly" should
> > Just Work for HugeTLB and other hugepage types.  I was going to argue that it
> > would be safer to add this and then drop it at the end, but I think that's a
> > specious argument.  If not checking the page type is unsafe, then the existing
> > code is buggy, and this changelog literally states that the check for contiguous
> > pages guards against any such problems.
> > 
> > I do think there's a (very, very theoretical) issue though.  For "CONFIG_SPARSEMEM=y
> > && CONFIG_SPARSEMEM_VMEMMAP=n", struct pages aren't virtually contiguous with respect
> > to their pfns, i.e. it's possible (again, very theoretically) that two struct pages
> > could be virtually contiguous but physically discontiguous.  I suspect I'm being
> > ridiculously paranoid, but for the efficient cases where pages are guaranteed to
> > be contiguous, the extra page_to_pfn() checks should be optimized away by the
> > compiler, i.e. there's no meaningful downside to the paranoia.
> To make sure I understand it correctly:
> There are 3 conditions:
> (1) Two struct pages aren't virtually contiguous, but their PFNs are contiguous.
> (2) Two struct pages are virtually contiguous but their PFNs aren't contiguous.
>     (Looks like this will not happen?)
> (3) Two struct pages are virtually contiguous, and their PFNs are contiguous, too.
>     But they have different backends, e.g.
>     PFN 1 and PFN 2 are contiguous, while PFN 1 belongs to RAM, and PFN 2
>     belongs to DEVMEM.
> 
> I think you mean condition (3) is problematic, am I right?
Oh, I got it now.
You are referring to condition (2), with "CONFIG_SPARSEMEM=y &&
CONFIG_SPARSEMEM_VMEMMAP=n".
Two struct pages are contiguous if one is at one section's tail and another at
another section's head, but the two sections aren't for contiguous PFNs.
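IOW, with CONFIG_SPARSEMEM=y && CONFIG_SPARSEMEM_VMEMMAP=n it is theoretically
possible that (page1/page2 being hypothetical struct page pointers on either
side of such a section boundary):

	page2 == page1 + 1 && page_to_pfn(page2) != page_to_pfn(page1) + 1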

> > 
> > TL;DR: My plan is to drop this patch and instead harden the continuity check.
> 
> So you want to check page zone?

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v2 25/27] KVM: x86/mmu: Drop @slot param from exported/external page-track APIs
  2023-05-04  2:17         ` Yan Zhao
@ 2023-05-08  1:15           ` Yan Zhao
  2023-05-11 22:39             ` Sean Christopherson
  0 siblings, 1 reply; 79+ messages in thread
From: Yan Zhao @ 2023-05-08  1:15 UTC (permalink / raw)
  To: Sean Christopherson, kvm, intel-gfx, linux-kernel, Zhenyu Wang,
	Ben Gardon, Paolo Bonzini, intel-gvt-dev, Zhi Wang

On Thu, May 04, 2023 at 10:17:20AM +0800, Yan Zhao wrote:
> On Wed, May 03, 2023 at 04:16:10PM -0700, Sean Christopherson wrote:
> > Finally getting back to this series...
> > 
> > On Thu, Mar 23, 2023, Yan Zhao wrote:
> > > On Fri, Mar 17, 2023 at 04:28:56PM +0800, Yan Zhao wrote:
> > > > On Fri, Mar 10, 2023 at 04:22:56PM -0800, Sean Christopherson wrote:
> > > > ...
> > > > > +int kvm_write_track_add_gfn(struct kvm *kvm, gfn_t gfn)
> > > > > +{
> > > > > +	struct kvm_memory_slot *slot;
> > > > > +	int idx;
> > > > > +
> > > > > +	idx = srcu_read_lock(&kvm->srcu);
> > > > > +
> > > > > +	slot = gfn_to_memslot(kvm, gfn);
> > > > > +	if (!slot) {
> > > > > +		srcu_read_unlock(&kvm->srcu, idx);
> > > > > +		return -EINVAL;
> > > > > +	}
> > > > > +
> > > > Also fail if slot->flags & KVM_MEMSLOT_INVALID is true?
> > > > There should exist a window for external users to see an invalid slot
> > > > when a slot is about to get deleted/moved.
> > > > (It happens before MOVE is rejected in kvm_arch_prepare_memory_region()).
> > > 
> > > Or using
> > >         if (!kvm_is_visible_memslot(slot)) {
> > > 		srcu_read_unlock(&kvm->srcu, idx);
> > > 		return -EINVAL;
> > > 	}
> > 
Hi Sean,
After more thought, do you think checking for KVM internal memslots is necessary?

slot = gfn_to_memslot(kvm, gfn);
if (!slot || slot->id >= KVM_USER_MEM_SLOTS) {
		srcu_read_unlock(&kvm->srcu, idx);
		return -EINVAL;
}

Do we allow write tracking to APIC access page when APIC-write VM exit
is not desired?

Thanks
Yan

> > Hrm.  If the DELETE/MOVE succeeds, then the funky accounting is ok (by the end
> > of the series) as the tracking disappears on DELETE, KVMGT will reject MOVE, and
> > KVM proper zaps SPTEs and resets accounting on MOVE (account_shadowed() runs under
> > mmu_lock and thus ensures all previous SPTEs are zapped before the "flush" from
> > kvm_arch_flush_shadow_memslot() can run).
> > 
> > If kvm_prepare_memory_region() fails though...
> > 
> > Ah, KVM itself is safe because of the aforementioned kvm_arch_flush_shadow_memslot().
> > Any accounting done on a temporarily invalid memslot will be unwound when the SPTEs
> > are zapped.  So for KVM, ignoring invalid memslots is correct _and necessary_.
> > We could clean that up by having account_shadowed() use the @slot from the fault,
> > which would close the window where the fault starts with a valid memslot but then
> > sees an invalid memslot when accounting a new shadow page.  But I don't think there
> > is a bug there.
> > 
> > Right, and DELETE can't actually fail in the current code base, and we've established
> > that MOVE can't possibly work.  So even if this is problematic in theory, there are
> > no _unknown_ bugs, and the known bugs are fixed by the end of the series.
> > 
> > And at the end of the series, KVMGT drops its tracking only when the DELETE is
> > committed.  So I _think_ allowing external trackers to add and remove gfns for
> > write-tracking in an invalid slot is actually desirable/correct.  I'm pretty sure
> > removal should be allowed as that can lead to dangling write-protection in a
> > rollback scenario.   And I can't think of anything that will break (in the kernel)
> > if write-tracking a gfn in an invalid slot is allowed, so I don't see any harm in
> > allowing the extremely theoretical case of KVMGT shadowing a gfn in a to-be-deleted
> > memslot _and_ the deletion being rolled back.
> Yes, you are right!
> I previously thought that
> invalid_slot->arch.gfn_write_track and old->arch.gfn_write_track are
> pointing to different places. But I'm wrong.
> Yes, allowing an INVALID slot here is more desirable for the deletion rollback case.
> 

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v2 05/27] drm/i915/gvt: Verify VFIO-pinned page is THP when shadowing 2M gtt entry
  2023-05-06 10:57         ` Yan Zhao
@ 2023-05-08 14:05           ` Sean Christopherson
  0 siblings, 0 replies; 79+ messages in thread
From: Sean Christopherson @ 2023-05-08 14:05 UTC (permalink / raw)
  To: Yan Zhao
  Cc: kvm, intel-gfx, linux-kernel, Zhenyu Wang, Ben Gardon,
	Paolo Bonzini, intel-gvt-dev, Zhi Wang

On Sat, May 06, 2023, Yan Zhao wrote:
> On Sat, May 06, 2023 at 02:35:41PM +0800, Yan Zhao wrote:
> > > > Maybe the checking of PageTransHuge(cur_page) and bailing out is not necessary.
> > > > If a page is not transparent huge, but there are 512 contiguous 4K
> > > > pages, I think it's still good to map them in IOMMU in 2M.
> > > > See vfio_pin_map_dma() who does similar things.
> > > 
> > > I agree that bailing isn't strictly necessary, and processing "blindly" should
> > > Just Work for HugeTLB and other hugepage types.  I was going to argue that it
> > > would be safer to add this and then drop it at the end, but I think that's a
> > > specious argument.  If not checking the page type is unsafe, then the existing
> > > code is buggy, and this changelog literally states that the check for contiguous
> > > pages guards against any such problems.
> > > 
> > > I do think there's a (very, very theoretical) issue though.  For "CONFIG_SPARSEMEM=y
> > > && CONFIG_SPARSEMEM_VMEMMAP=n", struct pages aren't virtually contiguous with respect
> > > to their pfns, i.e. it's possible (again, very theoretically) that two struct pages
> > > could be virtually contiguous but physically discontiguous.  I suspect I'm being
> > > ridiculously paranoid, but for the efficient cases where pages are guaranteed to
> > > be contiguous, the extra page_to_pfn() checks should be optimized away by the
> > > compiler, i.e. there's no meaningful downside to the paranoia.
> > To make sure I understand it correctly:
> > There are 3 conditions:
> > (1) Two struct pages aren't virtually contiguous, but their PFNs are contiguous.
> > (2) Two struct pages are virtually contiguous but their PFNs aren't contiguous.
> >     (Looks like this will not happen?)
> > (3) Two struct pages are virtually contiguous, and their PFNs are contiguous, too.
> >     But they have different backends, e.g.
> >     PFN 1 and PFN 2 are contiguous, while PFN 1 belongs to RAM, and PFN 2
> >     belongs to DEVMEM.
> > 
> > I think you mean condition (3) is problematic, am I right?
> Oh, I got it now.
> You are referring to condition (2), with "CONFIG_SPARSEMEM=y &&
> CONFIG_SPARSEMEM_VMEMMAP=n".
> Two struct pages are contiguous if one is at one section's tail and another at
> another section's head, but the two sections aren't for contiguous PFNs.

Yep, exactly.

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v2 25/27] KVM: x86/mmu: Drop @slot param from exported/external page-track APIs
  2023-05-08  1:15           ` Yan Zhao
@ 2023-05-11 22:39             ` Sean Christopherson
  2023-05-12  2:58               ` Yan Zhao
  0 siblings, 1 reply; 79+ messages in thread
From: Sean Christopherson @ 2023-05-11 22:39 UTC (permalink / raw)
  To: Yan Zhao
  Cc: kvm, intel-gfx, linux-kernel, Zhenyu Wang, Ben Gardon,
	Paolo Bonzini, intel-gvt-dev, Zhi Wang

On Mon, May 08, 2023, Yan Zhao wrote:
> On Thu, May 04, 2023 at 10:17:20AM +0800, Yan Zhao wrote:
> > On Wed, May 03, 2023 at 04:16:10PM -0700, Sean Christopherson wrote:
> > > Finally getting back to this series...
> > > 
> > > On Thu, Mar 23, 2023, Yan Zhao wrote:
> > > > On Fri, Mar 17, 2023 at 04:28:56PM +0800, Yan Zhao wrote:
> > > > > On Fri, Mar 10, 2023 at 04:22:56PM -0800, Sean Christopherson wrote:
> > > > > ...
> > > > > > +int kvm_write_track_add_gfn(struct kvm *kvm, gfn_t gfn)
> > > > > > +{
> > > > > > +	struct kvm_memory_slot *slot;
> > > > > > +	int idx;
> > > > > > +
> > > > > > +	idx = srcu_read_lock(&kvm->srcu);
> > > > > > +
> > > > > > +	slot = gfn_to_memslot(kvm, gfn);
> > > > > > +	if (!slot) {
> > > > > > +		srcu_read_unlock(&kvm->srcu, idx);
> > > > > > +		return -EINVAL;
> > > > > > +	}
> > > > > > +
> > > > > Also fail if slot->flags & KVM_MEMSLOT_INVALID is true?
> > > > > There should exist a window for external users to see an invalid slot
> > > > > when a slot is about to get deleted/moved.
> > > > > (It happens before MOVE is rejected in kvm_arch_prepare_memory_region()).
> > > > 
> > > > Or using
> > > >         if (!kvm_is_visible_memslot(slot)) {
> > > > 		srcu_read_unlock(&kvm->srcu, idx);
> > > > 		return -EINVAL;
> > > > 	}
> > > 
> Hi Sean,
> After more thought, do you think checking for KVM internal memslots is necessary?

I don't think it's necessary per se, but I also can't think of any reason to allow
it.

> slot = gfn_to_memslot(kvm, gfn);
> if (!slot || slot->id >= KVM_USER_MEM_SLOTS) {
> 		srcu_read_unlock(&kvm->srcu, idx);
> 		return -EINVAL;
> }
> 
> Do we allow write tracking to APIC access page when APIC-write VM exit
> is not desired?

Allow?  Yes.
 
But KVM doesn't use write-tracking for anything APICv related, e.g. to disable
APICv, KVM instead zaps the SPTEs for the APIC access page and on page fault goes
straight to MMIO emulation.

Theoretically, the guest could create an intermediate PTE in the APIC access page
and AFAICT KVM would shadow the access and write-protect the APIC access page.
But that's benign as the resulting emulation would be handled just like emulated
APIC MMIO.

FWIW, the other internal memslots, TSS and identity mapped page tables, are used
if and only if paging is disabled in the guest, i.e. there are no guest PTEs for
KVM to shadow (and paging must be enabled to enable VMX, so nested EPT is also
ruled out).  So this is theoretically possible only for the APIC access page.
That changes with KVMGT, but that again should not be problematic.  KVM will
emulate in response to the write-protected page and things go on.  E.g. it's
arguably much weirder that the guest can read/write the identity mapped page
tables that are used for EPT without unrestricted guest.

There's no sane reason to allow creating PTEs in the APIC page, but I'm also not
all that motivated to "fix" things.   account_shadowed() isn't expected to fail,
so KVM would need to check further up the stack, e.g. in walk_addr_generic() by
open coding a form of kvm_vcpu_gfn_to_hva_prot().
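Something like the below, completely untested and purely illustrative (not
arguing KVM should actually do this), rejecting guest page table gfns that
don't land in a visible memslot when the walker reads each table_gfn:

	slot = kvm_vcpu_gfn_to_memslot(vcpu, table_gfn);
	if (!kvm_is_visible_memslot(slot))
		goto error;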

I _think_ that's the only place KVM would need to add a check, as KVM already
checks that the root, i.e. CR3, is in a "visible" memslot.  I suppose KVM could
just synthesize triple fault, like it does for the root/CR3 case, but I don't
like making up behavior.

In other words, I'm not opposed to disallowing write-tracking internal memslots,
but I can't think of anything that will break, and so for me personally at least,
the ROI isn't sufficient to justify writing tests and dealing with any fallout.

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v2 25/27] KVM: x86/mmu: Drop @slot param from exported/external page-track APIs
  2023-05-11 22:39             ` Sean Christopherson
@ 2023-05-12  2:58               ` Yan Zhao
  0 siblings, 0 replies; 79+ messages in thread
From: Yan Zhao @ 2023-05-12  2:58 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, intel-gfx, linux-kernel, Zhenyu Wang, Ben Gardon,
	Paolo Bonzini, intel-gvt-dev, Zhi Wang

> > Hi Sean,
> > After more thought, do you think checking for KVM internal memslots is necessary?
> 
> I don't think it's necessary per se, but I also can't think of any reason to allow
> it.
> 
> > slot = gfn_to_memslot(kvm, gfn);
> > if (!slot || slot->id >= KVM_USER_MEM_SLOTS) {
> > 		srcu_read_unlock(&kvm->srcu, idx);
> > 		return -EINVAL;
> > }
> > 
> > Do we allow write tracking to APIC access page when APIC-write VM exit
> > is not desired?
> 
> Allow?  Yes.
>  
> But KVM doesn't use write-tracking for anything APICv related, e.g. to disable
> APICv, KVM instead zaps the SPTEs for the APIC access page and on page fault goes
> straight to MMIO emulation.
> 
> Theoretically, the guest could create an intermediate PTE in the APIC access page
> and AFAICT KVM would shadow the access and write-protect the APIC access page.
> But that's benign as the resulting emulation would be handled just like emulated
> APIC MMIO.
> 
> FWIW, the other internal memslots, TSS and identity mapped page tables, are used
> if and only if paging is disabled in the guest, i.e. there are no guest PTEs for
> KVM to shadow (and paging must be enabled to enable VMX, so nested EPT is also
> ruled out).  So this is theoretically possible only for the APIC access page.
> That changes with KVMGT, but that again should not be problematic.  KVM will
> emulate in response to the write-protected page and things go on.  E.g. it's
> arguably much weirder that the guest can read/write the identity mapped page
> tables that are used for EPT without unrestricted guest.
> 
> There's no sane reason to allow creating PTEs in the APIC page, but I'm also not
> all that motivated to "fix" things.   account_shadowed() isn't expected to fail,
> so KVM would need to check further up the stack, e.g. in walk_addr_generic() by
> open coding a form of kvm_vcpu_gfn_to_hva_prot().
> 
> I _think_ that's the only place KVM would need to add a check, as KVM already
> checks that the root, i.e. CR3, is in a "visible" memslot.  I suppose KVM could
> just synthesize triple fault, like it does for the root/CR3 case, but I don't
> like making up behavior.
> 
> In other words, I'm not opposed to disallowing write-tracking internal memslots,
> but I can't think of anything that will break, and so for me personally at least,
> the ROI isn't sufficient to justify writing tests and dealing with any fallout.

It makes sense. Thanks for the explanation.


^ permalink raw reply	[flat|nested] 79+ messages in thread

end of thread, other threads:[~2023-05-12  3:23 UTC | newest]

Thread overview: 79+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-11  0:22 [PATCH v2 00/27] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups Sean Christopherson
2023-03-11  0:22 ` [PATCH v2 01/27] drm/i915/gvt: Verify pfn is "valid" before dereferencing "struct page" Sean Christopherson
2023-03-13 15:37   ` Wang, Wei W
2023-03-15 18:13     ` [Intel-gfx] " Andrzej Hajda
2023-03-15 19:23       ` Sean Christopherson
2023-03-17  4:20   ` Yan Zhao
2023-03-11  0:22 ` [PATCH v2 02/27] KVM: x86/mmu: Factor out helper to get max mapping size of a memslot Sean Christopherson
2023-03-13 15:37   ` Wang, Wei W
2023-03-11  0:22 ` [PATCH v2 03/27] drm/i915/gvt: remove interface intel_gvt_is_valid_gfn Sean Christopherson
2023-03-17  4:26   ` Yan Zhao
2023-03-11  0:22 ` [PATCH v2 04/27] drm/i915/gvt: Incorporate KVM memslot info into check for 2MiB GTT entry Sean Christopherson
2023-03-14  3:09   ` Yan Zhao
2023-03-14 17:13     ` Sean Christopherson
2023-03-11  0:22 ` [PATCH v2 05/27] drm/i915/gvt: Verify VFIO-pinned page is THP when shadowing 2M gtt entry Sean Christopherson
2023-03-17  5:33   ` Yan Zhao
2023-05-04 20:41     ` Sean Christopherson
2023-05-06  6:35       ` Yan Zhao
2023-05-06 10:57         ` Yan Zhao
2023-05-08 14:05           ` Sean Christopherson
2023-03-11  0:22 ` [PATCH v2 06/27] drm/i915/gvt: Put the page reference obtained by KVM's gfn_to_pfn() Sean Christopherson
2023-03-17  6:18   ` Yan Zhao
2023-03-11  0:22 ` [PATCH v2 07/27] drm/i915/gvt: Don't rely on KVM's gfn_to_pfn() to query possible 2M GTT Sean Christopherson
2023-03-17  5:37   ` Yan Zhao
2023-03-11  0:22 ` [PATCH v2 08/27] drm/i915/gvt: Use an "unsigned long" to iterate over memslot gfns Sean Christopherson
2023-03-17  6:19   ` Yan Zhao
2023-03-11  0:22 ` [PATCH v2 09/27] drm/i915/gvt: Drop unused helper intel_vgpu_reset_gtt() Sean Christopherson
2023-03-17  6:20   ` Yan Zhao
2023-03-11  0:22 ` [PATCH v2 10/27] drm/i915/gvt: Protect gfn hash table with vgpu_lock Sean Christopherson
2023-03-17  6:21   ` Yan Zhao
2023-03-11  0:22 ` [PATCH v2 11/27] KVM: x86/mmu: Don't rely on page-track mechanism to flush on memslot change Sean Christopherson
2023-03-15  1:08   ` Yan Zhao
2023-03-15 15:32     ` Sean Christopherson
2023-03-11  0:22 ` [PATCH v2 12/27] KVM: x86/mmu: Don't bounce through page-track mechanism for guest PTEs Sean Christopherson
2023-03-17  6:37   ` Yan Zhao
2023-03-11  0:22 ` [PATCH v2 13/27] KVM: drm/i915/gvt: Drop @vcpu from KVM's ->track_write() hook Sean Christopherson
2023-03-17  7:28   ` Yan Zhao
2023-03-11  0:22 ` [PATCH v2 14/27] KVM: x86: Reject memslot MOVE operations if KVMGT is attached Sean Christopherson
2023-03-15  8:03   ` Yan Zhao
2023-03-15 15:43     ` Sean Christopherson
2023-03-16  9:27       ` Yan Zhao
2023-03-17  7:29   ` Yan Zhao
2023-03-11  0:22 ` [PATCH v2 15/27] drm/i915/gvt: Don't bother removing write-protection on to-be-deleted slot Sean Christopherson
2023-03-17  7:30   ` Yan Zhao
2023-03-11  0:22 ` [PATCH v2 16/27] KVM: x86: Add a new page-track hook to handle memslot deletion Sean Christopherson
2023-03-17  7:43   ` Yan Zhao
2023-03-17 16:20     ` Sean Christopherson
2023-03-11  0:22 ` [PATCH v2 17/27] drm/i915/gvt: switch from ->track_flush_slot() to ->track_remove_region() Sean Christopherson
2023-03-17  7:45   ` Yan Zhao
2023-03-11  0:22 ` [PATCH v2 18/27] KVM: x86: Remove the unused page-track hook track_flush_slot() Sean Christopherson
2023-03-11  0:22 ` [PATCH v2 19/27] KVM: x86/mmu: Move KVM-only page-track declarations to internal header Sean Christopherson
2023-03-15  8:44   ` Yan Zhao
2023-03-15 15:13     ` Sean Christopherson
2023-03-16  9:19       ` Yan Zhao
2023-03-11  0:22 ` [PATCH v2 20/27] KVM: x86/mmu: Use page-track notifiers iff there are external users Sean Christopherson
2023-03-15  9:34   ` Yan Zhao
2023-03-15 16:21     ` Sean Christopherson
2023-03-16  9:29       ` Yan Zhao
2023-03-15 10:36   ` Yan Zhao
2023-03-15 16:54     ` Sean Christopherson
2023-05-04 19:54       ` Sean Christopherson
2023-05-06  1:08         ` Yan Zhao
2023-03-11  0:22 ` [PATCH v2 21/27] KVM: x86/mmu: Drop infrastructure for multiple page-track modes Sean Christopherson
2023-03-11  0:22 ` [PATCH v2 22/27] KVM: x86/mmu: Rename page-track APIs to reflect the new reality Sean Christopherson
2023-03-11  0:22 ` [PATCH v2 23/27] KVM: x86/mmu: Assert that correct locks are held for page write-tracking Sean Christopherson
2023-03-17  7:55   ` Yan Zhao
2023-03-11  0:22 ` [PATCH v2 24/27] KVM: x86/mmu: Bug the VM if write-tracking is used but not enabled Sean Christopherson
2023-03-11  0:22 ` [PATCH v2 25/27] KVM: x86/mmu: Drop @slot param from exported/external page-track APIs Sean Christopherson
2023-03-17  8:28   ` Yan Zhao
2023-03-23  8:50     ` Yan Zhao
2023-05-03 23:16       ` Sean Christopherson
2023-05-04  2:17         ` Yan Zhao
2023-05-08  1:15           ` Yan Zhao
2023-05-11 22:39             ` Sean Christopherson
2023-05-12  2:58               ` Yan Zhao
2023-03-11  0:22 ` [PATCH v2 26/27] KVM: x86/mmu: Handle KVM bookkeeping in page-track APIs, not callers Sean Christopherson
2023-03-17  8:52   ` Yan Zhao
2023-03-11  0:22 ` [PATCH v2 27/27] drm/i915/gvt: Drop final dependencies on KVM internal details Sean Christopherson
2023-03-17  8:58   ` Yan Zhao
2023-03-13  9:58 ` [PATCH v2 00/27] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups Yan Zhao
