kvm.vger.kernel.org archive mirror
* [PATCH 00/27] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups
@ 2022-12-23  0:57 Sean Christopherson
  2022-12-23  0:57 ` [PATCH 01/27] drm/i915/gvt: Verify pfn is "valid" before dereferencing "struct page" Sean Christopherson
                   ` (27 more replies)
  0 siblings, 28 replies; 61+ messages in thread
From: Sean Christopherson @ 2022-12-23  0:57 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

Fix a variety of found-by-inspection bugs in KVMGT, and overhaul KVM's
page-track APIs to provide a leaner and cleaner interface.  The motivation
for this series is to (significantly) reduce the number of KVM APIs that
KVMGT uses, with a long-term goal of making all kvm_host.h headers
KVM-internal.  That said, I think the cleanup itself is worthwhile,
e.g. KVMGT really shouldn't be touching kvm->mmu_lock.

Note!  The KVMGT changes are compile tested only as I don't have the
necessary hardware (AFAIK).  Testing, and lots of it, on the KVMGT side
of things is needed and any help on that front would be much appreciated.

Sean Christopherson (24):
  drm/i915/gvt: Verify pfn is "valid" before dereferencing "struct page"
  KVM: x86/mmu: Factor out helper to get max mapping size of a memslot
  drm/i915/gvt: Incorporate KVM memslot info into check for 2MiB GTT
    entry
  drm/i915/gvt: Verify VFIO-pinned page is THP when shadowing 2M gtt
    entry
  drm/i915/gvt: Put the page reference obtained by KVM's gfn_to_pfn()
  drm/i915/gvt: Don't rely on KVM's gfn_to_pfn() to query possible 2M
    GTT
  drm/i915/gvt: Use an "unsigned long" to iterate over memslot gfns
  drm/i915/gvt: Hoist acquisition of vgpu_lock out to
    kvmgt_page_track_write()
  drm/i915/gvt: Protect gfn hash table with dedicated mutex
  KVM: x86/mmu: Don't rely on page-track mechanism to flush on memslot
    change
  KVM: x86/mmu: Don't bounce through page-track mechanism for guest PTEs
  KVM: drm/i915/gvt: Drop @vcpu from KVM's ->track_write() hook
  KVM: x86: Reject memslot MOVE operations if KVMGT is attached
  drm/i915/gvt: Don't bother removing write-protection on to-be-deleted
    slot
  KVM: x86/mmu: Move KVM-only page-track declarations to internal header
  KVM: x86/mmu: Use page-track notifiers iff there are external users
  KVM: x86/mmu: Drop infrastructure for multiple page-track modes
  KVM: x86/mmu: Rename page-track APIs to reflect the new reality
  KVM: x86/mmu: Assert that correct locks are held for page
    write-tracking
  KVM: x86/mmu: Bug the VM if write-tracking is used but not enabled
  KVM: x86/mmu: Drop @slot param from exported/external page-track APIs
  KVM: x86/mmu: Handle KVM bookkeeping in page-track APIs, not callers
  KVM: x86/mmu: Add page-track API to query if a gfn is valid
  drm/i915/gvt: Drop final dependencies on KVM internal details

Yan Zhao (3):
  KVM: x86: Add a new page-track hook to handle memslot deletion
  drm/i915/gvt: switch from ->track_flush_slot() to
    ->track_remove_region()
  KVM: x86: Remove the unused page-track hook track_flush_slot()

 arch/x86/include/asm/kvm_host.h       |  16 +-
 arch/x86/include/asm/kvm_page_track.h |  67 +++---
 arch/x86/kvm/mmu.h                    |   2 +
 arch/x86/kvm/mmu/mmu.c                |  61 +++---
 arch/x86/kvm/mmu/mmu_internal.h       |   2 +
 arch/x86/kvm/mmu/page_track.c         | 283 +++++++++++++++-----------
 arch/x86/kvm/mmu/page_track.h         |  59 ++++++
 arch/x86/kvm/x86.c                    |  13 +-
 drivers/gpu/drm/i915/gvt/gtt.c        |  45 ++--
 drivers/gpu/drm/i915/gvt/gvt.h        |   4 +-
 drivers/gpu/drm/i915/gvt/kvmgt.c      | 138 ++++++-------
 drivers/gpu/drm/i915/gvt/page_track.c |  10 +-
 drivers/gpu/drm/i915/gvt/vgpu.c       |   1 +
 13 files changed, 386 insertions(+), 315 deletions(-)
 create mode 100644 arch/x86/kvm/mmu/page_track.h


base-commit: 9d75a3251adfbcf444681474511b58042a364863
-- 
2.39.0.314.g84b9a713c41-goog


^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH 01/27] drm/i915/gvt: Verify pfn is "valid" before dereferencing "struct page"
  2022-12-23  0:57 [PATCH 00/27] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups Sean Christopherson
@ 2022-12-23  0:57 ` Sean Christopherson
  2022-12-23  0:57 ` [PATCH 02/27] KVM: x86/mmu: Factor out helper to get max mapping size of a memslot Sean Christopherson
                   ` (26 subsequent siblings)
  27 siblings, 0 replies; 61+ messages in thread
From: Sean Christopherson @ 2022-12-23  0:57 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

Check that the pfn found by gfn_to_pfn() is actually backed by "struct
page" memory prior to retrieving and dereferencing the page.  KVM
supports backing guest memory with VM_PFNMAP, VM_IO, etc., and so
there is no guarantee the pfn returned by gfn_to_pfn() has an associated
"struct page".

Fixes: b901b252b6cf ("drm/i915/gvt: Add 2M huge gtt support")
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
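For illustration only, not part of the patch: a minimal userspace sketch of
the "check validity before dereferencing" ordering described above.  All
toy_* names are hypothetical stand-ins, not the kernel's pfn_valid() or
pfn_to_page(); the toy memmap simply covers four pfns.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy stand-in for "struct page"; not the kernel's definition. */
struct toy_page {
	bool trans_huge;
};

#define TOY_NR_PAGES 4UL

/* Only pfns 0..3 have a backing toy_page; pfn 0 is marked as huge. */
static struct toy_page toy_memmap[TOY_NR_PAGES] = {
	{ true }, { false }, { false }, { false },
};

/* Hypothetical analogue of a validity check: is there a struct for this pfn? */
static bool toy_pfn_valid(unsigned long pfn)
{
	return pfn < TOY_NR_PAGES;
}

/* Must only be called for pfns that passed toy_pfn_valid(). */
static struct toy_page *toy_pfn_to_page(unsigned long pfn)
{
	return &toy_memmap[pfn];
}

/* Returns 1 or 0 for huge / not huge, -EINVAL if nothing backs the pfn. */
static int toy_is_huge(unsigned long pfn)
{
	if (!toy_pfn_valid(pfn))
		return -EINVAL;

	return toy_pfn_to_page(pfn)->trans_huge;
}
```

In this sketch, toy_is_huge(100) fails cleanly instead of indexing past the
end of toy_memmap, which is the toy equivalent of dereferencing a
nonexistent "struct page".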
 drivers/gpu/drm/i915/gvt/gtt.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c
index ce0eb03709c3..d0fca53a3563 100644
--- a/drivers/gpu/drm/i915/gvt/gtt.c
+++ b/drivers/gpu/drm/i915/gvt/gtt.c
@@ -1188,6 +1188,10 @@ static int is_2MB_gtt_possible(struct intel_vgpu *vgpu,
 	pfn = gfn_to_pfn(vgpu->vfio_device.kvm, ops->get_pfn(entry));
 	if (is_error_noslot_pfn(pfn))
 		return -EINVAL;
+
+	if (!pfn_valid(pfn))
+		return -EINVAL;
+
 	return PageTransHuge(pfn_to_page(pfn));
 }
 
-- 
2.39.0.314.g84b9a713c41-goog



* [PATCH 02/27] KVM: x86/mmu: Factor out helper to get max mapping size of a memslot
  2022-12-23  0:57 [PATCH 00/27] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups Sean Christopherson
  2022-12-23  0:57 ` [PATCH 01/27] drm/i915/gvt: Verify pfn is "valid" before dereferencing "struct page" Sean Christopherson
@ 2022-12-23  0:57 ` Sean Christopherson
  2022-12-23  0:57 ` [PATCH 03/27] drm/i915/gvt: Incorporate KVM memslot info into check for 2MiB GTT entry Sean Christopherson
                   ` (25 subsequent siblings)
  27 siblings, 0 replies; 61+ messages in thread
From: Sean Christopherson @ 2022-12-23  0:57 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

Extract the memslot-related logic of kvm_mmu_max_mapping_level() into a
new helper so that KVMGT can determine whether or not mapping a 2MiB page
into the guest is (dis)allowed per KVM's memslots.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
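For illustration only, not part of the patch: a simplified userspace model
of the "walk down from max_level until a level is allowed" loop that the
helper factors out.  The toy_* names and the per-slot flag array are
assumptions; the real code looks up per-gfn kvm_lpage_info via
lpage_info_slot().

```c
#include <assert.h>
#include <stdbool.h>

/* Toy levels mirroring the 4K < 2M < 1G ordering; values are illustrative. */
enum toy_level {
	TOY_LEVEL_4K = 1,
	TOY_LEVEL_2M = 2,
	TOY_LEVEL_1G = 3,
};

/* Hypothetical slot state: one "hugepage disallowed" flag per level. */
struct toy_slot {
	bool disallow_lpage[TOY_LEVEL_1G + 1];
};

/* Return the largest level <= max_level that the slot does not disallow. */
static int toy_max_slot_mapping_level(const struct toy_slot *slot,
				      int max_level)
{
	for ( ; max_level > TOY_LEVEL_4K; max_level--) {
		if (!slot->disallow_lpage[max_level])
			break;
	}
	return max_level;
}
```

The loop never goes below the 4K level, so a caller can compare the result
against the 4K level to skip any further hugepage work.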
 arch/x86/kvm/mmu/mmu.c          | 21 +++++++++++++++------
 arch/x86/kvm/mmu/mmu_internal.h |  2 ++
 2 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 254bc46234e0..ca7428b68eba 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3064,20 +3064,29 @@ static int host_pfn_mapping_level(struct kvm *kvm, gfn_t gfn,
 	return level;
 }
 
+int kvm_mmu_max_slot_mapping_level(const struct kvm_memory_slot *slot,
+				   gfn_t gfn, int max_level)
+{
+	struct kvm_lpage_info *linfo;
+
+	for ( ; max_level > PG_LEVEL_4K; max_level--) {
+		linfo = lpage_info_slot(gfn, slot, max_level);
+		if (!linfo->disallow_lpage)
+			break;
+	}
+	return max_level;
+}
+
 int kvm_mmu_max_mapping_level(struct kvm *kvm,
 			      const struct kvm_memory_slot *slot, gfn_t gfn,
 			      int max_level)
 {
-	struct kvm_lpage_info *linfo;
 	int host_level;
 
 	max_level = min(max_level, max_huge_page_level);
-	for ( ; max_level > PG_LEVEL_4K; max_level--) {
-		linfo = lpage_info_slot(gfn, slot, max_level);
-		if (!linfo->disallow_lpage)
-			break;
-	}
+	max_level = kvm_mmu_max_slot_mapping_level(slot, gfn, max_level);
 
+	/* Avoid walking the host page tables if a hugepage is impossible. */
 	if (max_level == PG_LEVEL_4K)
 		return PG_LEVEL_4K;
 
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index ac00bfbf32f6..b078c29e5674 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -314,6 +314,8 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 	return r;
 }
 
+int kvm_mmu_max_slot_mapping_level(const struct kvm_memory_slot *slot,
+				   gfn_t gfn, int max_level);
 int kvm_mmu_max_mapping_level(struct kvm *kvm,
 			      const struct kvm_memory_slot *slot, gfn_t gfn,
 			      int max_level);
-- 
2.39.0.314.g84b9a713c41-goog



* [PATCH 03/27] drm/i915/gvt: Incorporate KVM memslot info into check for 2MiB GTT entry
  2022-12-23  0:57 [PATCH 00/27] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups Sean Christopherson
  2022-12-23  0:57 ` [PATCH 01/27] drm/i915/gvt: Verify pfn is "valid" before dereferencing "struct page" Sean Christopherson
  2022-12-23  0:57 ` [PATCH 02/27] KVM: x86/mmu: Factor out helper to get max mapping size of a memslot Sean Christopherson
@ 2022-12-23  0:57 ` Sean Christopherson
  2022-12-28  5:42   ` Yan Zhao
  2022-12-23  0:57 ` [PATCH 04/27] drm/i915/gvt: Verify VFIO-pinned page is THP when shadowing 2M gtt entry Sean Christopherson
                   ` (24 subsequent siblings)
  27 siblings, 1 reply; 61+ messages in thread
From: Sean Christopherson @ 2022-12-23  0:57 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

Honor KVM's max allowed page size when determining whether or not a 2MiB
GTT shadow page can be created for the guest.  Querying KVM's max allowed
size is somewhat odd as there's no strict requirement that KVM's memslots
and VFIO's mappings are configured with the same gfn=>hva mapping, but
the check will be accurate if userspace wants to have a functional guest,
and at the very least checking KVM's memslots guarantees that the entire
2MiB range has been exposed to the guest.

Note, KVM may also restrict the mapping size for reasons that aren't
relevant to KVMGT, e.g. for KVM's iTLB multi-hit workaround or if the gfn
is write-tracked (KVM's write-tracking only handles writes from vCPUs).
However, such scenarios are unlikely to occur with a well-behaved guest,
and at worst will result in sub-optimal performance.

Fixes: b901b252b6cf ("drm/i915/gvt: Add 2M huge gtt support")
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_page_track.h |  2 ++
 arch/x86/kvm/mmu/page_track.c         | 18 ++++++++++++++++++
 drivers/gpu/drm/i915/gvt/gtt.c        | 10 +++++++++-
 3 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index eb186bc57f6a..3f72c7a172fc 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -51,6 +51,8 @@ void kvm_page_track_cleanup(struct kvm *kvm);
 
 bool kvm_page_track_write_tracking_enabled(struct kvm *kvm);
 int kvm_page_track_write_tracking_alloc(struct kvm_memory_slot *slot);
+enum pg_level kvm_page_track_max_mapping_level(struct kvm *kvm, gfn_t gfn,
+					       enum pg_level max_level);
 
 void kvm_page_track_free_memslot(struct kvm_memory_slot *slot);
 int kvm_page_track_create_memslot(struct kvm *kvm,
diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index 2e09d1b6249f..69ea16c31859 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -300,3 +300,21 @@ void kvm_page_track_flush_slot(struct kvm *kvm, struct kvm_memory_slot *slot)
 			n->track_flush_slot(kvm, slot, n);
 	srcu_read_unlock(&head->track_srcu, idx);
 }
+
+enum pg_level kvm_page_track_max_mapping_level(struct kvm *kvm, gfn_t gfn,
+					       enum pg_level max_level)
+{
+	struct kvm_memory_slot *slot;
+	int idx;
+
+	idx = srcu_read_lock(&kvm->srcu);
+	slot = gfn_to_memslot(kvm, gfn);
+	if (!slot || slot->flags & KVM_MEMSLOT_INVALID)
+		max_level = PG_LEVEL_4K;
+	else
+		max_level = kvm_mmu_max_slot_mapping_level(slot, gfn, max_level);
+	srcu_read_unlock(&kvm->srcu, idx);
+
+	return max_level;
+}
+EXPORT_SYMBOL_GPL(kvm_page_track_max_mapping_level);
diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c
index d0fca53a3563..6736d7bd94ea 100644
--- a/drivers/gpu/drm/i915/gvt/gtt.c
+++ b/drivers/gpu/drm/i915/gvt/gtt.c
@@ -1178,14 +1178,22 @@ static int is_2MB_gtt_possible(struct intel_vgpu *vgpu,
 	struct intel_gvt_gtt_entry *entry)
 {
 	const struct intel_gvt_gtt_pte_ops *ops = vgpu->gvt->gtt.pte_ops;
+	unsigned long gfn = ops->get_pfn(entry);
 	kvm_pfn_t pfn;
+	int max_level;
 
 	if (!HAS_PAGE_SIZES(vgpu->gvt->gt->i915, I915_GTT_PAGE_SIZE_2M))
 		return 0;
 
 	if (!vgpu->attached)
 		return -EINVAL;
-	pfn = gfn_to_pfn(vgpu->vfio_device.kvm, ops->get_pfn(entry));
+
+	max_level = kvm_page_track_max_mapping_level(vgpu->vfio_device.kvm,
+						     gfn, PG_LEVEL_2M);
+	if (max_level < PG_LEVEL_2M)
+		return 0;
+
+	pfn = gfn_to_pfn(vgpu->vfio_device.kvm, gfn);
 	if (is_error_noslot_pfn(pfn))
 		return -EINVAL;
 
-- 
2.39.0.314.g84b9a713c41-goog



* [PATCH 04/27] drm/i915/gvt: Verify VFIO-pinned page is THP when shadowing 2M gtt entry
  2022-12-23  0:57 [PATCH 00/27] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups Sean Christopherson
                   ` (2 preceding siblings ...)
  2022-12-23  0:57 ` [PATCH 03/27] drm/i915/gvt: Incorporate KVM memslot info into check for 2MiB GTT entry Sean Christopherson
@ 2022-12-23  0:57 ` Sean Christopherson
  2022-12-23  0:57 ` [PATCH 05/27] drm/i915/gvt: Put the page reference obtained by KVM's gfn_to_pfn() Sean Christopherson
                   ` (23 subsequent siblings)
  27 siblings, 0 replies; 61+ messages in thread
From: Sean Christopherson @ 2022-12-23  0:57 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

When shadowing a GTT entry with a 2M page, explicitly verify that the
first page pinned by VFIO is a transparent hugepage instead of assuming
that the page observed by is_2MB_gtt_possible() is the same page pinned by
vfio_pin_pages().  The two can differ, e.g. if userspace is doing something
funky with the guest's memslots, or if the page is demoted between
is_2MB_gtt_possible() and vfio_pin_pages().

This is more of a performance optimization than a bug fix as the check
for contiguous struct pages should guard against incorrect mapping (even
though assuming struct pages are virtually contiguous is wrong).

The real motivation for explicitly checking for a transparent hugepage
after pinning is that it will reduce the risk of introducing a bug in a
future fix for a page refcount leak (KVMGT doesn't put the reference
acquired by gfn_to_pfn()), and eventually will allow KVMGT to stop using
KVM's gfn_to_pfn() altogether.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
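For illustration only, not part of the patch: a userspace sketch of the
check order described above, i.e. reject a non-huge first page when a 2M
mapping is requested, then require every later page to directly follow the
first.  The toy_* names are hypothetical, and the sketch omits the
unpinning that the real error path performs.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for "struct page"; not the kernel's definition. */
struct toy_page {
	bool trans_huge;
};

/*
 * Validate a run of "pinned" pages.  When want_huge is set, the first page
 * must be a huge page (-EIO otherwise); every later page must be adjacent
 * to its predecessor in memory (-EINVAL otherwise).
 */
static int toy_check_pinned(struct toy_page **pages, size_t n, bool want_huge)
{
	struct toy_page *base = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		if (i == 0) {
			/* Bail before inspecting any further pages. */
			if (want_huge && !pages[i]->trans_huge)
				return -EIO;
			base = pages[i];
		} else if (base + i != pages[i]) {
			return -EINVAL;
		}
	}
	return 0;
}
```

Returning early on the first page mirrors the intent of the patch: there is
no point in looking at the remaining pages if the mapping cannot be 2M.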
 drivers/gpu/drm/i915/gvt/kvmgt.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 714221f9a131..6f358b4fe406 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -159,11 +159,25 @@ static int gvt_pin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn,
 			goto err;
 		}
 
-		if (npage == 0)
-			base_page = cur_page;
+		if (npage == 0) {
+			/*
+			 * Bail immediately to avoid unnecessary pinning when
+			 * trying to shadow a 2M page and the host page isn't
+			 * a transparent hugepage.
+			 *
+			 * TODO: support other type hugepages, e.g. HugeTLB.
+			 */
+			if (size == I915_GTT_PAGE_SIZE_2M &&
+			    !PageTransHuge(cur_page))
+				ret = -EIO;
+			else
+				base_page = cur_page;
+		}
 		else if (base_page + npage != cur_page) {
 			gvt_vgpu_err("The pages are not continuous\n");
 			ret = -EINVAL;
+		}
+		if (ret < 0) {
 			npage++;
 			goto err;
 		}
-- 
2.39.0.314.g84b9a713c41-goog



* [PATCH 05/27] drm/i915/gvt: Put the page reference obtained by KVM's gfn_to_pfn()
  2022-12-23  0:57 [PATCH 00/27] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups Sean Christopherson
                   ` (3 preceding siblings ...)
  2022-12-23  0:57 ` [PATCH 04/27] drm/i915/gvt: Verify VFIO-pinned page is THP when shadowing 2M gtt entry Sean Christopherson
@ 2022-12-23  0:57 ` Sean Christopherson
  2022-12-23  0:57 ` [PATCH 06/27] drm/i915/gvt: Don't rely on KVM's gfn_to_pfn() to query possible 2M GTT Sean Christopherson
                   ` (22 subsequent siblings)
  27 siblings, 0 replies; 61+ messages in thread
From: Sean Christopherson @ 2022-12-23  0:57 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

Put the struct page reference acquired by gfn_to_pfn(); KVM's API contract
is that the caller is ultimately responsible for dropping any reference.

Note, kvm_release_pfn_clean() ensures the pfn is actually a refcounted
struct page before trying to put any references.

Fixes: b901b252b6cf ("drm/i915/gvt: Add 2M huge gtt support")
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
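For illustration only, not part of the patch: a toy model of the
"read what you need, then drop the reference" pattern the fix applies.  The
toy_* helpers and the plain integer refcount are assumptions and do not
reflect how the kernel counts page references.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy page with a plain integer refcount; not the kernel's struct page. */
struct toy_page {
	int refcount;
	bool trans_huge;
};

/* Hypothetical lookup that hands back the page with an extra reference. */
static struct toy_page *toy_get_page(struct toy_page *backing)
{
	backing->refcount++;
	return backing;
}

/* Drop a reference previously taken by toy_get_page(). */
static void toy_put_page(struct toy_page *page)
{
	page->refcount--;
}

/* Query a property, saving the result before the reference is dropped. */
static int toy_query_huge(struct toy_page *backing)
{
	struct toy_page *page = toy_get_page(backing);
	int ret = page->trans_huge;

	toy_put_page(page);
	return ret;
}
```

The refcount is the same before and after toy_query_huge(); returning the
property directly without the put would leak one reference per call.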
 drivers/gpu/drm/i915/gvt/gtt.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c
index 6736d7bd94ea..9936f8bd19af 100644
--- a/drivers/gpu/drm/i915/gvt/gtt.c
+++ b/drivers/gpu/drm/i915/gvt/gtt.c
@@ -1181,6 +1181,7 @@ static int is_2MB_gtt_possible(struct intel_vgpu *vgpu,
 	unsigned long gfn = ops->get_pfn(entry);
 	kvm_pfn_t pfn;
 	int max_level;
+	int ret;
 
 	if (!HAS_PAGE_SIZES(vgpu->gvt->gt->i915, I915_GTT_PAGE_SIZE_2M))
 		return 0;
@@ -1200,7 +1201,9 @@ static int is_2MB_gtt_possible(struct intel_vgpu *vgpu,
 	if (!pfn_valid(pfn))
 		return -EINVAL;
 
-	return PageTransHuge(pfn_to_page(pfn));
+	ret = PageTransHuge(pfn_to_page(pfn));
+	kvm_release_pfn_clean(pfn);
+	return ret;
 }
 
 static int split_2MB_gtt_entry(struct intel_vgpu *vgpu,
-- 
2.39.0.314.g84b9a713c41-goog



* [PATCH 06/27] drm/i915/gvt: Don't rely on KVM's gfn_to_pfn() to query possible 2M GTT
  2022-12-23  0:57 [PATCH 00/27] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups Sean Christopherson
                   ` (4 preceding siblings ...)
  2022-12-23  0:57 ` [PATCH 05/27] drm/i915/gvt: Put the page reference obtained by KVM's gfn_to_pfn() Sean Christopherson
@ 2022-12-23  0:57 ` Sean Christopherson
  2022-12-23  0:57 ` [PATCH 07/27] drm/i915/gvt: Use an "unsigned long" to iterate over memslot gfns Sean Christopherson
                   ` (21 subsequent siblings)
  27 siblings, 0 replies; 61+ messages in thread
From: Sean Christopherson @ 2022-12-23  0:57 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

Now that gvt_pin_guest_page() explicitly verifies the pinned PFN is a
transparent hugepage, don't use KVM's gfn_to_pfn() to pre-check if a
2M GTT entry is possible and instead just try to map the GFN with a 2MB
entry.  Using KVM to query a pfn that is ultimately managed through VFIO is
odd, and KVM's gfn_to_pfn() is not intended for non-KVM consumption; it's
exported only because of KVM vendor modules (x86 and PPC).

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 drivers/gpu/drm/i915/gvt/gtt.c | 33 +++++++++++----------------------
 1 file changed, 11 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c
index 9936f8bd19af..59ba6639e622 100644
--- a/drivers/gpu/drm/i915/gvt/gtt.c
+++ b/drivers/gpu/drm/i915/gvt/gtt.c
@@ -1167,21 +1167,19 @@ static inline void ppgtt_generate_shadow_entry(struct intel_gvt_gtt_entry *se,
 }
 
 /*
- * Check if can do 2M page
+ * Try to map a 2M gtt entry.
  * @vgpu: target vgpu
  * @entry: target pfn's gtt entry
  *
- * Return 1 if 2MB huge gtt shadowing is possible, 0 if miscondition,
- * negative if found err.
+ * Return 1 if 2MB huge gtt shadow was created, 0 if the entry needs to be
+ * split, negative if found err.
  */
-static int is_2MB_gtt_possible(struct intel_vgpu *vgpu,
-	struct intel_gvt_gtt_entry *entry)
+static int try_map_2MB_gtt_entry(struct intel_vgpu *vgpu,
+	struct intel_gvt_gtt_entry *entry, dma_addr_t *dma_addr)
 {
 	const struct intel_gvt_gtt_pte_ops *ops = vgpu->gvt->gtt.pte_ops;
 	unsigned long gfn = ops->get_pfn(entry);
-	kvm_pfn_t pfn;
 	int max_level;
-	int ret;
 
 	if (!HAS_PAGE_SIZES(vgpu->gvt->gt->i915, I915_GTT_PAGE_SIZE_2M))
 		return 0;
@@ -1194,16 +1192,7 @@ static int is_2MB_gtt_possible(struct intel_vgpu *vgpu,
 	if (max_level < PG_LEVEL_2M)
 		return 0;
 
-	pfn = gfn_to_pfn(vgpu->vfio_device.kvm, gfn);
-	if (is_error_noslot_pfn(pfn))
-		return -EINVAL;
-
-	if (!pfn_valid(pfn))
-		return -EINVAL;
-
-	ret = PageTransHuge(pfn_to_page(pfn));
-	kvm_release_pfn_clean(pfn);
-	return ret;
+	return intel_gvt_dma_map_guest_page(vgpu, gfn, I915_GTT_PAGE_SIZE_2M, dma_addr);
 }
 
 static int split_2MB_gtt_entry(struct intel_vgpu *vgpu,
@@ -1290,7 +1279,7 @@ static int ppgtt_populate_shadow_entry(struct intel_vgpu *vgpu,
 {
 	const struct intel_gvt_gtt_pte_ops *pte_ops = vgpu->gvt->gtt.pte_ops;
 	struct intel_gvt_gtt_entry se = *ge;
-	unsigned long gfn, page_size = PAGE_SIZE;
+	unsigned long gfn;
 	dma_addr_t dma_addr;
 	int ret;
 
@@ -1313,13 +1302,12 @@ static int ppgtt_populate_shadow_entry(struct intel_vgpu *vgpu,
 		return split_64KB_gtt_entry(vgpu, spt, index, &se);
 	case GTT_TYPE_PPGTT_PTE_2M_ENTRY:
 		gvt_vdbg_mm("shadow 2M gtt entry\n");
-		ret = is_2MB_gtt_possible(vgpu, ge);
+		ret = try_map_2MB_gtt_entry(vgpu, ge, &dma_addr);
 		if (ret == 0)
 			return split_2MB_gtt_entry(vgpu, spt, index, &se);
 		else if (ret < 0)
 			return ret;
-		page_size = I915_GTT_PAGE_SIZE_2M;
-		break;
+		goto set_shadow_entry;
 	case GTT_TYPE_PPGTT_PTE_1G_ENTRY:
 		gvt_vgpu_err("GVT doesn't support 1GB entry\n");
 		return -EINVAL;
@@ -1328,10 +1316,11 @@ static int ppgtt_populate_shadow_entry(struct intel_vgpu *vgpu,
 	}
 
 	/* direct shadow */
-	ret = intel_gvt_dma_map_guest_page(vgpu, gfn, page_size, &dma_addr);
+	ret = intel_gvt_dma_map_guest_page(vgpu, gfn, PAGE_SIZE, &dma_addr);
 	if (ret)
 		return -ENXIO;
 
+set_shadow_entry:
 	pte_ops->set_pfn(&se, dma_addr >> PAGE_SHIFT);
 	ppgtt_set_shadow_entry(spt, &se, index);
 	return 0;
-- 
2.39.0.314.g84b9a713c41-goog



* [PATCH 07/27] drm/i915/gvt: Use an "unsigned long" to iterate over memslot gfns
  2022-12-23  0:57 [PATCH 00/27] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups Sean Christopherson
                   ` (5 preceding siblings ...)
  2022-12-23  0:57 ` [PATCH 06/27] drm/i915/gvt: Don't rely on KVM's gfn_to_pfn() to query possible 2M GTT Sean Christopherson
@ 2022-12-23  0:57 ` Sean Christopherson
  2022-12-23  0:57 ` [PATCH 08/27] drm/i915/gvt: Hoist acquisition of vgpu_lock out to kvmgt_page_track_write() Sean Christopherson
                   ` (20 subsequent siblings)
  27 siblings, 0 replies; 61+ messages in thread
From: Sean Christopherson @ 2022-12-23  0:57 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

Use an "unsigned long" instead of an "int" when iterating over the gfns
in a memslot.  The number of pages in the memslot is tracked as an
"unsigned long", e.g. KVMGT could theoretically break if a KVM memslot
larger than 16TiB were deleted (2^32 * 4KiB).

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
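For illustration only, not part of the patch: the arithmetic behind the
16TiB figure quoted above, as a small C sketch.  The toy_* names are
hypothetical, and the sketch assumes 4KiB pages as the changelog does.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096ULL		/* 4KiB pages, as in the changelog */
#define TOY_TIB (1ULL << 40)		/* bytes in one TiB */

/* Bytes of guest memory covered by the given number of 4KiB pages. */
static uint64_t toy_pages_to_bytes(uint64_t npages)
{
	return npages * TOY_PAGE_SIZE;
}

/* Nonzero if a page count cannot be represented by an "int" counter. */
static int toy_exceeds_int(uint64_t npages)
{
	return npages > (uint64_t)INT_MAX;
}
```

With these helpers, toy_pages_to_bytes(1ULL << 32) equals 16 * TOY_TIB, and
a page count of 2^32 is beyond what an "int" loop counter can hold.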
 drivers/gpu/drm/i915/gvt/kvmgt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 6f358b4fe406..5d0e029d60d7 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -1635,7 +1635,7 @@ static void kvmgt_page_track_flush_slot(struct kvm *kvm,
 		struct kvm_memory_slot *slot,
 		struct kvm_page_track_notifier_node *node)
 {
-	int i;
+	unsigned long i;
 	gfn_t gfn;
 	struct intel_vgpu *info =
 		container_of(node, struct intel_vgpu, track_node);
-- 
2.39.0.314.g84b9a713c41-goog



* [PATCH 08/27] drm/i915/gvt: Hoist acquisition of vgpu_lock out to kvmgt_page_track_write()
  2022-12-23  0:57 [PATCH 00/27] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups Sean Christopherson
                   ` (6 preceding siblings ...)
  2022-12-23  0:57 ` [PATCH 07/27] drm/i915/gvt: Use an "unsigned long" to iterate over memslot gfns Sean Christopherson
@ 2022-12-23  0:57 ` Sean Christopherson
  2022-12-23  0:57 ` [PATCH 09/27] drm/i915/gvt: Protect gfn hash table with dedicated mutex Sean Christopherson
                   ` (19 subsequent siblings)
  27 siblings, 0 replies; 61+ messages in thread
From: Sean Christopherson @ 2022-12-23  0:57 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

Hoist the acquisition of vgpu_lock from intel_vgpu_page_track_handler() out
to its sole caller, kvmgt_page_track_write().  An upcoming fix will add a
mutex to protect the gfn hash table that is referenced by
kvmgt_gfn_is_write_protected(), i.e. kvmgt_page_track_write() will need to
acquire another lock.  Conceptually, the to-be-introduced gfn_lock has
finer granularity than vgpu_lock and so the lock order should ideally be
vgpu_lock => gfn_lock, e.g. to avoid potential lock inversion elsewhere in
KVMGT.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
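For illustration only, not part of the patch: a single-threaded toy that
models the vgpu_lock => gfn_lock ordering as lock ranks, where a lock may
only be taken if no lock of equal or higher rank is already held.  The
toy_* names and the rank scheme are assumptions; the kernel relies on
lockdep for this kind of checking, not on code like this.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical ranks: lower ranks must be taken before higher ranks. */
enum toy_rank {
	TOY_RANK_VGPU = 0,	/* coarse lock, taken first */
	TOY_RANK_GFN = 1,	/* finer-grained lock, taken second */
	TOY_NR_RANKS = 2,
};

/* Tracks which ranks the (single) toy thread currently holds. */
struct toy_lock_state {
	bool held[TOY_NR_RANKS];
};

/* Returns false if taking the lock would violate the rank order. */
static bool toy_lock(struct toy_lock_state *s, enum toy_rank rank)
{
	int r;

	for (r = rank; r < TOY_NR_RANKS; r++) {
		if (s->held[r])
			return false;
	}
	s->held[rank] = true;
	return true;
}

/* Release a rank previously taken with toy_lock(). */
static void toy_unlock(struct toy_lock_state *s, enum toy_rank rank)
{
	s->held[rank] = false;
}
```

Taking TOY_RANK_VGPU and then TOY_RANK_GFN succeeds, while attempting
TOY_RANK_VGPU with TOY_RANK_GFN already held is rejected, which is the
inversion the changelog wants to avoid.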
 drivers/gpu/drm/i915/gvt/kvmgt.c      |  4 ++++
 drivers/gpu/drm/i915/gvt/page_track.c | 10 ++--------
 2 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 5d0e029d60d7..ca9926061cd8 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -1626,9 +1626,13 @@ static void kvmgt_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa,
 	struct intel_vgpu *info =
 		container_of(node, struct intel_vgpu, track_node);
 
+	mutex_lock(&info->vgpu_lock);
+
 	if (kvmgt_gfn_is_write_protected(info, gpa_to_gfn(gpa)))
 		intel_vgpu_page_track_handler(info, gpa,
 						     (void *)val, len);
+
+	mutex_unlock(&info->vgpu_lock);
 }
 
 static void kvmgt_page_track_flush_slot(struct kvm *kvm,
diff --git a/drivers/gpu/drm/i915/gvt/page_track.c b/drivers/gpu/drm/i915/gvt/page_track.c
index 3375b51c75f1..6d72d11914a5 100644
--- a/drivers/gpu/drm/i915/gvt/page_track.c
+++ b/drivers/gpu/drm/i915/gvt/page_track.c
@@ -162,13 +162,9 @@ int intel_vgpu_page_track_handler(struct intel_vgpu *vgpu, u64 gpa,
 	struct intel_vgpu_page_track *page_track;
 	int ret = 0;
 
-	mutex_lock(&vgpu->vgpu_lock);
-
 	page_track = intel_vgpu_find_page_track(vgpu, gpa >> PAGE_SHIFT);
-	if (!page_track) {
-		ret = -ENXIO;
-		goto out;
-	}
+	if (!page_track)
+		return -ENXIO;
 
 	if (unlikely(vgpu->failsafe)) {
 		/* Remove write protection to prevent furture traps. */
@@ -179,7 +175,5 @@ int intel_vgpu_page_track_handler(struct intel_vgpu *vgpu, u64 gpa,
 			gvt_err("guest page write error, gpa %llx\n", gpa);
 	}
 
-out:
-	mutex_unlock(&vgpu->vgpu_lock);
 	return ret;
 }
-- 
2.39.0.314.g84b9a713c41-goog



* [PATCH 09/27] drm/i915/gvt: Protect gfn hash table with dedicated mutex
  2022-12-23  0:57 [PATCH 00/27] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups Sean Christopherson
                   ` (7 preceding siblings ...)
  2022-12-23  0:57 ` [PATCH 08/27] drm/i915/gvt: Hoist acquisition of vgpu_lock out to kvmgt_page_track_write() Sean Christopherson
@ 2022-12-23  0:57 ` Sean Christopherson
  2022-12-28  5:03   ` Yan Zhao
  2022-12-23  0:57 ` [PATCH 10/27] KVM: x86/mmu: Don't rely on page-track mechanism to flush on memslot change Sean Christopherson
                   ` (18 subsequent siblings)
  27 siblings, 1 reply; 61+ messages in thread
From: Sean Christopherson @ 2022-12-23  0:57 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

Add and use a new mutex, gfn_lock, to protect accesses to the hash table
used to track which gfns are write-protected when shadowing the guest's
GTT.  This fixes a bug where kvmgt_page_track_write(), which doesn't hold
kvm->mmu_lock, could race with intel_gvt_page_track_remove() and trigger
a use-after-free.

Fixing kvmgt_page_track_write() by taking kvm->mmu_lock is not an option
as mmu_lock is a r/w spinlock, and intel_vgpu_page_track_handler() might
sleep when acquiring vgpu->cache_lock deep down the callstack:

  intel_vgpu_page_track_handler()
  |
  |->  page_track->handler / ppgtt_write_protection_handler()
       |
       |-> ppgtt_handle_guest_write_page_table_bytes()
           |
           |->  ppgtt_handle_guest_write_page_table()
                |
                |-> ppgtt_handle_guest_entry_removal()
                    |
                    |-> ppgtt_invalidate_pte()
                        |
                        |-> intel_gvt_dma_unmap_guest_page()
                            |
                            |-> mutex_lock(&vgpu->cache_lock);

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 drivers/gpu/drm/i915/gvt/gvt.h   |  1 +
 drivers/gpu/drm/i915/gvt/kvmgt.c | 65 ++++++++++++++++++++------------
 drivers/gpu/drm/i915/gvt/vgpu.c  |  1 +
 3 files changed, 43 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/i915/gvt/gvt.h b/drivers/gpu/drm/i915/gvt/gvt.h
index dbf8d7470b2c..fbfd7eafec14 100644
--- a/drivers/gpu/drm/i915/gvt/gvt.h
+++ b/drivers/gpu/drm/i915/gvt/gvt.h
@@ -176,6 +176,7 @@ struct intel_vgpu {
 	struct vfio_device vfio_device;
 	struct intel_gvt *gvt;
 	struct mutex vgpu_lock;
+	struct mutex gfn_lock;
 	int id;
 	bool active;
 	bool attached;
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index ca9926061cd8..a4747e153dad 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -366,6 +366,8 @@ __kvmgt_protect_table_find(struct intel_vgpu *info, gfn_t gfn)
 {
 	struct kvmgt_pgfn *p, *res = NULL;
 
+	lockdep_assert_held(&info->gfn_lock);
+
 	hash_for_each_possible(info->ptable, p, hnode, gfn) {
 		if (gfn == p->gfn) {
 			res = p;
@@ -388,6 +390,8 @@ static void kvmgt_protect_table_add(struct intel_vgpu *info, gfn_t gfn)
 {
 	struct kvmgt_pgfn *p;
 
+	lockdep_assert_held(&info->gfn_lock);
+
 	if (kvmgt_gfn_is_write_protected(info, gfn))
 		return;
 
@@ -1563,60 +1567,68 @@ int intel_gvt_page_track_add(struct intel_vgpu *info, u64 gfn)
 {
 	struct kvm *kvm = info->vfio_device.kvm;
 	struct kvm_memory_slot *slot;
-	int idx;
+	int idx, ret = 0;
 
 	if (!info->attached)
 		return -ESRCH;
 
+	mutex_lock(&info->gfn_lock);
+
+	if (kvmgt_gfn_is_write_protected(info, gfn))
+		goto out;
+
 	idx = srcu_read_lock(&kvm->srcu);
 	slot = gfn_to_memslot(kvm, gfn);
 	if (!slot) {
 		srcu_read_unlock(&kvm->srcu, idx);
-		return -EINVAL;
+		ret = -EINVAL;
+		goto out;
 	}
 
 	write_lock(&kvm->mmu_lock);
-
-	if (kvmgt_gfn_is_write_protected(info, gfn))
-		goto out;
-
 	kvm_slot_page_track_add_page(kvm, slot, gfn, KVM_PAGE_TRACK_WRITE);
+	write_unlock(&kvm->mmu_lock);
+
+	srcu_read_unlock(&kvm->srcu, idx);
+
 	kvmgt_protect_table_add(info, gfn);
-
 out:
-	write_unlock(&kvm->mmu_lock);
-	srcu_read_unlock(&kvm->srcu, idx);
-	return 0;
+	mutex_unlock(&info->gfn_lock);
+	return ret;
 }
 
 int intel_gvt_page_track_remove(struct intel_vgpu *info, u64 gfn)
 {
 	struct kvm *kvm = info->vfio_device.kvm;
 	struct kvm_memory_slot *slot;
-	int idx;
+	int idx, ret = 0;
 
 	if (!info->attached)
 		return 0;
 
-	idx = srcu_read_lock(&kvm->srcu);
-	slot = gfn_to_memslot(kvm, gfn);
-	if (!slot) {
-		srcu_read_unlock(&kvm->srcu, idx);
-		return -EINVAL;
-	}
-
-	write_lock(&kvm->mmu_lock);
+	mutex_lock(&info->gfn_lock);
 
 	if (!kvmgt_gfn_is_write_protected(info, gfn))
 		goto out;
 
+	idx = srcu_read_lock(&kvm->srcu);
+	slot = gfn_to_memslot(kvm, gfn);
+	if (!slot) {
+		srcu_read_unlock(&kvm->srcu, idx);
+		ret = -EINVAL;
+		goto out;
+	}
+
+	write_lock(&kvm->mmu_lock);
 	kvm_slot_page_track_remove_page(kvm, slot, gfn, KVM_PAGE_TRACK_WRITE);
+	write_unlock(&kvm->mmu_lock);
+	srcu_read_unlock(&kvm->srcu, idx);
+
 	kvmgt_protect_table_del(info, gfn);
 
 out:
-	write_unlock(&kvm->mmu_lock);
-	srcu_read_unlock(&kvm->srcu, idx);
-	return 0;
+	mutex_unlock(&info->gfn_lock);
+	return ret;
 }
 
 static void kvmgt_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa,
@@ -1627,11 +1639,13 @@ static void kvmgt_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa,
 		container_of(node, struct intel_vgpu, track_node);
 
 	mutex_lock(&info->vgpu_lock);
+	mutex_lock(&info->gfn_lock);
 
 	if (kvmgt_gfn_is_write_protected(info, gpa_to_gfn(gpa)))
 		intel_vgpu_page_track_handler(info, gpa,
 						     (void *)val, len);
 
+	mutex_unlock(&info->gfn_lock);
 	mutex_unlock(&info->vgpu_lock);
 }
 
@@ -1644,16 +1658,19 @@ static void kvmgt_page_track_flush_slot(struct kvm *kvm,
 	struct intel_vgpu *info =
 		container_of(node, struct intel_vgpu, track_node);
 
-	write_lock(&kvm->mmu_lock);
+	mutex_lock(&info->gfn_lock);
 	for (i = 0; i < slot->npages; i++) {
 		gfn = slot->base_gfn + i;
 		if (kvmgt_gfn_is_write_protected(info, gfn)) {
+			write_lock(&kvm->mmu_lock);
 			kvm_slot_page_track_remove_page(kvm, slot, gfn,
 						KVM_PAGE_TRACK_WRITE);
+			write_unlock(&kvm->mmu_lock);
+
 			kvmgt_protect_table_del(info, gfn);
 		}
 	}
-	write_unlock(&kvm->mmu_lock);
+	mutex_unlock(&info->gfn_lock);
 }
 
 void intel_vgpu_detach_regions(struct intel_vgpu *vgpu)
diff --git a/drivers/gpu/drm/i915/gvt/vgpu.c b/drivers/gpu/drm/i915/gvt/vgpu.c
index 56c71474008a..f2479781b770 100644
--- a/drivers/gpu/drm/i915/gvt/vgpu.c
+++ b/drivers/gpu/drm/i915/gvt/vgpu.c
@@ -277,6 +277,7 @@ struct intel_vgpu *intel_gvt_create_idle_vgpu(struct intel_gvt *gvt)
 	vgpu->id = IDLE_VGPU_IDR;
 	vgpu->gvt = gvt;
 	mutex_init(&vgpu->vgpu_lock);
+	mutex_init(&vgpu->gfn_lock);
 
 	for (i = 0; i < I915_NUM_ENGINES; i++)
 		INIT_LIST_HEAD(&vgpu->submission.workload_q_head[i]);
-- 
2.39.0.314.g84b9a713c41-goog



* [PATCH 10/27] KVM: x86/mmu: Don't rely on page-track mechanism to flush on memslot change
  2022-12-23  0:57 [PATCH 00/27] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups Sean Christopherson
                   ` (8 preceding siblings ...)
  2022-12-23  0:57 ` [PATCH 09/27] drm/i915/gvt: Protect gfn hash table with dedicated mutex Sean Christopherson
@ 2022-12-23  0:57 ` Sean Christopherson
  2022-12-23  0:57 ` [PATCH 11/27] KVM: x86/mmu: Don't bounce through page-track mechanism for guest PTEs Sean Christopherson
                   ` (17 subsequent siblings)
  27 siblings, 0 replies; 61+ messages in thread
From: Sean Christopherson @ 2022-12-23  0:57 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

Call kvm_mmu_zap_all_fast() directly when flushing a memslot instead of
bouncing through the page-track mechanism.  KVM (unfortunately) needs to
zap and flush all page tables on memslot DELETE/MOVE irrespective of
whether KVM is shadowing guest page tables.

This will allow changing KVM to register a page-track notifier on the
first shadow root allocation, and will also allow deleting the misguided
kvm_page_track_flush_slot() hook itself once KVMGT also moves to a
different method for reacting to memslot changes.

No functional change intended.

Cc: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221110014821.1548347-2-seanjc@google.com
---
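
Not part of the patch: a minimal userspace sketch of the call flow
described above, using simplified stand-ins for the real structures.
Every demo_* name is hypothetical, and the locking (slots_lock, SRCU)
is deliberately omitted.  It only illustrates that the zap now happens
unconditionally, before any page-track notifiers are invoked.

```c
#include <assert.h>
#include <stddef.h>

struct demo_vm;

/* Hypothetical stand-in for struct kvm_page_track_notifier_node. */
struct demo_node {
	void (*track_flush_slot)(struct demo_vm *vm, struct demo_node *node);
	struct demo_node *next;
};

/* Hypothetical stand-in for struct kvm, with counters for observation. */
struct demo_vm {
	int zaps;			/* calls to demo_zap_all_fast() */
	int external_flushes;		/* calls into external hooks */
	struct demo_node *notifiers;	/* NULL when no user is registered */
};

/* Models kvm_mmu_zap_all_fast(); here it only counts invocations. */
static void demo_zap_all_fast(struct demo_vm *vm)
{
	vm->zaps++;
}

/* Models an external user's flush hook, e.g. KVMGT's. */
static void demo_external_flush(struct demo_vm *vm, struct demo_node *node)
{
	(void)node;
	vm->external_flushes++;
}

/* Models kvm_arch_flush_shadow_memslot() after this patch. */
static void demo_flush_shadow_memslot(struct demo_vm *vm)
{
	assert(vm != NULL);

	/* Zap directly; no notifier registration is needed for this. */
	demo_zap_all_fast(vm);

	/* External users are still notified through the hook. */
	for (struct demo_node *n = vm->notifiers; n; n = n->next)
		if (n->track_flush_slot)
			n->track_flush_slot(vm, n);
}
```

With an empty notifier list, demo_flush_shadow_memslot() still zaps
exactly once, which is the property that allows KVM to defer registering
its own notifier.
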
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/mmu/mmu.c          | 10 +---------
 arch/x86/kvm/x86.c              |  2 ++
 3 files changed, 4 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index aa4eb8cfcd7e..fcb042f971ee 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1798,6 +1798,7 @@ void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm,
 void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
 				   const struct kvm_memory_slot *memslot);
 void kvm_mmu_zap_all(struct kvm *kvm);
+void kvm_mmu_zap_all_fast(struct kvm *kvm);
 void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen);
 void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned long kvm_nr_mmu_pages);
 
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index ca7428b68eba..8c3a453554ed 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -6009,7 +6009,7 @@ static void kvm_zap_obsolete_pages(struct kvm *kvm)
  * not use any resource of the being-deleted slot or all slots
  * after calling the function.
  */
-static void kvm_mmu_zap_all_fast(struct kvm *kvm)
+void kvm_mmu_zap_all_fast(struct kvm *kvm)
 {
 	lockdep_assert_held(&kvm->slots_lock);
 
@@ -6065,13 +6065,6 @@ static bool kvm_has_zapped_obsolete_pages(struct kvm *kvm)
 	return unlikely(!list_empty_careful(&kvm->arch.zapped_obsolete_pages));
 }
 
-static void kvm_mmu_invalidate_zap_pages_in_memslot(struct kvm *kvm,
-			struct kvm_memory_slot *slot,
-			struct kvm_page_track_notifier_node *node)
-{
-	kvm_mmu_zap_all_fast(kvm);
-}
-
 int kvm_mmu_init_vm(struct kvm *kvm)
 {
 	struct kvm_page_track_notifier_node *node = &kvm->arch.mmu_sp_tracker;
@@ -6089,7 +6082,6 @@ int kvm_mmu_init_vm(struct kvm *kvm)
 	}
 
 	node->track_write = kvm_mmu_pte_write;
-	node->track_flush_slot = kvm_mmu_invalidate_zap_pages_in_memslot;
 	kvm_page_track_register_notifier(kvm, node);
 
 	kvm->arch.split_page_header_cache.kmem_cache = mmu_page_header_cache;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 312aea1854ae..af0d83e33bc4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12599,6 +12599,8 @@ void kvm_arch_flush_shadow_all(struct kvm *kvm)
 void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
 				   struct kvm_memory_slot *slot)
 {
+	kvm_mmu_zap_all_fast(kvm);
+
 	kvm_page_track_flush_slot(kvm, slot);
 }
 
-- 
2.39.0.314.g84b9a713c41-goog



* [PATCH 11/27] KVM: x86/mmu: Don't bounce through page-track mechanism for guest PTEs
  2022-12-23  0:57 [PATCH 00/27] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups Sean Christopherson
                   ` (9 preceding siblings ...)
  2022-12-23  0:57 ` [PATCH 10/27] KVM: x86/mmu: Don't rely on page-track mechanism to flush on memslot change Sean Christopherson
@ 2022-12-23  0:57 ` Sean Christopherson
  2022-12-23  0:57 ` [PATCH 12/27] KVM: drm/i915/gvt: Drop @vcpu from KVM's ->track_write() hook Sean Christopherson
                   ` (16 subsequent siblings)
  27 siblings, 0 replies; 61+ messages in thread
From: Sean Christopherson @ 2022-12-23  0:57 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

Don't use the generic page-track mechanism to handle writes to guest PTEs
in KVM's MMU.  KVM's MMU needs access to information that should not be
exposed to external page-track users, e.g. KVM needs (for some definitions
of "need") the vCPU to query the current paging mode, whereas external
users, i.e. KVMGT, have no ties to the current vCPU and so should never
need the vCPU.

Moving away from the page-track mechanism will allow dropping use of the
page-track mechanism for KVM's own MMU, and will also allow simplifying
and cleaning up the page-track APIs.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_host.h |  1 -
 arch/x86/kvm/mmu.h              |  2 ++
 arch/x86/kvm/mmu/mmu.c          | 13 ++-----------
 arch/x86/kvm/mmu/page_track.c   |  2 ++
 4 files changed, 6 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index fcb042f971ee..eec424fac0ba 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1223,7 +1223,6 @@ struct kvm_arch {
 	 * create an NX huge page (without hanging the guest).
 	 */
 	struct list_head possible_nx_huge_pages;
-	struct kvm_page_track_notifier_node mmu_sp_tracker;
 	struct kvm_page_track_notifier_head track_notifier_head;
 	/*
 	 * Protects marking pages unsync during page faults, as TDP MMU page
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 168c46fd8dd1..b8bde42f6037 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -119,6 +119,8 @@ void kvm_mmu_unload(struct kvm_vcpu *vcpu);
 void kvm_mmu_free_obsolete_roots(struct kvm_vcpu *vcpu);
 void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu);
 void kvm_mmu_sync_prev_roots(struct kvm_vcpu *vcpu);
+void kvm_mmu_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
+			 int bytes);
 
 static inline int kvm_mmu_reload(struct kvm_vcpu *vcpu)
 {
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 8c3a453554ed..dfeddea8148a 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5582,9 +5582,8 @@ static u64 *get_written_sptes(struct kvm_mmu_page *sp, gpa_t gpa, int *nspte)
 	return spte;
 }
 
-static void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
-			      const u8 *new, int bytes,
-			      struct kvm_page_track_notifier_node *node)
+void kvm_mmu_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
+			 int bytes)
 {
 	gfn_t gfn = gpa >> PAGE_SHIFT;
 	struct kvm_mmu_page *sp;
@@ -6067,7 +6066,6 @@ static bool kvm_has_zapped_obsolete_pages(struct kvm *kvm)
 
 int kvm_mmu_init_vm(struct kvm *kvm)
 {
-	struct kvm_page_track_notifier_node *node = &kvm->arch.mmu_sp_tracker;
 	int r;
 
 	INIT_LIST_HEAD(&kvm->arch.active_mmu_pages);
@@ -6081,9 +6079,6 @@ int kvm_mmu_init_vm(struct kvm *kvm)
 			return r;
 	}
 
-	node->track_write = kvm_mmu_pte_write;
-	kvm_page_track_register_notifier(kvm, node);
-
 	kvm->arch.split_page_header_cache.kmem_cache = mmu_page_header_cache;
 	kvm->arch.split_page_header_cache.gfp_zero = __GFP_ZERO;
 
@@ -6104,10 +6099,6 @@ static void mmu_free_vm_memory_caches(struct kvm *kvm)
 
 void kvm_mmu_uninit_vm(struct kvm *kvm)
 {
-	struct kvm_page_track_notifier_node *node = &kvm->arch.mmu_sp_tracker;
-
-	kvm_page_track_unregister_notifier(kvm, node);
-
 	if (tdp_mmu_enabled)
 		kvm_mmu_uninit_tdp_mmu(kvm);
 
diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index 69ea16c31859..407128bcabc8 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -273,6 +273,8 @@ void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
 		if (n->track_write)
 			n->track_write(vcpu, gpa, new, bytes, n);
 	srcu_read_unlock(&head->track_srcu, idx);
+
+	kvm_mmu_track_write(vcpu, gpa, new, bytes);
 }
 
 /*
-- 
2.39.0.314.g84b9a713c41-goog



* [PATCH 12/27] KVM: drm/i915/gvt: Drop @vcpu from KVM's ->track_write() hook
  2022-12-23  0:57 [PATCH 00/27] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups Sean Christopherson
                   ` (10 preceding siblings ...)
  2022-12-23  0:57 ` [PATCH 11/27] KVM: x86/mmu: Don't bounce through page-track mechanism for guest PTEs Sean Christopherson
@ 2022-12-23  0:57 ` Sean Christopherson
  2022-12-23  0:57 ` [PATCH 13/27] KVM: x86: Reject memslot MOVE operations if KVMGT is attached Sean Christopherson
                   ` (15 subsequent siblings)
  27 siblings, 0 replies; 61+ messages in thread
From: Sean Christopherson @ 2022-12-23  0:57 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

Drop @vcpu from KVM's ->track_write() hook provided for external users of
the page-track APIs now that KVM itself doesn't use the page-track
mechanism.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
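
Not part of the patch: a small userspace sketch of why an external user
does not need @vcpu.  The hook receives its own notifier node, and the
user recovers its private state from that node via container_of().  The
demo_* names are hypothetical and the container_of() here is a
simplified stand-in for the kernel macro (no type checking).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace container_of(); the kernel version also type-checks. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

typedef uint64_t gpa_t;

/* Hypothetical stand-in for struct kvm_page_track_notifier_node. */
struct demo_track_node {
	/* Hook signature without a vCPU argument, as in this patch. */
	void (*track_write)(gpa_t gpa, const uint8_t *val, int len,
			    struct demo_track_node *node);
};

/* Hypothetical external user, loosely modeled on struct intel_vgpu. */
struct demo_vgpu {
	int id;
	gpa_t last_gpa;
	int last_len;
	struct demo_track_node track_node;
};

/* The hook finds its owner from @node alone. */
static void demo_track_write(gpa_t gpa, const uint8_t *val, int len,
			     struct demo_track_node *node)
{
	struct demo_vgpu *vgpu =
		container_of(node, struct demo_vgpu, track_node);

	(void)val;
	vgpu->last_gpa = gpa;
	vgpu->last_len = len;
}

/* Installs the hook and simulates a single write notification. */
static int demo_notify(struct demo_vgpu *vgpu, gpa_t gpa, int len)
{
	const uint8_t data[8] = { 0 };

	assert(len >= 0 && len <= (int)sizeof(data));
	vgpu->track_node.track_write = demo_track_write;
	vgpu->track_node.track_write(gpa, data, len, &vgpu->track_node);
	return vgpu->id;
}
```

Calling demo_notify() on a demo_vgpu records the gpa and length in that
same object, without any per-vCPU context being passed in.
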
 arch/x86/include/asm/kvm_page_track.h |  5 ++---
 arch/x86/kvm/mmu/page_track.c         |  2 +-
 drivers/gpu/drm/i915/gvt/kvmgt.c      | 10 ++++------
 3 files changed, 7 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index 3f72c7a172fc..0d65ae203fd6 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -26,14 +26,13 @@ struct kvm_page_track_notifier_node {
 	 * It is called when guest is writing the write-tracked page
 	 * and write emulation is finished at that time.
 	 *
-	 * @vcpu: the vcpu where the write access happened.
 	 * @gpa: the physical address written by guest.
 	 * @new: the data was written to the address.
 	 * @bytes: the written length.
 	 * @node: this node
 	 */
-	void (*track_write)(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
-			    int bytes, struct kvm_page_track_notifier_node *node);
+	void (*track_write)(gpa_t gpa, const u8 *new, int bytes,
+			    struct kvm_page_track_notifier_node *node);
 	/*
 	 * It is called when memory slot is being moved or removed
 	 * users can drop write-protection for the pages in that memory slot
diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index 407128bcabc8..32357599cb09 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -271,7 +271,7 @@ void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
 	hlist_for_each_entry_srcu(n, &head->track_notifier_list, node,
 				srcu_read_lock_held(&head->track_srcu))
 		if (n->track_write)
-			n->track_write(vcpu, gpa, new, bytes, n);
+			n->track_write(gpa, new, bytes, n);
 	srcu_read_unlock(&head->track_srcu, idx);
 
 	kvm_mmu_track_write(vcpu, gpa, new, bytes);
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index a4747e153dad..5ff17a212107 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -106,9 +106,8 @@ struct gvt_dma {
 #define vfio_dev_to_vgpu(vfio_dev) \
 	container_of((vfio_dev), struct intel_vgpu, vfio_device)
 
-static void kvmgt_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa,
-		const u8 *val, int len,
-		struct kvm_page_track_notifier_node *node);
+static void kvmgt_page_track_write(gpa_t gpa, const u8 *val, int len,
+				   struct kvm_page_track_notifier_node *node);
 static void kvmgt_page_track_flush_slot(struct kvm *kvm,
 		struct kvm_memory_slot *slot,
 		struct kvm_page_track_notifier_node *node);
@@ -1631,9 +1630,8 @@ int intel_gvt_page_track_remove(struct intel_vgpu *info, u64 gfn)
 	return ret;
 }
 
-static void kvmgt_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa,
-		const u8 *val, int len,
-		struct kvm_page_track_notifier_node *node)
+static void kvmgt_page_track_write(gpa_t gpa, const u8 *val, int len,
+				   struct kvm_page_track_notifier_node *node)
 {
 	struct intel_vgpu *info =
 		container_of(node, struct intel_vgpu, track_node);
-- 
2.39.0.314.g84b9a713c41-goog



* [PATCH 13/27] KVM: x86: Reject memslot MOVE operations if KVMGT is attached
  2022-12-23  0:57 [PATCH 00/27] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups Sean Christopherson
                   ` (11 preceding siblings ...)
  2022-12-23  0:57 ` [PATCH 12/27] KVM: drm/i915/gvt: Drop @vcpu from KVM's ->track_write() hook Sean Christopherson
@ 2022-12-23  0:57 ` Sean Christopherson
  2022-12-23  0:57 ` [PATCH 14/27] drm/i915/gvt: Don't bother removing write-protection on to-be-deleted slot Sean Christopherson
                   ` (14 subsequent siblings)
  27 siblings, 0 replies; 61+ messages in thread
From: Sean Christopherson @ 2022-12-23  0:57 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

Disallow moving memslots if the VM has external page-track users, i.e. if
KVMGT is being used to expose a virtual GPU to the guest, as KVM doesn't
correctly handle moving memory regions.

Note, this is potential ABI breakage!  E.g. userspace could move regions
that aren't shadowed by KVMGT without harming the guest.  However, the
only known user of KVMGT is QEMU, and QEMU doesn't move generic memory
regions.  KVM's own support for moving memory regions was also broken for
multiple years (albeit for an edge case, but arguably moving RAM is
itself an edge case), e.g. see commit edd4fa37baa6 ("KVM: x86: Allocate
new rmap and large page tracking when moving memslot").

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
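
Not part of the patch: a userspace sketch of the intended check, under
the assumption that "has external user" means "at least one notifier is
registered".  The demo_* names and the singly linked list are
hypothetical simplifications of the hlist used by KVM.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of enum kvm_mr_change, reduced to three values. */
enum demo_mr_change { DEMO_MR_CREATE, DEMO_MR_DELETE, DEMO_MR_MOVE };

struct demo_node {
	struct demo_node *next;
};

struct demo_vm {
	struct demo_node *notifiers;	/* NULL when the list is empty */
};

/* True if and only if some external page-track user is registered. */
static bool demo_has_external_user(const struct demo_vm *vm)
{
	assert(vm != NULL);
	return vm->notifiers != NULL;
}

/* Models the new early check in kvm_arch_prepare_memory_region(). */
static int demo_prepare_memory_region(const struct demo_vm *vm,
				      enum demo_mr_change change)
{
	if (change == DEMO_MR_MOVE && demo_has_external_user(vm))
		return -EINVAL;

	return 0;
}
```

Only MOVE is rejected, and only while a notifier is registered; CREATE
and DELETE are unaffected in this model.
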
 arch/x86/include/asm/kvm_page_track.h | 3 +++
 arch/x86/kvm/mmu/page_track.c         | 5 +++++
 arch/x86/kvm/x86.c                    | 7 +++++++
 3 files changed, 15 insertions(+)

diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index 0d65ae203fd6..6a287bcbe8a9 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -77,4 +77,7 @@ kvm_page_track_unregister_notifier(struct kvm *kvm,
 void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
 			  int bytes);
 void kvm_page_track_flush_slot(struct kvm *kvm, struct kvm_memory_slot *slot);
+
+bool kvm_page_track_has_external_user(struct kvm *kvm);
+
 #endif
diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index 32357599cb09..c474a0ff24ba 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -320,3 +320,8 @@ enum pg_level kvm_page_track_max_mapping_level(struct kvm *kvm, gfn_t gfn,
 	return max_level;
 }
 EXPORT_SYMBOL_GPL(kvm_page_track_max_mapping_level);
+
+bool kvm_page_track_has_external_user(struct kvm *kvm)
+{
+	return !hlist_empty(&kvm->arch.track_notifier_head.track_notifier_list);
+}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index af0d83e33bc4..b587858e878e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12419,6 +12419,13 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
 				   struct kvm_memory_slot *new,
 				   enum kvm_mr_change change)
 {
+	/*
+	 * KVM doesn't support moving memslots when there are external page
+	 * trackers attached to the VM, i.e. if KVMGT is in use.
+	 */
+	if (change == KVM_MR_MOVE && kvm_page_track_has_external_user(kvm))
+		return -EINVAL;
+
 	if (change == KVM_MR_CREATE || change == KVM_MR_MOVE) {
 		if ((new->base_gfn + new->npages - 1) > kvm_mmu_max_gfn())
 			return -EINVAL;
-- 
2.39.0.314.g84b9a713c41-goog



* [PATCH 14/27] drm/i915/gvt: Don't bother removing write-protection on to-be-deleted slot
  2022-12-23  0:57 [PATCH 00/27] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups Sean Christopherson
                   ` (12 preceding siblings ...)
  2022-12-23  0:57 ` [PATCH 13/27] KVM: x86: Reject memslot MOVE operations if KVMGT is attached Sean Christopherson
@ 2022-12-23  0:57 ` Sean Christopherson
  2022-12-23  0:57 ` [PATCH 15/27] KVM: x86: Add a new page-track hook to handle memslot deletion Sean Christopherson
                   ` (13 subsequent siblings)
  27 siblings, 0 replies; 61+ messages in thread
From: Sean Christopherson @ 2022-12-23  0:57 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

When handling a slot "flush", don't call back into KVM to drop write
protection for gfns in the slot.  Now that KVM rejects attempts to move
memory slots while KVMGT is attached, the only time a slot is "flushed"
is when it's being removed, i.e. the memslot and all its write-tracking
metadata are about to be deleted.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 drivers/gpu/drm/i915/gvt/kvmgt.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 5ff17a212107..3c59e7cd75d9 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -1659,14 +1659,8 @@ static void kvmgt_page_track_flush_slot(struct kvm *kvm,
 	mutex_lock(&info->gfn_lock);
 	for (i = 0; i < slot->npages; i++) {
 		gfn = slot->base_gfn + i;
-		if (kvmgt_gfn_is_write_protected(info, gfn)) {
-			write_lock(&kvm->mmu_lock);
-			kvm_slot_page_track_remove_page(kvm, slot, gfn,
-						KVM_PAGE_TRACK_WRITE);
-			write_unlock(&kvm->mmu_lock);
-
+		if (kvmgt_gfn_is_write_protected(info, gfn))
 			kvmgt_protect_table_del(info, gfn);
-		}
 	}
 	mutex_unlock(&info->gfn_lock);
 }
-- 
2.39.0.314.g84b9a713c41-goog



* [PATCH 15/27] KVM: x86: Add a new page-track hook to handle memslot deletion
  2022-12-23  0:57 [PATCH 00/27] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups Sean Christopherson
                   ` (13 preceding siblings ...)
  2022-12-23  0:57 ` [PATCH 14/27] drm/i915/gvt: Don't bother removing write-protection on to-be-deleted slot Sean Christopherson
@ 2022-12-23  0:57 ` Sean Christopherson
  2022-12-23  0:57 ` [PATCH 16/27] drm/i915/gvt: switch from ->track_flush_slot() to ->track_remove_region() Sean Christopherson
                   ` (12 subsequent siblings)
  27 siblings, 0 replies; 61+ messages in thread
From: Sean Christopherson @ 2022-12-23  0:57 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

From: Yan Zhao <yan.y.zhao@intel.com>

Add a new page-track hook, track_remove_region(), that is called when a
memslot DELETE operation is about to be committed.  The "remove" hook
will be used by KVMGT and will effectively replace the existing
track_flush_slot() altogether now that KVM itself doesn't rely on the
"flush" hook either.

The "flush" hook is flawed as it's invoked before the memslot operation
is guaranteed to succeed, i.e. KVM might ultimately keep the existing
memslot without notifying external page track users, a.k.a. KVMGT.  In
practice, this can't currently happen on x86, but there are no guarantees
that won't change in the future, not to mention that "flush" does a very
poor job of describing what is happening.

Pass in the gfn+nr_pages instead of the slot itself so external users,
i.e. KVMGT, don't need to be exposed to KVM internals (memslots).  This will
help set the stage for additional cleanups to the page-track APIs.

Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
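
Not part of the patch: a userspace sketch of the dispatch described
above, assuming a plain linked list in place of the SRCU-protected
hlist.  The demo_* names are hypothetical.  The point is that the hook
only sees gfn + nr_pages, never the slot itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t gfn_t;

/* Hypothetical stand-in for struct kvm_page_track_notifier_node. */
struct demo_node {
	void (*track_remove_region)(gfn_t gfn, unsigned long nr_pages,
				    struct demo_node *node);
	struct demo_node *next;
	/* Bookkeeping so that a caller can observe what the hook saw. */
	gfn_t seen_gfn;
	unsigned long seen_pages;
	int calls;
};

/* Hypothetical stand-in for the fields of struct kvm_memory_slot used. */
struct demo_slot {
	gfn_t base_gfn;
	unsigned long npages;
};

/* Example hook: records the region it was told about. */
static void demo_record_region(gfn_t gfn, unsigned long nr_pages,
			       struct demo_node *node)
{
	node->seen_gfn = gfn;
	node->seen_pages = nr_pages;
	node->calls++;
}

/*
 * Models kvm_page_track_delete_slot(): unpack the slot on the KVM side
 * and hand each interested node the raw range.  Returns the number of
 * nodes notified.
 */
static int demo_delete_slot(struct demo_node *head,
			    const struct demo_slot *slot)
{
	int notified = 0;

	assert(slot != NULL);

	for (struct demo_node *n = head; n; n = n->next) {
		if (!n->track_remove_region)
			continue;	/* the hook is optional */
		n->track_remove_region(slot->base_gfn, slot->npages, n);
		notified++;
	}
	return notified;
}
```

Nodes that leave the hook NULL are skipped, and an empty list results
in no callbacks at all, matching the early return in the real function.
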
 arch/x86/include/asm/kvm_page_track.h | 12 ++++++++++++
 arch/x86/kvm/mmu/page_track.c         | 23 +++++++++++++++++++++++
 arch/x86/kvm/x86.c                    |  3 +++
 3 files changed, 38 insertions(+)

diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index 6a287bcbe8a9..152c5e7d7868 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -43,6 +43,17 @@ struct kvm_page_track_notifier_node {
 	 */
 	void (*track_flush_slot)(struct kvm *kvm, struct kvm_memory_slot *slot,
 			    struct kvm_page_track_notifier_node *node);
+
+	/*
+	 * Invoked when a memory region is removed from the guest.  Or in KVM
+	 * terms, when a memslot is deleted.
+	 *
+	 * @gfn:       base gfn of the region being removed
+	 * @nr_pages:  number of pages in the to-be-removed region
+	 * @node:      this node
+	 */
+	void (*track_remove_region)(gfn_t gfn, unsigned long nr_pages,
+				    struct kvm_page_track_notifier_node *node);
 };
 
 int kvm_page_track_init(struct kvm *kvm);
@@ -77,6 +88,7 @@ kvm_page_track_unregister_notifier(struct kvm *kvm,
 void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
 			  int bytes);
 void kvm_page_track_flush_slot(struct kvm *kvm, struct kvm_memory_slot *slot);
+void kvm_page_track_delete_slot(struct kvm *kvm, struct kvm_memory_slot *slot);
 
 bool kvm_page_track_has_external_user(struct kvm *kvm);
 
diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index c474a0ff24ba..959be672e2ad 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -303,6 +303,29 @@ void kvm_page_track_flush_slot(struct kvm *kvm, struct kvm_memory_slot *slot)
 	srcu_read_unlock(&head->track_srcu, idx);
 }
 
+/*
+ * Notify external page track nodes that a memory region is being removed from
+ * the VM, e.g. so that users can free any associated metadata.
+ */
+void kvm_page_track_delete_slot(struct kvm *kvm, struct kvm_memory_slot *slot)
+{
+	struct kvm_page_track_notifier_head *head;
+	struct kvm_page_track_notifier_node *n;
+	int idx;
+
+	head = &kvm->arch.track_notifier_head;
+
+	if (hlist_empty(&head->track_notifier_list))
+		return;
+
+	idx = srcu_read_lock(&head->track_srcu);
+	hlist_for_each_entry_srcu(n, &head->track_notifier_list, node,
+				srcu_read_lock_held(&head->track_srcu))
+		if (n->track_remove_region)
+			n->track_remove_region(slot->base_gfn, slot->npages, n);
+	srcu_read_unlock(&head->track_srcu, idx);
+}
+
 enum pg_level kvm_page_track_max_mapping_level(struct kvm *kvm, gfn_t gfn,
 					       enum pg_level max_level)
 {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b587858e878e..cb0005e4baf0 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12582,6 +12582,9 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
 				const struct kvm_memory_slot *new,
 				enum kvm_mr_change change)
 {
+	if (change == KVM_MR_DELETE)
+		kvm_page_track_delete_slot(kvm, old);
+
 	if (!kvm->arch.n_requested_mmu_pages &&
 	    (change == KVM_MR_CREATE || change == KVM_MR_DELETE)) {
 		unsigned long nr_mmu_pages;
-- 
2.39.0.314.g84b9a713c41-goog



* [PATCH 16/27] drm/i915/gvt: switch from ->track_flush_slot() to ->track_remove_region()
  2022-12-23  0:57 [PATCH 00/27] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups Sean Christopherson
                   ` (14 preceding siblings ...)
  2022-12-23  0:57 ` [PATCH 15/27] KVM: x86: Add a new page-track hook to handle memslot deletion Sean Christopherson
@ 2022-12-23  0:57 ` Sean Christopherson
  2022-12-23  0:57 ` [PATCH 17/27] KVM: x86: Remove the unused page-track hook track_flush_slot() Sean Christopherson
                   ` (11 subsequent siblings)
  27 siblings, 0 replies; 61+ messages in thread
From: Sean Christopherson @ 2022-12-23  0:57 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

From: Yan Zhao <yan.y.zhao@intel.com>

Switch from the poorly named and flawed ->track_flush_slot() to the newly
introduced ->track_remove_region().  From KVMGT's perspective, the two
hooks are functionally equivalent, the only difference being that
->track_remove_region() is called only when KVM is 100% certain the
memory region will be removed, i.e. is invoked slightly later in KVM's
memslot modification flow.

Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
[sean: handle name change, massage changelog, rebase]
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
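
Not part of the patch: a userspace sketch of the region walk done by the
new hook, assuming a small fixed-size array in place of KVMGT's hash
table and no locking (the real code holds gfn_lock).  All demo_* names
are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t gfn_t;

#define DEMO_MAX_GFNS 32

/* Hypothetical stand-in for the per-vGPU table of write-protected gfns. */
struct demo_table {
	gfn_t gfns[DEMO_MAX_GFNS];
	size_t count;
};

static bool demo_is_protected(const struct demo_table *t, gfn_t gfn)
{
	for (size_t i = 0; i < t->count; i++)
		if (t->gfns[i] == gfn)
			return true;
	return false;
}

/* Adds @gfn if absent; returns false only when the table is full. */
static bool demo_protect(struct demo_table *t, gfn_t gfn)
{
	if (demo_is_protected(t, gfn))
		return true;
	if (t->count == DEMO_MAX_GFNS)
		return false;
	t->gfns[t->count++] = gfn;
	return true;
}

/* Removes @gfn by moving the last entry into its place. */
static void demo_unprotect(struct demo_table *t, gfn_t gfn)
{
	for (size_t i = 0; i < t->count; i++) {
		if (t->gfns[i] == gfn) {
			t->gfns[i] = t->gfns[--t->count];
			return;
		}
	}
}

/*
 * Models kvmgt_page_track_remove_region(): walk [gfn, gfn + nr_pages)
 * and drop every tracked entry.  Returns the number of entries dropped.
 */
static size_t demo_remove_region(struct demo_table *t, gfn_t gfn,
				 unsigned long nr_pages)
{
	size_t dropped = 0;

	assert(t != NULL);

	for (unsigned long i = 0; i < nr_pages; i++) {
		if (demo_is_protected(t, gfn + i)) {
			demo_unprotect(t, gfn + i);
			dropped++;
		}
	}
	return dropped;
}
```

Entries outside the removed range are left alone, so tracking in other
regions is unaffected by the deletion.
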
 drivers/gpu/drm/i915/gvt/kvmgt.c | 20 ++++++++------------
 1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 3c59e7cd75d9..9f251bc00a7e 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -108,9 +108,8 @@ struct gvt_dma {
 
 static void kvmgt_page_track_write(gpa_t gpa, const u8 *val, int len,
 				   struct kvm_page_track_notifier_node *node);
-static void kvmgt_page_track_flush_slot(struct kvm *kvm,
-		struct kvm_memory_slot *slot,
-		struct kvm_page_track_notifier_node *node);
+static void kvmgt_page_track_remove_region(gfn_t gfn, unsigned long nr_pages,
+					   struct kvm_page_track_notifier_node *node);
 
 static ssize_t intel_vgpu_show_description(struct mdev_type *mtype, char *buf)
 {
@@ -690,7 +689,7 @@ static int intel_vgpu_open_device(struct vfio_device *vfio_dev)
 	gvt_cache_init(vgpu);
 
 	vgpu->track_node.track_write = kvmgt_page_track_write;
-	vgpu->track_node.track_flush_slot = kvmgt_page_track_flush_slot;
+	vgpu->track_node.track_remove_region = kvmgt_page_track_remove_region;
 	kvm_get_kvm(vgpu->vfio_device.kvm);
 	kvm_page_track_register_notifier(vgpu->vfio_device.kvm,
 					 &vgpu->track_node);
@@ -1647,20 +1646,17 @@ static void kvmgt_page_track_write(gpa_t gpa, const u8 *val, int len,
 	mutex_unlock(&info->vgpu_lock);
 }
 
-static void kvmgt_page_track_flush_slot(struct kvm *kvm,
-		struct kvm_memory_slot *slot,
-		struct kvm_page_track_notifier_node *node)
+static void kvmgt_page_track_remove_region(gfn_t gfn, unsigned long nr_pages,
+					   struct kvm_page_track_notifier_node *node)
 {
 	unsigned long i;
-	gfn_t gfn;
 	struct intel_vgpu *info =
 		container_of(node, struct intel_vgpu, track_node);
 
 	mutex_lock(&info->gfn_lock);
-	for (i = 0; i < slot->npages; i++) {
-		gfn = slot->base_gfn + i;
-		if (kvmgt_gfn_is_write_protected(info, gfn))
-			kvmgt_protect_table_del(info, gfn);
+	for (i = 0; i < nr_pages; i++) {
+		if (kvmgt_gfn_is_write_protected(info, gfn + i))
+			kvmgt_protect_table_del(info, gfn + i);
 	}
 	mutex_unlock(&info->gfn_lock);
 }
-- 
2.39.0.314.g84b9a713c41-goog



* [PATCH 17/27] KVM: x86: Remove the unused page-track hook track_flush_slot()
  2022-12-23  0:57 [PATCH 00/27] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups Sean Christopherson
                   ` (15 preceding siblings ...)
  2022-12-23  0:57 ` [PATCH 16/27] drm/i915/gvt: switch from ->track_flush_slot() to ->track_remove_region() Sean Christopherson
@ 2022-12-23  0:57 ` Sean Christopherson
  2022-12-23  0:57 ` [PATCH 18/27] KVM: x86/mmu: Move KVM-only page-track declarations to internal header Sean Christopherson
                   ` (10 subsequent siblings)
  27 siblings, 0 replies; 61+ messages in thread
From: Sean Christopherson @ 2022-12-23  0:57 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

From: Yan Zhao <yan.y.zhao@intel.com>

Remove ->track_flush_slot(), as there are no longer any users and it's
unlikely a "flush" hook will ever be the correct API to provide to an
external page-track user.

Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_page_track.h | 11 -----------
 arch/x86/kvm/mmu/page_track.c         | 26 --------------------------
 arch/x86/kvm/x86.c                    |  2 --
 3 files changed, 39 deletions(-)

diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index 152c5e7d7868..e5eb98ca4fce 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -33,16 +33,6 @@ struct kvm_page_track_notifier_node {
 	 */
 	void (*track_write)(gpa_t gpa, const u8 *new, int bytes,
 			    struct kvm_page_track_notifier_node *node);
-	/*
-	 * It is called when memory slot is being moved or removed
-	 * users can drop write-protection for the pages in that memory slot
-	 *
-	 * @kvm: the kvm where memory slot being moved or removed
-	 * @slot: the memory slot being moved or removed
-	 * @node: this node
-	 */
-	void (*track_flush_slot)(struct kvm *kvm, struct kvm_memory_slot *slot,
-			    struct kvm_page_track_notifier_node *node);
 
 	/*
 	 * Invoked when a memory region is removed from the guest.  Or in KVM
@@ -87,7 +77,6 @@ kvm_page_track_unregister_notifier(struct kvm *kvm,
 				   struct kvm_page_track_notifier_node *n);
 void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
 			  int bytes);
-void kvm_page_track_flush_slot(struct kvm *kvm, struct kvm_memory_slot *slot);
 void kvm_page_track_delete_slot(struct kvm *kvm, struct kvm_memory_slot *slot);
 
 bool kvm_page_track_has_external_user(struct kvm *kvm);
diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index 959be672e2ad..d2b9f7f183cc 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -277,32 +277,6 @@ void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
 	kvm_mmu_track_write(vcpu, gpa, new, bytes);
 }
 
-/*
- * Notify the node that memory slot is being removed or moved so that it can
- * drop write-protection for the pages in the memory slot.
- *
- * The node should figure out it has any write-protected pages in this slot
- * by itself.
- */
-void kvm_page_track_flush_slot(struct kvm *kvm, struct kvm_memory_slot *slot)
-{
-	struct kvm_page_track_notifier_head *head;
-	struct kvm_page_track_notifier_node *n;
-	int idx;
-
-	head = &kvm->arch.track_notifier_head;
-
-	if (hlist_empty(&head->track_notifier_list))
-		return;
-
-	idx = srcu_read_lock(&head->track_srcu);
-	hlist_for_each_entry_srcu(n, &head->track_notifier_list, node,
-				srcu_read_lock_held(&head->track_srcu))
-		if (n->track_flush_slot)
-			n->track_flush_slot(kvm, slot, n);
-	srcu_read_unlock(&head->track_srcu, idx);
-}
-
 /*
  * Notify external page track nodes that a memory region is being removed from
  * the VM, e.g. so that users can free any associated metadata.
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index cb0005e4baf0..f372c41ee2c2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12610,8 +12610,6 @@ void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
 				   struct kvm_memory_slot *slot)
 {
 	kvm_mmu_zap_all_fast(kvm);
-
-	kvm_page_track_flush_slot(kvm, slot);
 }
 
 static inline bool kvm_guest_apic_has_interrupt(struct kvm_vcpu *vcpu)
-- 
2.39.0.314.g84b9a713c41-goog
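
Note on the patch above: the removed kvm_page_track_flush_slot() used the
same dispatch shape as the hooks that remain, i.e. return early if nobody
is registered, otherwise walk the notifier list and call the hook only on
nodes that implement it.  The following is a minimal userspace sketch of
that shape, not kernel code.  All demo_* names are hypothetical, a plain
singly linked list stands in for the hlist, and the SRCU protection used
by the real code is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model of a page-track notifier node. */
struct demo_notifier {
	struct demo_notifier *next;
	/* Optional hook; NULL means this node does not care about writes. */
	void (*track_write)(unsigned long gpa, int bytes,
			    struct demo_notifier *node);
	int hits;	/* number of times the hook ran for this node */
};

/* Push a node onto the front of the notifier list. */
static void demo_register(struct demo_notifier **head,
			  struct demo_notifier *n)
{
	n->next = *head;
	*head = n;
}

/* Notify every registered node that implements ->track_write(). */
static void demo_notify_write(struct demo_notifier *head, unsigned long gpa,
			      int bytes)
{
	struct demo_notifier *n;

	/* Fast path: nothing registered, nothing to do. */
	if (!head)
		return;

	for (n = head; n; n = n->next)
		if (n->track_write)
			n->track_write(gpa, bytes, n);
}

/* Example hook that only counts how often it was invoked. */
static void demo_count_write(unsigned long gpa, int bytes,
			     struct demo_notifier *node)
{
	(void)gpa;
	(void)bytes;
	node->hits++;
}
```

A node that leaves the hook NULL is skipped, which is why removing an
unused hook such as ->track_flush_slot() needs no change in nodes that
never set it.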



* [PATCH 18/27] KVM: x86/mmu: Move KVM-only page-track declarations to internal header
  2022-12-23  0:57 [PATCH 00/27] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups Sean Christopherson
                   ` (16 preceding siblings ...)
  2022-12-23  0:57 ` [PATCH 17/27] KVM: x86: Remove the unused page-track hook track_flush_slot() Sean Christopherson
@ 2022-12-23  0:57 ` Sean Christopherson
  2022-12-23  0:57 ` [PATCH 19/27] KVM: x86/mmu: Use page-track notifiers iff there are external users Sean Christopherson
                   ` (9 subsequent siblings)
  27 siblings, 0 replies; 61+ messages in thread
From: Sean Christopherson @ 2022-12-23  0:57 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

Bury the declarations of the page-track helpers that are intended only for
internal KVM use in a "private" header.  In addition to guarding against
unwanted usage of the internal-only helpers, dropping their declarations
from the public header avoids exposing other structures that should be
KVM-internal, e.g. for memslots.  This is a baby step toward making
kvm_host.h a KVM-internal header in the very distant future.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_page_track.h | 26 ++++-----------------
 arch/x86/kvm/mmu/mmu.c                |  3 ++-
 arch/x86/kvm/mmu/page_track.c         |  8 +------
 arch/x86/kvm/mmu/page_track.h         | 33 +++++++++++++++++++++++++++
 arch/x86/kvm/x86.c                    |  1 +
 5 files changed, 42 insertions(+), 29 deletions(-)
 create mode 100644 arch/x86/kvm/mmu/page_track.h

diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index e5eb98ca4fce..deece45936a5 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -2,6 +2,8 @@
 #ifndef _ASM_X86_KVM_PAGE_TRACK_H
 #define _ASM_X86_KVM_PAGE_TRACK_H
 
+#include <linux/kvm_types.h>
+
 enum kvm_page_track_mode {
 	KVM_PAGE_TRACK_WRITE,
 	KVM_PAGE_TRACK_MAX,
@@ -46,28 +48,15 @@ struct kvm_page_track_notifier_node {
 				    struct kvm_page_track_notifier_node *node);
 };
 
-int kvm_page_track_init(struct kvm *kvm);
-void kvm_page_track_cleanup(struct kvm *kvm);
-
-bool kvm_page_track_write_tracking_enabled(struct kvm *kvm);
-int kvm_page_track_write_tracking_alloc(struct kvm_memory_slot *slot);
-enum pg_level kvm_page_track_max_mapping_level(struct kvm *kvm, gfn_t gfn,
-					       enum pg_level max_level);
-
-void kvm_page_track_free_memslot(struct kvm_memory_slot *slot);
-int kvm_page_track_create_memslot(struct kvm *kvm,
-				  struct kvm_memory_slot *slot,
-				  unsigned long npages);
-
 void kvm_slot_page_track_add_page(struct kvm *kvm,
 				  struct kvm_memory_slot *slot, gfn_t gfn,
 				  enum kvm_page_track_mode mode);
 void kvm_slot_page_track_remove_page(struct kvm *kvm,
 				     struct kvm_memory_slot *slot, gfn_t gfn,
 				     enum kvm_page_track_mode mode);
-bool kvm_slot_page_track_is_active(struct kvm *kvm,
-				   const struct kvm_memory_slot *slot,
-				   gfn_t gfn, enum kvm_page_track_mode mode);
+
+enum pg_level kvm_page_track_max_mapping_level(struct kvm *kvm, gfn_t gfn,
+					       enum pg_level max_level);
 
 void
 kvm_page_track_register_notifier(struct kvm *kvm,
@@ -75,10 +64,5 @@ kvm_page_track_register_notifier(struct kvm *kvm,
 void
 kvm_page_track_unregister_notifier(struct kvm *kvm,
 				   struct kvm_page_track_notifier_node *n);
-void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
-			  int bytes);
-void kvm_page_track_delete_slot(struct kvm *kvm, struct kvm_memory_slot *slot);
-
-bool kvm_page_track_has_external_user(struct kvm *kvm);
 
 #endif
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index dfeddea8148a..6477ef435575 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -24,6 +24,7 @@
 #include "kvm_cache_regs.h"
 #include "smm.h"
 #include "kvm_emulate.h"
+#include "page_track.h"
 #include "cpuid.h"
 #include "spte.h"
 
@@ -51,7 +52,7 @@
 #include <asm/io.h>
 #include <asm/set_memory.h>
 #include <asm/vmx.h>
-#include <asm/kvm_page_track.h>
+
 #include "trace.h"
 
 extern bool itlb_multihit_kvm_mitigation;
diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index d2b9f7f183cc..2b302fd2c5dd 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -14,10 +14,9 @@
 #include <linux/kvm_host.h>
 #include <linux/rculist.h>
 
-#include <asm/kvm_page_track.h>
-
 #include "mmu.h"
 #include "mmu_internal.h"
+#include "page_track.h"
 
 bool kvm_page_track_write_tracking_enabled(struct kvm *kvm)
 {
@@ -317,8 +316,3 @@ enum pg_level kvm_page_track_max_mapping_level(struct kvm *kvm, gfn_t gfn,
 	return max_level;
 }
 EXPORT_SYMBOL_GPL(kvm_page_track_max_mapping_level);
-
-bool kvm_page_track_has_external_user(struct kvm *kvm)
-{
-	return hlist_empty(&kvm->arch.track_notifier_head.track_notifier_list);
-}
diff --git a/arch/x86/kvm/mmu/page_track.h b/arch/x86/kvm/mmu/page_track.h
new file mode 100644
index 000000000000..89712f123ad3
--- /dev/null
+++ b/arch/x86/kvm/mmu/page_track.h
@@ -0,0 +1,33 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __KVM_X86_PAGE_TRACK_H
+#define __KVM_X86_PAGE_TRACK_H
+
+#include <linux/kvm_host.h>
+
+#include <asm/kvm_page_track.h>
+
+int kvm_page_track_init(struct kvm *kvm);
+void kvm_page_track_cleanup(struct kvm *kvm);
+
+bool kvm_page_track_write_tracking_enabled(struct kvm *kvm);
+int kvm_page_track_write_tracking_alloc(struct kvm_memory_slot *slot);
+
+void kvm_page_track_free_memslot(struct kvm_memory_slot *slot);
+int kvm_page_track_create_memslot(struct kvm *kvm,
+				  struct kvm_memory_slot *slot,
+				  unsigned long npages);
+
+bool kvm_slot_page_track_is_active(struct kvm *kvm,
+				   const struct kvm_memory_slot *slot,
+				   gfn_t gfn, enum kvm_page_track_mode mode);
+
+void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
+			  int bytes);
+void kvm_page_track_delete_slot(struct kvm *kvm, struct kvm_memory_slot *slot);
+
+static inline bool kvm_page_track_has_external_user(struct kvm *kvm)
+{
+	return hlist_empty(&kvm->arch.track_notifier_head.track_notifier_list);
+}
+
+#endif /* __KVM_X86_PAGE_TRACK_H */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f372c41ee2c2..41d47a23396c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -24,6 +24,7 @@
 #include "tss.h"
 #include "kvm_cache_regs.h"
 #include "kvm_emulate.h"
+#include "mmu/page_track.h"
 #include "x86.h"
 #include "cpuid.h"
 #include "pmu.h"
-- 
2.39.0.314.g84b9a713c41-goog



* [PATCH 19/27] KVM: x86/mmu: Use page-track notifiers iff there are external users
  2022-12-23  0:57 [PATCH 00/27] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups Sean Christopherson
                   ` (17 preceding siblings ...)
  2022-12-23  0:57 ` [PATCH 18/27] KVM: x86/mmu: Move KVM-only page-track declarations to internal header Sean Christopherson
@ 2022-12-23  0:57 ` Sean Christopherson
  2022-12-28  6:56   ` Yan Zhao
  2023-08-07 12:01   ` Like Xu
  2022-12-23  0:57 ` [PATCH 20/27] KVM: x86/mmu: Drop infrastructure for multiple page-track modes Sean Christopherson
                   ` (8 subsequent siblings)
  27 siblings, 2 replies; 61+ messages in thread
From: Sean Christopherson @ 2022-12-23  0:57 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

Disable the page-track notifier code at compile time if there are no
external users, i.e. if CONFIG_KVM_EXTERNAL_WRITE_TRACKING=n.  KVM itself
now hooks emulated writes directly instead of relying on the page-track
mechanism.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_host.h       |  2 ++
 arch/x86/include/asm/kvm_page_track.h |  2 ++
 arch/x86/kvm/mmu/page_track.c         |  9 ++++----
 arch/x86/kvm/mmu/page_track.h         | 30 +++++++++++++++++++++++----
 4 files changed, 35 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index eec424fac0ba..e8f8e1bd96c7 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1223,7 +1223,9 @@ struct kvm_arch {
 	 * create an NX huge page (without hanging the guest).
 	 */
 	struct list_head possible_nx_huge_pages;
+#ifdef CONFIG_KVM_EXTERNAL_WRITE_TRACKING
 	struct kvm_page_track_notifier_head track_notifier_head;
+#endif
 	/*
 	 * Protects marking pages unsync during page faults, as TDP MMU page
 	 * faults only take mmu_lock for read.  For simplicity, the unsync
diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index deece45936a5..53c2adb25a07 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -55,6 +55,7 @@ void kvm_slot_page_track_remove_page(struct kvm *kvm,
 				     struct kvm_memory_slot *slot, gfn_t gfn,
 				     enum kvm_page_track_mode mode);
 
+#ifdef CONFIG_KVM_EXTERNAL_WRITE_TRACKING
 enum pg_level kvm_page_track_max_mapping_level(struct kvm *kvm, gfn_t gfn,
 					       enum pg_level max_level);
 
@@ -64,5 +65,6 @@ kvm_page_track_register_notifier(struct kvm *kvm,
 void
 kvm_page_track_unregister_notifier(struct kvm *kvm,
 				   struct kvm_page_track_notifier_node *n);
+#endif /* CONFIG_KVM_EXTERNAL_WRITE_TRACKING */
 
 #endif
diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index 2b302fd2c5dd..f932909aa9b5 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -193,6 +193,7 @@ bool kvm_slot_page_track_is_active(struct kvm *kvm,
 	return !!READ_ONCE(slot->arch.gfn_track[mode][index]);
 }
 
+#ifdef CONFIG_KVM_EXTERNAL_WRITE_TRACKING
 void kvm_page_track_cleanup(struct kvm *kvm)
 {
 	struct kvm_page_track_notifier_head *head;
@@ -208,6 +209,7 @@ int kvm_page_track_init(struct kvm *kvm)
 	head = &kvm->arch.track_notifier_head;
 	INIT_HLIST_HEAD(&head->track_notifier_list);
 	return init_srcu_struct(&head->track_srcu);
+	return 0;
 }
 
 /*
@@ -254,8 +256,8 @@ EXPORT_SYMBOL_GPL(kvm_page_track_unregister_notifier);
  * The node should figure out if the written page is the one that node is
  * interested in by itself.
  */
-void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
-			  int bytes)
+void __kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
+			    int bytes)
 {
 	struct kvm_page_track_notifier_head *head;
 	struct kvm_page_track_notifier_node *n;
@@ -272,8 +274,6 @@ void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
 		if (n->track_write)
 			n->track_write(gpa, new, bytes, n);
 	srcu_read_unlock(&head->track_srcu, idx);
-
-	kvm_mmu_track_write(vcpu, gpa, new, bytes);
 }
 
 /*
@@ -316,3 +316,4 @@ enum pg_level kvm_page_track_max_mapping_level(struct kvm *kvm, gfn_t gfn,
 	return max_level;
 }
 EXPORT_SYMBOL_GPL(kvm_page_track_max_mapping_level);
+#endif
diff --git a/arch/x86/kvm/mmu/page_track.h b/arch/x86/kvm/mmu/page_track.h
index 89712f123ad3..1b363784aa4a 100644
--- a/arch/x86/kvm/mmu/page_track.h
+++ b/arch/x86/kvm/mmu/page_track.h
@@ -6,8 +6,6 @@
 
 #include <asm/kvm_page_track.h>
 
-int kvm_page_track_init(struct kvm *kvm);
-void kvm_page_track_cleanup(struct kvm *kvm);
 
 bool kvm_page_track_write_tracking_enabled(struct kvm *kvm);
 int kvm_page_track_write_tracking_alloc(struct kvm_memory_slot *slot);
@@ -21,13 +19,37 @@ bool kvm_slot_page_track_is_active(struct kvm *kvm,
 				   const struct kvm_memory_slot *slot,
 				   gfn_t gfn, enum kvm_page_track_mode mode);
 
-void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
-			  int bytes);
+#ifdef CONFIG_KVM_EXTERNAL_WRITE_TRACKING
+int kvm_page_track_init(struct kvm *kvm);
+void kvm_page_track_cleanup(struct kvm *kvm);
+
+void __kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
+			    int bytes);
 void kvm_page_track_delete_slot(struct kvm *kvm, struct kvm_memory_slot *slot);
 
 static inline bool kvm_page_track_has_external_user(struct kvm *kvm)
 {
 	return hlist_empty(&kvm->arch.track_notifier_head.track_notifier_list);
 }
+#else
+static inline int kvm_page_track_init(struct kvm *kvm) { return 0; }
+static inline void kvm_page_track_cleanup(struct kvm *kvm) { }
+
+static inline void __kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa,
+					  const u8 *new, int bytes) { }
+static inline void kvm_page_track_delete_slot(struct kvm *kvm,
+					      struct kvm_memory_slot *slot) { }
+
+static inline bool kvm_page_track_has_external_user(struct kvm *kvm) { return false; }
+
+#endif /* CONFIG_KVM_EXTERNAL_WRITE_TRACKING */
+
+static inline void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa,
+					const u8 *new, int bytes)
+{
+	__kvm_page_track_write(vcpu, gpa, new, bytes);
+
+	kvm_mmu_track_write(vcpu, gpa, new, bytes);
+}
 
 #endif /* __KVM_X86_PAGE_TRACK_H */
-- 
2.39.0.314.g84b9a713c41-goog
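
Note on the patch above: the header change relies on a common kernel
idiom.  The config-dependent helper gets a real implementation when the
option is enabled and an empty static inline stub otherwise, and an
unconditional wrapper calls both the optional and the mandatory hook so
that callers never need their own #ifdef.  The following is a minimal
userspace sketch of that idiom, not kernel code.  DEMO_EXTERNAL_WRITE_TRACKING
is a hypothetical stand-in for CONFIG_KVM_EXTERNAL_WRITE_TRACKING, and all
demo_* names and the two counters are invented for illustration.

```c
#include <assert.h>

static int external_calls;	/* times the external-user path ran */
static int internal_calls;	/* times the internal MMU path ran */

#ifdef DEMO_EXTERNAL_WRITE_TRACKING
/* "Real" implementation: would walk the list of external notifiers. */
static void demo_track_write_external(unsigned long gpa, int bytes)
{
	(void)gpa;
	(void)bytes;
	external_calls++;
}
#else
/* Stub: compiles away, so the wrapper below needs no #ifdef. */
static inline void demo_track_write_external(unsigned long gpa, int bytes)
{
	(void)gpa;
	(void)bytes;
}
#endif

/* Stand-in for kvm_mmu_track_write(): runs regardless of the config. */
static void demo_mmu_track_write(unsigned long gpa, int bytes)
{
	(void)gpa;
	(void)bytes;
	internal_calls++;
}

/* Unconditional wrapper, mirroring the shape of kvm_page_track_write(). */
static inline void demo_track_write(unsigned long gpa, int bytes)
{
	demo_track_write_external(gpa, bytes);
	demo_mmu_track_write(gpa, bytes);
}
```

Built without -DDEMO_EXTERNAL_WRITE_TRACKING, a call to demo_track_write()
only bumps internal_calls; built with it, both counters advance.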



* [PATCH 20/27] KVM: x86/mmu: Drop infrastructure for multiple page-track modes
  2022-12-23  0:57 [PATCH 00/27] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups Sean Christopherson
                   ` (18 preceding siblings ...)
  2022-12-23  0:57 ` [PATCH 19/27] KVM: x86/mmu: Use page-track notifiers iff there are external users Sean Christopherson
@ 2022-12-23  0:57 ` Sean Christopherson
  2022-12-23  0:57 ` [PATCH 21/27] KVM: x86/mmu: Rename page-track APIs to reflect the new reality Sean Christopherson
                   ` (7 subsequent siblings)
  27 siblings, 0 replies; 61+ messages in thread
From: Sean Christopherson @ 2022-12-23  0:57 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

Drop "support" for multiple page-track modes, as there is no evidence
that array-based and refcounted metadata is the optimal solution for
other modes, nor is there any evidence that other use cases, e.g. for
access-tracking, will be a good fit for the page-track machinery in
general.

E.g. one potential use case of access-tracking would be to prevent guest
access to poisoned memory (from the guest's perspective).  In that case,
the number of poisoned pages is likely to be a very small percentage of
the guest memory, and there is no need to reference count the number of
access-tracking users, i.e. expanding gfn_track[] for a new mode would be
grossly inefficient.  And for poisoned memory, host userspace would also
likely want to trap accesses, e.g. to inject #MC into the guest, and that
isn't currently supported by the page-track framework.

A better alternative for that poisoned page use case is likely a
variation of the proposed per-gfn attributes overlay (linked), which
would allow efficiently tracking the sparse set of poisoned pages, and by
default would exit to userspace on access.

Link: https://lore.kernel.org/all/Y2WB48kD0J4VGynX@google.com
Cc: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_host.h       |  12 +--
 arch/x86/include/asm/kvm_page_track.h |  11 +--
 arch/x86/kvm/mmu/mmu.c                |  14 ++--
 arch/x86/kvm/mmu/page_track.c         | 111 ++++++++------------------
 arch/x86/kvm/mmu/page_track.h         |   3 +-
 drivers/gpu/drm/i915/gvt/kvmgt.c      |   4 +-
 6 files changed, 51 insertions(+), 104 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index e8f8e1bd96c7..f110e1bd1282 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -290,13 +290,13 @@ struct kvm_kernel_irq_routing_entry;
  * kvm_mmu_page_role tracks the properties of a shadow page (where shadow page
  * also includes TDP pages) to determine whether or not a page can be used in
  * the given MMU context.  This is a subset of the overall kvm_cpu_role to
- * minimize the size of kvm_memory_slot.arch.gfn_track, i.e. allows allocating
- * 2 bytes per gfn instead of 4 bytes per gfn.
+ * minimize the size of kvm_memory_slot.arch.gfn_write_track, i.e. allows
+ * allocating 2 bytes per gfn instead of 4 bytes per gfn.
  *
  * Upper-level shadow pages having gptes are tracked for write-protection via
- * gfn_track.  As above, gfn_track is a 16 bit counter, so KVM must not create
- * more than 2^16-1 upper-level shadow pages at a single gfn, otherwise
- * gfn_track will overflow and explosions will ensure.
+ * gfn_write_track.  As above, gfn_write_track is a 16 bit counter, so KVM must
+ * not create more than 2^16-1 upper-level shadow pages at a single gfn,
+ * otherwise gfn_write_track will overflow and explosions will ensue.
  *
  * A unique shadow page (SP) for a gfn is created if and only if an existing SP
  * cannot be reused.  The ability to reuse a SP is tracked by its role, which
@@ -1018,7 +1018,7 @@ struct kvm_lpage_info {
 struct kvm_arch_memory_slot {
 	struct kvm_rmap_head *rmap[KVM_NR_PAGE_SIZES];
 	struct kvm_lpage_info *lpage_info[KVM_NR_PAGE_SIZES - 1];
-	unsigned short *gfn_track[KVM_PAGE_TRACK_MAX];
+	unsigned short *gfn_write_track;
 };
 
 /*
diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index 53c2adb25a07..42a4ae451d36 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -4,11 +4,6 @@
 
 #include <linux/kvm_types.h>
 
-enum kvm_page_track_mode {
-	KVM_PAGE_TRACK_WRITE,
-	KVM_PAGE_TRACK_MAX,
-};
-
 /*
  * The notifier represented by @kvm_page_track_notifier_node is linked into
  * the head which will be notified when guest is triggering the track event.
@@ -49,11 +44,9 @@ struct kvm_page_track_notifier_node {
 };
 
 void kvm_slot_page_track_add_page(struct kvm *kvm,
-				  struct kvm_memory_slot *slot, gfn_t gfn,
-				  enum kvm_page_track_mode mode);
+				  struct kvm_memory_slot *slot, gfn_t gfn);
 void kvm_slot_page_track_remove_page(struct kvm *kvm,
-				     struct kvm_memory_slot *slot, gfn_t gfn,
-				     enum kvm_page_track_mode mode);
+				     struct kvm_memory_slot *slot, gfn_t gfn);
 
 #ifdef CONFIG_KVM_EXTERNAL_WRITE_TRACKING
 enum pg_level kvm_page_track_max_mapping_level(struct kvm *kvm, gfn_t gfn,
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 6477ef435575..ffcfc75cd4c1 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -807,8 +807,7 @@ static void account_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
 
 	/* the non-leaf shadow pages are keeping readonly. */
 	if (sp->role.level > PG_LEVEL_4K)
-		return kvm_slot_page_track_add_page(kvm, slot, gfn,
-						    KVM_PAGE_TRACK_WRITE);
+		return kvm_slot_page_track_add_page(kvm, slot, gfn);
 
 	kvm_mmu_gfn_disallow_lpage(slot, gfn);
 
@@ -854,8 +853,7 @@ static void unaccount_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
 	slots = kvm_memslots_for_spte_role(kvm, sp->role);
 	slot = __gfn_to_memslot(slots, gfn);
 	if (sp->role.level > PG_LEVEL_4K)
-		return kvm_slot_page_track_remove_page(kvm, slot, gfn,
-						       KVM_PAGE_TRACK_WRITE);
+		return kvm_slot_page_track_remove_page(kvm, slot, gfn);
 
 	kvm_mmu_gfn_allow_lpage(slot, gfn);
 }
@@ -2727,7 +2725,7 @@ int mmu_try_to_unsync_pages(struct kvm *kvm, const struct kvm_memory_slot *slot,
 	 * track machinery is used to write-protect upper-level shadow pages,
 	 * i.e. this guards the role.level == 4K assertion below!
 	 */
-	if (kvm_slot_page_track_is_active(kvm, slot, gfn, KVM_PAGE_TRACK_WRITE))
+	if (kvm_slot_page_track_is_active(kvm, slot, gfn))
 		return -EPERM;
 
 	/*
@@ -4137,7 +4135,7 @@ static bool page_fault_handle_page_track(struct kvm_vcpu *vcpu,
 	 * guest is writing the page which is write tracked which can
 	 * not be fixed by page fault handler.
 	 */
-	if (kvm_slot_page_track_is_active(vcpu->kvm, fault->slot, fault->gfn, KVM_PAGE_TRACK_WRITE))
+	if (kvm_slot_page_track_is_active(vcpu->kvm, fault->slot, fault->gfn))
 		return true;
 
 	return false;
@@ -5366,8 +5364,8 @@ void kvm_mmu_after_set_cpuid(struct kvm_vcpu *vcpu)
 	 * physical address properties) in a single VM would require tracking
 	 * all relevant CPUID information in kvm_mmu_page_role. That is very
 	 * undesirable as it would increase the memory requirements for
-	 * gfn_track (see struct kvm_mmu_page_role comments).  For now that
-	 * problem is swept under the rug; KVM's CPUID API is horrific and
+	 * gfn_write_track (see struct kvm_mmu_page_role comments).  For now
+	 * that problem is swept under the rug; KVM's CPUID API is horrific and
 	 * it's all but impossible to solve it without introducing a new API.
 	 */
 	vcpu->arch.root_mmu.root_role.word = 0;
diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index f932909aa9b5..4077aa6d6ff4 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -26,76 +26,50 @@ bool kvm_page_track_write_tracking_enabled(struct kvm *kvm)
 
 void kvm_page_track_free_memslot(struct kvm_memory_slot *slot)
 {
-	int i;
+	kvfree(slot->arch.gfn_write_track);
+	slot->arch.gfn_write_track = NULL;
+}
 
-	for (i = 0; i < KVM_PAGE_TRACK_MAX; i++) {
-		kvfree(slot->arch.gfn_track[i]);
-		slot->arch.gfn_track[i] = NULL;
-	}
+static int __kvm_page_track_write_tracking_alloc(struct kvm_memory_slot *slot,
+						 unsigned long npages)
+{
+	const size_t size = sizeof(*slot->arch.gfn_write_track);
+
+	if (!slot->arch.gfn_write_track)
+		slot->arch.gfn_write_track = __vcalloc(npages, size,
+						       GFP_KERNEL_ACCOUNT);
+
+	return slot->arch.gfn_write_track ? 0 : -ENOMEM;
 }
 
 int kvm_page_track_create_memslot(struct kvm *kvm,
 				  struct kvm_memory_slot *slot,
 				  unsigned long npages)
 {
-	int i;
-
-	for (i = 0; i < KVM_PAGE_TRACK_MAX; i++) {
-		if (i == KVM_PAGE_TRACK_WRITE &&
-		    !kvm_page_track_write_tracking_enabled(kvm))
-			continue;
-
-		slot->arch.gfn_track[i] =
-			__vcalloc(npages, sizeof(*slot->arch.gfn_track[i]),
-				  GFP_KERNEL_ACCOUNT);
-		if (!slot->arch.gfn_track[i])
-			goto track_free;
-	}
-
-	return 0;
-
-track_free:
-	kvm_page_track_free_memslot(slot);
-	return -ENOMEM;
-}
-
-static inline bool page_track_mode_is_valid(enum kvm_page_track_mode mode)
-{
-	if (mode < 0 || mode >= KVM_PAGE_TRACK_MAX)
-		return false;
-
-	return true;
-}
-
-int kvm_page_track_write_tracking_alloc(struct kvm_memory_slot *slot)
-{
-	unsigned short *gfn_track;
-
-	if (slot->arch.gfn_track[KVM_PAGE_TRACK_WRITE])
+	if (!kvm_page_track_write_tracking_enabled(kvm))
 		return 0;
 
-	gfn_track = __vcalloc(slot->npages, sizeof(*gfn_track),
-			      GFP_KERNEL_ACCOUNT);
-	if (gfn_track == NULL)
-		return -ENOMEM;
+	return __kvm_page_track_write_tracking_alloc(slot, npages);
+}
 
-	slot->arch.gfn_track[KVM_PAGE_TRACK_WRITE] = gfn_track;
-	return 0;
+int kvm_page_track_write_tracking_alloc(struct kvm_memory_slot *slot)
+{
+	return __kvm_page_track_write_tracking_alloc(slot, slot->npages);
 }
 
-static void update_gfn_track(struct kvm_memory_slot *slot, gfn_t gfn,
-			     enum kvm_page_track_mode mode, short count)
+static void update_gfn_write_track(struct kvm_memory_slot *slot, gfn_t gfn,
+				   short count)
 {
 	int index, val;
 
 	index = gfn_to_index(gfn, slot->base_gfn, PG_LEVEL_4K);
 
-	val = slot->arch.gfn_track[mode][index];
+	val = slot->arch.gfn_write_track[index];
 
 	if (WARN_ON(val + count < 0 || val + count > USHRT_MAX))
 		return;
 
-	slot->arch.gfn_track[mode][index] += count;
+	slot->arch.gfn_write_track[index] += count;
 }
 
 /*
@@ -108,21 +82,15 @@ static void update_gfn_track(struct kvm_memory_slot *slot, gfn_t gfn,
  * @kvm: the guest instance we are interested in.
  * @slot: the @gfn belongs to.
  * @gfn: the guest page.
- * @mode: tracking mode, currently only write track is supported.
  */
 void kvm_slot_page_track_add_page(struct kvm *kvm,
-				  struct kvm_memory_slot *slot, gfn_t gfn,
-				  enum kvm_page_track_mode mode)
+				  struct kvm_memory_slot *slot, gfn_t gfn)
 {
 
-	if (WARN_ON(!page_track_mode_is_valid(mode)))
+	if (WARN_ON(!kvm_page_track_write_tracking_enabled(kvm)))
 		return;
 
-	if (WARN_ON(mode == KVM_PAGE_TRACK_WRITE &&
-		    !kvm_page_track_write_tracking_enabled(kvm)))
-		return;
-
-	update_gfn_track(slot, gfn, mode, 1);
+	update_gfn_write_track(slot, gfn, 1);
 
 	/*
 	 * new track stops large page mapping for the
@@ -130,9 +98,8 @@ void kvm_slot_page_track_add_page(struct kvm *kvm,
 	 */
 	kvm_mmu_gfn_disallow_lpage(slot, gfn);
 
-	if (mode == KVM_PAGE_TRACK_WRITE)
-		if (kvm_mmu_slot_gfn_write_protect(kvm, slot, gfn, PG_LEVEL_4K))
-			kvm_flush_remote_tlbs(kvm);
+	if (kvm_mmu_slot_gfn_write_protect(kvm, slot, gfn, PG_LEVEL_4K))
+		kvm_flush_remote_tlbs(kvm);
 }
 EXPORT_SYMBOL_GPL(kvm_slot_page_track_add_page);
 
@@ -147,20 +114,14 @@ EXPORT_SYMBOL_GPL(kvm_slot_page_track_add_page);
  * @kvm: the guest instance we are interested in.
  * @slot: the @gfn belongs to.
  * @gfn: the guest page.
- * @mode: tracking mode, currently only write track is supported.
  */
 void kvm_slot_page_track_remove_page(struct kvm *kvm,
-				     struct kvm_memory_slot *slot, gfn_t gfn,
-				     enum kvm_page_track_mode mode)
+				     struct kvm_memory_slot *slot, gfn_t gfn)
 {
-	if (WARN_ON(!page_track_mode_is_valid(mode)))
+	if (WARN_ON(!kvm_page_track_write_tracking_enabled(kvm)))
 		return;
 
-	if (WARN_ON(mode == KVM_PAGE_TRACK_WRITE &&
-		    !kvm_page_track_write_tracking_enabled(kvm)))
-		return;
-
-	update_gfn_track(slot, gfn, mode, -1);
+	update_gfn_write_track(slot, gfn, -1);
 
 	/*
 	 * allow large page mapping for the tracked page
@@ -175,22 +136,18 @@ EXPORT_SYMBOL_GPL(kvm_slot_page_track_remove_page);
  */
 bool kvm_slot_page_track_is_active(struct kvm *kvm,
 				   const struct kvm_memory_slot *slot,
-				   gfn_t gfn, enum kvm_page_track_mode mode)
+				   gfn_t gfn)
 {
 	int index;
 
-	if (WARN_ON(!page_track_mode_is_valid(mode)))
-		return false;
-
 	if (!slot)
 		return false;
 
-	if (mode == KVM_PAGE_TRACK_WRITE &&
-	    !kvm_page_track_write_tracking_enabled(kvm))
+	if (!kvm_page_track_write_tracking_enabled(kvm))
 		return false;
 
 	index = gfn_to_index(gfn, slot->base_gfn, PG_LEVEL_4K);
-	return !!READ_ONCE(slot->arch.gfn_track[mode][index]);
+	return !!READ_ONCE(slot->arch.gfn_write_track[index]);
 }
 
 #ifdef CONFIG_KVM_EXTERNAL_WRITE_TRACKING
diff --git a/arch/x86/kvm/mmu/page_track.h b/arch/x86/kvm/mmu/page_track.h
index 1b363784aa4a..ae2860bdf560 100644
--- a/arch/x86/kvm/mmu/page_track.h
+++ b/arch/x86/kvm/mmu/page_track.h
@@ -16,8 +16,7 @@ int kvm_page_track_create_memslot(struct kvm *kvm,
 				  unsigned long npages);
 
 bool kvm_slot_page_track_is_active(struct kvm *kvm,
-				   const struct kvm_memory_slot *slot,
-				   gfn_t gfn, enum kvm_page_track_mode mode);
+				   const struct kvm_memory_slot *slot, gfn_t gfn);
 
 #ifdef CONFIG_KVM_EXTERNAL_WRITE_TRACKING
 int kvm_page_track_init(struct kvm *kvm);
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 9f251bc00a7e..cabad0ff722c 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -1584,7 +1584,7 @@ int intel_gvt_page_track_add(struct intel_vgpu *info, u64 gfn)
 	}
 
 	write_lock(&kvm->mmu_lock);
-	kvm_slot_page_track_add_page(kvm, slot, gfn, KVM_PAGE_TRACK_WRITE);
+	kvm_slot_page_track_add_page(kvm, slot, gfn);
 	write_unlock(&kvm->mmu_lock);
 
 	srcu_read_unlock(&kvm->srcu, idx);
@@ -1618,7 +1618,7 @@ int intel_gvt_page_track_remove(struct intel_vgpu *info, u64 gfn)
 	}
 
 	write_lock(&kvm->mmu_lock);
-	kvm_slot_page_track_remove_page(kvm, slot, gfn, KVM_PAGE_TRACK_WRITE);
+	kvm_slot_page_track_remove_page(kvm, slot, gfn);
 	write_unlock(&kvm->mmu_lock);
 	srcu_read_unlock(&kvm->srcu, idx);
 
-- 
2.39.0.314.g84b9a713c41-goog
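
Note on the patch above: what remains after dropping the mode dimension is
a single array of 16-bit user counts, one per 4KiB gfn in the memslot,
allocated lazily and updated with an explicit underflow/overflow check.
The following is a minimal userspace sketch of that data structure, not
kernel code.  The demo_* names are hypothetical, calloc() stands in for
__vcalloc(), and a boolean return replaces the WARN_ON() of the real
update_gfn_write_track().

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace model of a memslot's write-track metadata. */
struct demo_slot {
	unsigned long base_gfn;		/* first guest frame number in the slot */
	unsigned long npages;		/* number of 4KiB pages in the slot */
	unsigned short *write_track;	/* one 16-bit user count per gfn */
};

/* Allocate zeroed counters on first use; 0 on success, -1 on failure. */
static int demo_slot_alloc(struct demo_slot *slot)
{
	if (!slot->write_track)
		slot->write_track = calloc(slot->npages,
					   sizeof(*slot->write_track));

	return slot->write_track ? 0 : -1;
}

/* Adjust the count for @gfn by @count, refusing underflow and overflow. */
static bool demo_update_write_track(struct demo_slot *slot,
				    unsigned long gfn, short count)
{
	unsigned long index;
	int val;

	if (gfn < slot->base_gfn || gfn - slot->base_gfn >= slot->npages)
		return false;

	index = gfn - slot->base_gfn;
	val = slot->write_track[index];

	if (val + count < 0 || val + count > USHRT_MAX)
		return false;

	slot->write_track[index] = (unsigned short)(val + count);
	return true;
}

/* A gfn is write-tracked while at least one user holds a count on it. */
static bool demo_is_write_tracked(const struct demo_slot *slot,
				  unsigned long gfn)
{
	return slot->write_track[gfn - slot->base_gfn] != 0;
}
```

The dense layout is why the commit message calls a second mode "grossly
inefficient" for sparse data: at 2 bytes per 4KiB page, a 4GiB slot needs
2^20 counters, i.e. 2MiB, no matter how few gfns are actually tracked.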



* [PATCH 21/27] KVM: x86/mmu: Rename page-track APIs to reflect the new reality
  2022-12-23  0:57 [PATCH 00/27] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups Sean Christopherson
                   ` (19 preceding siblings ...)
  2022-12-23  0:57 ` [PATCH 20/27] KVM: x86/mmu: Drop infrastructure for multiple page-track modes Sean Christopherson
@ 2022-12-23  0:57 ` Sean Christopherson
  2022-12-23  0:57 ` [PATCH 22/27] KVM: x86/mmu: Assert that correct locks are held for page write-tracking Sean Christopherson
                   ` (6 subsequent siblings)
  27 siblings, 0 replies; 61+ messages in thread
From: Sean Christopherson @ 2022-12-23  0:57 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

Rename the page-track APIs to capture that they're all about tracking
writes, now that the facade of supporting multiple modes is gone.

Opportunistically replace "slot" with "gfn" in anticipation of removing
the @slot param from the external APIs.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_page_track.h |  8 ++++----
 arch/x86/kvm/mmu/mmu.c                |  8 ++++----
 arch/x86/kvm/mmu/page_track.c         | 21 +++++++++------------
 arch/x86/kvm/mmu/page_track.h         |  4 ++--
 drivers/gpu/drm/i915/gvt/kvmgt.c      |  4 ++--
 5 files changed, 21 insertions(+), 24 deletions(-)

diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index 42a4ae451d36..20055064793a 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -43,10 +43,10 @@ struct kvm_page_track_notifier_node {
 				    struct kvm_page_track_notifier_node *node);
 };
 
-void kvm_slot_page_track_add_page(struct kvm *kvm,
-				  struct kvm_memory_slot *slot, gfn_t gfn);
-void kvm_slot_page_track_remove_page(struct kvm *kvm,
-				     struct kvm_memory_slot *slot, gfn_t gfn);
+void kvm_write_track_add_gfn(struct kvm *kvm,
+			     struct kvm_memory_slot *slot, gfn_t gfn);
+void kvm_write_track_remove_gfn(struct kvm *kvm, struct kvm_memory_slot *slot,
+				gfn_t gfn);
 
 #ifdef CONFIG_KVM_EXTERNAL_WRITE_TRACKING
 enum pg_level kvm_page_track_max_mapping_level(struct kvm *kvm, gfn_t gfn,
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index ffcfc75cd4c1..b4cc762cfe11 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -807,7 +807,7 @@ static void account_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
 
 	/* the non-leaf shadow pages are keeping readonly. */
 	if (sp->role.level > PG_LEVEL_4K)
-		return kvm_slot_page_track_add_page(kvm, slot, gfn);
+		return kvm_write_track_add_gfn(kvm, slot, gfn);
 
 	kvm_mmu_gfn_disallow_lpage(slot, gfn);
 
@@ -853,7 +853,7 @@ static void unaccount_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
 	slots = kvm_memslots_for_spte_role(kvm, sp->role);
 	slot = __gfn_to_memslot(slots, gfn);
 	if (sp->role.level > PG_LEVEL_4K)
-		return kvm_slot_page_track_remove_page(kvm, slot, gfn);
+		return kvm_write_track_remove_gfn(kvm, slot, gfn);
 
 	kvm_mmu_gfn_allow_lpage(slot, gfn);
 }
@@ -2725,7 +2725,7 @@ int mmu_try_to_unsync_pages(struct kvm *kvm, const struct kvm_memory_slot *slot,
 	 * track machinery is used to write-protect upper-level shadow pages,
 	 * i.e. this guards the role.level == 4K assertion below!
 	 */
-	if (kvm_slot_page_track_is_active(kvm, slot, gfn))
+	if (kvm_gfn_is_write_tracked(kvm, slot, gfn))
 		return -EPERM;
 
 	/*
@@ -4135,7 +4135,7 @@ static bool page_fault_handle_page_track(struct kvm_vcpu *vcpu,
 	 * guest is writing the page which is write tracked which can
 	 * not be fixed by page fault handler.
 	 */
-	if (kvm_slot_page_track_is_active(vcpu->kvm, fault->slot, fault->gfn))
+	if (kvm_gfn_is_write_tracked(vcpu->kvm, fault->slot, fault->gfn))
 		return true;
 
 	return false;
diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index 4077aa6d6ff4..1eb516119fdb 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -83,10 +83,9 @@ static void update_gfn_write_track(struct kvm_memory_slot *slot, gfn_t gfn,
  * @slot: the @gfn belongs to.
  * @gfn: the guest page.
  */
-void kvm_slot_page_track_add_page(struct kvm *kvm,
-				  struct kvm_memory_slot *slot, gfn_t gfn)
+void kvm_write_track_add_gfn(struct kvm *kvm, struct kvm_memory_slot *slot,
+			     gfn_t gfn)
 {
-
 	if (WARN_ON(!kvm_page_track_write_tracking_enabled(kvm)))
 		return;
 
@@ -101,12 +100,11 @@ void kvm_slot_page_track_add_page(struct kvm *kvm,
 	if (kvm_mmu_slot_gfn_write_protect(kvm, slot, gfn, PG_LEVEL_4K))
 		kvm_flush_remote_tlbs(kvm);
 }
-EXPORT_SYMBOL_GPL(kvm_slot_page_track_add_page);
+EXPORT_SYMBOL_GPL(kvm_write_track_add_gfn);
 
 /*
  * remove the guest page from the tracking pool which stops the interception
- * of corresponding access on that page. It is the opposed operation of
- * kvm_slot_page_track_add_page().
+ * of corresponding access on that page.
  *
  * It should be called under the protection both of mmu-lock and kvm->srcu
  * or kvm->slots_lock.
@@ -115,8 +113,8 @@ EXPORT_SYMBOL_GPL(kvm_slot_page_track_add_page);
  * @slot: the @gfn belongs to.
  * @gfn: the guest page.
  */
-void kvm_slot_page_track_remove_page(struct kvm *kvm,
-				     struct kvm_memory_slot *slot, gfn_t gfn)
+void kvm_write_track_remove_gfn(struct kvm *kvm,
+				struct kvm_memory_slot *slot, gfn_t gfn)
 {
 	if (WARN_ON(!kvm_page_track_write_tracking_enabled(kvm)))
 		return;
@@ -129,14 +127,13 @@ void kvm_slot_page_track_remove_page(struct kvm *kvm,
 	 */
 	kvm_mmu_gfn_allow_lpage(slot, gfn);
 }
-EXPORT_SYMBOL_GPL(kvm_slot_page_track_remove_page);
+EXPORT_SYMBOL_GPL(kvm_write_track_remove_gfn);
 
 /*
  * check if the corresponding access on the specified guest page is tracked.
  */
-bool kvm_slot_page_track_is_active(struct kvm *kvm,
-				   const struct kvm_memory_slot *slot,
-				   gfn_t gfn)
+bool kvm_gfn_is_write_tracked(struct kvm *kvm,
+			      const struct kvm_memory_slot *slot, gfn_t gfn)
 {
 	int index;
 
diff --git a/arch/x86/kvm/mmu/page_track.h b/arch/x86/kvm/mmu/page_track.h
index ae2860bdf560..b27ccc588648 100644
--- a/arch/x86/kvm/mmu/page_track.h
+++ b/arch/x86/kvm/mmu/page_track.h
@@ -15,8 +15,8 @@ int kvm_page_track_create_memslot(struct kvm *kvm,
 				  struct kvm_memory_slot *slot,
 				  unsigned long npages);
 
-bool kvm_slot_page_track_is_active(struct kvm *kvm,
-				   const struct kvm_memory_slot *slot, gfn_t gfn);
+bool kvm_gfn_is_write_tracked(struct kvm *kvm,
+			      const struct kvm_memory_slot *slot, gfn_t gfn);
 
 #ifdef CONFIG_KVM_EXTERNAL_WRITE_TRACKING
 int kvm_page_track_init(struct kvm *kvm);
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index cabad0ff722c..325afeb1246c 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -1584,7 +1584,7 @@ int intel_gvt_page_track_add(struct intel_vgpu *info, u64 gfn)
 	}
 
 	write_lock(&kvm->mmu_lock);
-	kvm_slot_page_track_add_page(kvm, slot, gfn);
+	kvm_write_track_add_gfn(kvm, slot, gfn);
 	write_unlock(&kvm->mmu_lock);
 
 	srcu_read_unlock(&kvm->srcu, idx);
@@ -1618,7 +1618,7 @@ int intel_gvt_page_track_remove(struct intel_vgpu *info, u64 gfn)
 	}
 
 	write_lock(&kvm->mmu_lock);
-	kvm_slot_page_track_remove_page(kvm, slot, gfn);
+	kvm_write_track_remove_gfn(kvm, slot, gfn);
 	write_unlock(&kvm->mmu_lock);
 	srcu_read_unlock(&kvm->srcu, idx);
 
-- 
2.39.0.314.g84b9a713c41-goog



* [PATCH 22/27] KVM: x86/mmu: Assert that correct locks are held for page write-tracking
  2022-12-23  0:57 [PATCH 00/27] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups Sean Christopherson
                   ` (20 preceding siblings ...)
  2022-12-23  0:57 ` [PATCH 21/27] KVM: x86/mmu: Rename page-track APIs to reflect the new reality Sean Christopherson
@ 2022-12-23  0:57 ` Sean Christopherson
  2022-12-23  0:57 ` [PATCH 23/27] KVM: x86/mmu: Bug the VM if write-tracking is used but not enabled Sean Christopherson
                   ` (5 subsequent siblings)
  27 siblings, 0 replies; 61+ messages in thread
From: Sean Christopherson @ 2022-12-23  0:57 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

When adding/removing gfns to/from write-tracking, assert that mmu_lock
is held for write, and that either slots_lock or kvm->srcu is held.
mmu_lock must be held for write to protect gfn_write_track's refcount,
and SRCU or slots_lock must be held to protect the memslot itself.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/page_track.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index 1eb516119fdb..209f6beba5ac 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -11,6 +11,7 @@
  *   Xiao Guangrong <guangrong.xiao@linux.intel.com>
  */
 
+#include <linux/lockdep.h>
 #include <linux/kvm_host.h>
 #include <linux/rculist.h>
 
@@ -76,9 +77,6 @@ static void update_gfn_write_track(struct kvm_memory_slot *slot, gfn_t gfn,
  * add guest page to the tracking pool so that corresponding access on that
  * page will be intercepted.
  *
- * It should be called under the protection both of mmu-lock and kvm->srcu
- * or kvm->slots_lock.
- *
  * @kvm: the guest instance we are interested in.
  * @slot: the @gfn belongs to.
  * @gfn: the guest page.
@@ -86,6 +84,11 @@ static void update_gfn_write_track(struct kvm_memory_slot *slot, gfn_t gfn,
 void kvm_write_track_add_gfn(struct kvm *kvm, struct kvm_memory_slot *slot,
 			     gfn_t gfn)
 {
+	lockdep_assert_held_write(&kvm->mmu_lock);
+
+	lockdep_assert_once(lockdep_is_held(&kvm->slots_lock) ||
+			    srcu_read_lock_held(&kvm->srcu));
+
 	if (WARN_ON(!kvm_page_track_write_tracking_enabled(kvm)))
 		return;
 
@@ -106,9 +109,6 @@ EXPORT_SYMBOL_GPL(kvm_write_track_add_gfn);
  * remove the guest page from the tracking pool which stops the interception
  * of corresponding access on that page.
  *
- * It should be called under the protection both of mmu-lock and kvm->srcu
- * or kvm->slots_lock.
- *
  * @kvm: the guest instance we are interested in.
  * @slot: the @gfn belongs to.
  * @gfn: the guest page.
@@ -116,6 +116,11 @@ EXPORT_SYMBOL_GPL(kvm_write_track_add_gfn);
 void kvm_write_track_remove_gfn(struct kvm *kvm,
 				struct kvm_memory_slot *slot, gfn_t gfn)
 {
+	lockdep_assert_held_write(&kvm->mmu_lock);
+
+	lockdep_assert_once(lockdep_is_held(&kvm->slots_lock) ||
+			    srcu_read_lock_held(&kvm->srcu));
+
 	if (WARN_ON(!kvm_page_track_write_tracking_enabled(kvm)))
 		return;
 
-- 
2.39.0.314.g84b9a713c41-goog



* [PATCH 23/27] KVM: x86/mmu: Bug the VM if write-tracking is used but not enabled
  2022-12-23  0:57 [PATCH 00/27] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups Sean Christopherson
                   ` (21 preceding siblings ...)
  2022-12-23  0:57 ` [PATCH 22/27] KVM: x86/mmu: Assert that correct locks are held for page write-tracking Sean Christopherson
@ 2022-12-23  0:57 ` Sean Christopherson
  2022-12-23  0:57 ` [PATCH 24/27] KVM: x86/mmu: Drop @slot param from exported/external page-track APIs Sean Christopherson
                   ` (4 subsequent siblings)
  27 siblings, 0 replies; 61+ messages in thread
From: Sean Christopherson @ 2022-12-23  0:57 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

Bug the VM if something attempts to write-track a gfn, but write-tracking
isn't enabled.  The VM is doomed (and KVM has an egregious bug) if KVM or
KVMGT wants to shadow guest page tables but can't because write-tracking
isn't enabled.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/page_track.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index 209f6beba5ac..d4c3bd6642b3 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -89,7 +89,7 @@ void kvm_write_track_add_gfn(struct kvm *kvm, struct kvm_memory_slot *slot,
 	lockdep_assert_once(lockdep_is_held(&kvm->slots_lock) ||
 			    srcu_read_lock_held(&kvm->srcu));
 
-	if (WARN_ON(!kvm_page_track_write_tracking_enabled(kvm)))
+	if (KVM_BUG_ON(!kvm_page_track_write_tracking_enabled(kvm), kvm))
 		return;
 
 	update_gfn_write_track(slot, gfn, 1);
@@ -121,7 +121,7 @@ void kvm_write_track_remove_gfn(struct kvm *kvm,
 	lockdep_assert_once(lockdep_is_held(&kvm->slots_lock) ||
 			    srcu_read_lock_held(&kvm->srcu));
 
-	if (WARN_ON(!kvm_page_track_write_tracking_enabled(kvm)))
+	if (KVM_BUG_ON(!kvm_page_track_write_tracking_enabled(kvm), kvm))
 		return;
 
 	update_gfn_write_track(slot, gfn, -1);
-- 
2.39.0.314.g84b9a713c41-goog



* [PATCH 24/27] KVM: x86/mmu: Drop @slot param from exported/external page-track APIs
  2022-12-23  0:57 [PATCH 00/27] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups Sean Christopherson
                   ` (22 preceding siblings ...)
  2022-12-23  0:57 ` [PATCH 23/27] KVM: x86/mmu: Bug the VM if write-tracking is used but not enabled Sean Christopherson
@ 2022-12-23  0:57 ` Sean Christopherson
  2022-12-23  0:57 ` [PATCH 25/27] KVM: x86/mmu: Handle KVM bookkeeping in page-track APIs, not callers Sean Christopherson
                   ` (3 subsequent siblings)
  27 siblings, 0 replies; 61+ messages in thread
From: Sean Christopherson @ 2022-12-23  0:57 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

Refactor KVM's exported/external page-track, a.k.a. write-track, APIs
to take only the gfn and do the required memslot lookup in KVM proper.
Forcing users of the APIs to get the memslot unnecessarily bleeds
KVM internals into KVMGT and complicates usage of the APIs.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
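Not part of the patch, just an illustration for reviewers: a minimal
userspace sketch of the new calling convention, where the caller passes
only a gfn and the API does the memslot lookup itself.  All demo_* names,
the fixed-size slots, and the lack of locking are assumptions made for
brevity; the real code takes kvm->srcu and mmu_lock as shown in the diff.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t gfn_t;

#define DEMO_SLOT_PAGES	8
#define DEMO_NR_SLOTS	2

/* Hypothetical, heavily simplified stand-in for struct kvm_memory_slot. */
struct demo_slot {
	gfn_t base_gfn;
	unsigned short write_track[DEMO_SLOT_PAGES];
};

/* Hypothetical stand-in for struct kvm. */
struct demo_vm {
	struct demo_slot slots[DEMO_NR_SLOTS];
};

/* Models gfn_to_memslot(): find the slot that contains @gfn, if any. */
static struct demo_slot *demo_gfn_to_slot(struct demo_vm *vm, gfn_t gfn)
{
	size_t i;

	for (i = 0; i < DEMO_NR_SLOTS; i++) {
		struct demo_slot *slot = &vm->slots[i];

		if (gfn >= slot->base_gfn &&
		    gfn - slot->base_gfn < DEMO_SLOT_PAGES)
			return slot;
	}
	return NULL;
}

/* Internal variant: the caller has already resolved the slot. */
static void demo_track_add_in_slot(struct demo_slot *slot, gfn_t gfn)
{
	assert(gfn >= slot->base_gfn &&
	       gfn - slot->base_gfn < DEMO_SLOT_PAGES);
	slot->write_track[gfn - slot->base_gfn]++;
}

/* External variant: takes only the gfn, does the lookup internally. */
static int demo_write_track_add_gfn(struct demo_vm *vm, gfn_t gfn)
{
	struct demo_slot *slot = demo_gfn_to_slot(vm, gfn);

	if (!slot)
		return -EINVAL;

	demo_track_add_in_slot(slot, gfn);
	return 0;
}
```

The split mirrors the patch: internal callers that already hold a slot use
the in-slot variant, external users only ever see the gfn-based one.
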
 arch/x86/include/asm/kvm_page_track.h |  8 +--
 arch/x86/kvm/mmu/mmu.c                |  4 +-
 arch/x86/kvm/mmu/page_track.c         | 86 ++++++++++++++++++++-------
 arch/x86/kvm/mmu/page_track.h         |  5 ++
 drivers/gpu/drm/i915/gvt/kvmgt.c      | 44 +++-----------
 5 files changed, 82 insertions(+), 65 deletions(-)

diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index 20055064793a..415537ce45b4 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -43,11 +43,6 @@ struct kvm_page_track_notifier_node {
 				    struct kvm_page_track_notifier_node *node);
 };
 
-void kvm_write_track_add_gfn(struct kvm *kvm,
-			     struct kvm_memory_slot *slot, gfn_t gfn);
-void kvm_write_track_remove_gfn(struct kvm *kvm, struct kvm_memory_slot *slot,
-				gfn_t gfn);
-
 #ifdef CONFIG_KVM_EXTERNAL_WRITE_TRACKING
 enum pg_level kvm_page_track_max_mapping_level(struct kvm *kvm, gfn_t gfn,
 					       enum pg_level max_level);
@@ -58,6 +53,9 @@ kvm_page_track_register_notifier(struct kvm *kvm,
 void
 kvm_page_track_unregister_notifier(struct kvm *kvm,
 				   struct kvm_page_track_notifier_node *n);
+
+int kvm_write_track_add_gfn(struct kvm *kvm, gfn_t gfn);
+int kvm_write_track_remove_gfn(struct kvm *kvm, gfn_t gfn);
 #endif /* CONFIG_KVM_EXTERNAL_WRITE_TRACKING */
 
 #endif
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index b4cc762cfe11..5c1369072146 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -807,7 +807,7 @@ static void account_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
 
 	/* the non-leaf shadow pages are keeping readonly. */
 	if (sp->role.level > PG_LEVEL_4K)
-		return kvm_write_track_add_gfn(kvm, slot, gfn);
+		return __kvm_write_track_add_gfn(kvm, slot, gfn);
 
 	kvm_mmu_gfn_disallow_lpage(slot, gfn);
 
@@ -853,7 +853,7 @@ static void unaccount_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
 	slots = kvm_memslots_for_spte_role(kvm, sp->role);
 	slot = __gfn_to_memslot(slots, gfn);
 	if (sp->role.level > PG_LEVEL_4K)
-		return kvm_write_track_remove_gfn(kvm, slot, gfn);
+		return __kvm_write_track_remove_gfn(kvm, slot, gfn);
 
 	kvm_mmu_gfn_allow_lpage(slot, gfn);
 }
diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index d4c3bd6642b3..bc54afc1919c 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -73,16 +73,8 @@ static void update_gfn_write_track(struct kvm_memory_slot *slot, gfn_t gfn,
 	slot->arch.gfn_write_track[index] += count;
 }
 
-/*
- * add guest page to the tracking pool so that corresponding access on that
- * page will be intercepted.
- *
- * @kvm: the guest instance we are interested in.
- * @slot: the @gfn belongs to.
- * @gfn: the guest page.
- */
-void kvm_write_track_add_gfn(struct kvm *kvm, struct kvm_memory_slot *slot,
-			     gfn_t gfn)
+void __kvm_write_track_add_gfn(struct kvm *kvm, struct kvm_memory_slot *slot,
+			       gfn_t gfn)
 {
 	lockdep_assert_held_write(&kvm->mmu_lock);
 
@@ -103,18 +95,9 @@ void kvm_write_track_add_gfn(struct kvm *kvm, struct kvm_memory_slot *slot,
 	if (kvm_mmu_slot_gfn_write_protect(kvm, slot, gfn, PG_LEVEL_4K))
 		kvm_flush_remote_tlbs(kvm);
 }
-EXPORT_SYMBOL_GPL(kvm_write_track_add_gfn);
 
-/*
- * remove the guest page from the tracking pool which stops the interception
- * of corresponding access on that page.
- *
- * @kvm: the guest instance we are interested in.
- * @slot: the @gfn belongs to.
- * @gfn: the guest page.
- */
-void kvm_write_track_remove_gfn(struct kvm *kvm,
-				struct kvm_memory_slot *slot, gfn_t gfn)
+void __kvm_write_track_remove_gfn(struct kvm *kvm,
+				  struct kvm_memory_slot *slot, gfn_t gfn)
 {
 	lockdep_assert_held_write(&kvm->mmu_lock);
 
@@ -132,7 +115,6 @@ void kvm_write_track_remove_gfn(struct kvm *kvm,
 	 */
 	kvm_mmu_gfn_allow_lpage(slot, gfn);
 }
-EXPORT_SYMBOL_GPL(kvm_write_track_remove_gfn);
 
 /*
  * check if the corresponding access on the specified guest page is tracked.
@@ -275,4 +257,64 @@ enum pg_level kvm_page_track_max_mapping_level(struct kvm *kvm, gfn_t gfn,
 	return max_level;
 }
 EXPORT_SYMBOL_GPL(kvm_page_track_max_mapping_level);
+
+/*
+ * add guest page to the tracking pool so that corresponding access on that
+ * page will be intercepted.
+ *
+ * @kvm: the guest instance we are interested in.
+ * @gfn: the guest page.
+ */
+int kvm_write_track_add_gfn(struct kvm *kvm, gfn_t gfn)
+{
+	struct kvm_memory_slot *slot;
+	int idx;
+
+	idx = srcu_read_lock(&kvm->srcu);
+
+	slot = gfn_to_memslot(kvm, gfn);
+	if (!slot) {
+		srcu_read_unlock(&kvm->srcu, idx);
+		return -EINVAL;
+	}
+
+	write_lock(&kvm->mmu_lock);
+	__kvm_write_track_add_gfn(kvm, slot, gfn);
+	write_unlock(&kvm->mmu_lock);
+
+	srcu_read_unlock(&kvm->srcu, idx);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(kvm_write_track_add_gfn);
+
+/*
+ * remove the guest page from the tracking pool which stops the interception
+ * of corresponding access on that page.
+ *
+ * @kvm: the guest instance we are interested in.
+ * @gfn: the guest page.
+ */
+int kvm_write_track_remove_gfn(struct kvm *kvm, gfn_t gfn)
+{
+	struct kvm_memory_slot *slot;
+	int idx;
+
+	idx = srcu_read_lock(&kvm->srcu);
+
+	slot = gfn_to_memslot(kvm, gfn);
+	if (!slot) {
+		srcu_read_unlock(&kvm->srcu, idx);
+		return -EINVAL;
+	}
+
+	write_lock(&kvm->mmu_lock);
+	__kvm_write_track_remove_gfn(kvm, slot, gfn);
+	write_unlock(&kvm->mmu_lock);
+
+	srcu_read_unlock(&kvm->srcu, idx);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(kvm_write_track_remove_gfn);
 #endif
diff --git a/arch/x86/kvm/mmu/page_track.h b/arch/x86/kvm/mmu/page_track.h
index b27ccc588648..ee5c92083985 100644
--- a/arch/x86/kvm/mmu/page_track.h
+++ b/arch/x86/kvm/mmu/page_track.h
@@ -15,6 +15,11 @@ int kvm_page_track_create_memslot(struct kvm *kvm,
 				  struct kvm_memory_slot *slot,
 				  unsigned long npages);
 
+void __kvm_write_track_add_gfn(struct kvm *kvm, struct kvm_memory_slot *slot,
+			       gfn_t gfn);
+void __kvm_write_track_remove_gfn(struct kvm *kvm,
+				  struct kvm_memory_slot *slot, gfn_t gfn);
+
 bool kvm_gfn_is_write_tracked(struct kvm *kvm,
 			      const struct kvm_memory_slot *slot, gfn_t gfn);
 
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 325afeb1246c..f9d21d29f533 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -1563,9 +1563,7 @@ static struct mdev_driver intel_vgpu_mdev_driver = {
 
 int intel_gvt_page_track_add(struct intel_vgpu *info, u64 gfn)
 {
-	struct kvm *kvm = info->vfio_device.kvm;
-	struct kvm_memory_slot *slot;
-	int idx, ret = 0;
+	int ret = 0;
 
 	if (!info->attached)
 		return -ESRCH;
@@ -1575,21 +1573,9 @@ int intel_gvt_page_track_add(struct intel_vgpu *info, u64 gfn)
 	if (kvmgt_gfn_is_write_protected(info, gfn))
 		goto out;
 
-	idx = srcu_read_lock(&kvm->srcu);
-	slot = gfn_to_memslot(kvm, gfn);
-	if (!slot) {
-		srcu_read_unlock(&kvm->srcu, idx);
-		ret = -EINVAL;
-		goto out;
-	}
-
-	write_lock(&kvm->mmu_lock);
-	kvm_write_track_add_gfn(kvm, slot, gfn);
-	write_unlock(&kvm->mmu_lock);
-
-	srcu_read_unlock(&kvm->srcu, idx);
-
-	kvmgt_protect_table_add(info, gfn);
+	ret = kvm_write_track_add_gfn(info->vfio_device.kvm, gfn);
+	if (!ret)
+		kvmgt_protect_table_add(info, gfn);
 out:
 	mutex_unlock(&info->gfn_lock);
 	return ret;
@@ -1597,9 +1583,7 @@ int intel_gvt_page_track_add(struct intel_vgpu *info, u64 gfn)
 
 int intel_gvt_page_track_remove(struct intel_vgpu *info, u64 gfn)
 {
-	struct kvm *kvm = info->vfio_device.kvm;
-	struct kvm_memory_slot *slot;
-	int idx, ret = 0;
+	int ret = 0;
 
 	if (!info->attached)
 		return 0;
@@ -1609,21 +1593,9 @@ int intel_gvt_page_track_remove(struct intel_vgpu *info, u64 gfn)
 	if (!kvmgt_gfn_is_write_protected(info, gfn))
 		goto out;
 
-	idx = srcu_read_lock(&kvm->srcu);
-	slot = gfn_to_memslot(kvm, gfn);
-	if (!slot) {
-		srcu_read_unlock(&kvm->srcu, idx);
-		ret = -EINVAL;
-		goto out;
-	}
-
-	write_lock(&kvm->mmu_lock);
-	kvm_write_track_remove_gfn(kvm, slot, gfn);
-	write_unlock(&kvm->mmu_lock);
-	srcu_read_unlock(&kvm->srcu, idx);
-
-	kvmgt_protect_table_del(info, gfn);
-
+	ret = kvm_write_track_remove_gfn(info->vfio_device.kvm, gfn);
+	if (!ret)
+		kvmgt_protect_table_del(info, gfn);
 out:
 	mutex_unlock(&info->gfn_lock);
 	return ret;
-- 
2.39.0.314.g84b9a713c41-goog



* [PATCH 25/27] KVM: x86/mmu: Handle KVM bookkeeping in page-track APIs, not callers
  2022-12-23  0:57 [PATCH 00/27] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups Sean Christopherson
                   ` (23 preceding siblings ...)
  2022-12-23  0:57 ` [PATCH 24/27] KVM: x86/mmu: Drop @slot param from exported/external page-track APIs Sean Christopherson
@ 2022-12-23  0:57 ` Sean Christopherson
  2022-12-23  0:57 ` [PATCH 26/27] KVM: x86/mmu: Add page-track API to query if a gfn is valid Sean Christopherson
                   ` (2 subsequent siblings)
  27 siblings, 0 replies; 61+ messages in thread
From: Sean Christopherson @ 2022-12-23  0:57 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

Get/put references to KVM when a page-track notifier is (un)registered
instead of relying on the caller to do so.  Forcing the caller to do the
bookkeeping is unnecessary and adds one more thing for users to get
wrong, e.g. see commit 9ed1fdee9ee3 ("drm/i915/gvt: Get reference to KVM
iff attachment to VM is successful").

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
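Not part of the patch: a small userspace sketch of the ownership rule this
change establishes, i.e. registration takes the VM reference and
unregistration drops it, so the caller never touches the refcount.  The
demo_* names and the plain integer counters are assumptions for
illustration; they only model kvm_get_kvm()/kvm_put_kvm() and the notifier
list.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct kvm. */
struct demo_vm {
	const void *mm;		/* address space that owns the VM */
	int users;		/* reference count */
	int nr_notifiers;	/* stands in for the notifier list */
};

/* Fails without side effects if there is no VM or the caller isn't its owner. */
static int demo_register_notifier(struct demo_vm *vm, const void *current_mm)
{
	if (!vm || vm->mm != current_mm)
		return -ESRCH;

	vm->users++;		/* models kvm_get_kvm() */
	vm->nr_notifiers++;
	return 0;
}

/* Drops the reference taken at registration time. */
static void demo_unregister_notifier(struct demo_vm *vm)
{
	assert(vm && vm->nr_notifiers > 0);

	vm->nr_notifiers--;
	vm->users--;		/* models kvm_put_kvm() */
}
```

Because the failure path returns before taking the reference, a caller
that bails on error has nothing to undo.
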
 arch/x86/include/asm/kvm_page_track.h | 10 ++++------
 arch/x86/kvm/mmu/page_track.c         | 18 ++++++++++++------
 drivers/gpu/drm/i915/gvt/kvmgt.c      | 23 ++++++++++-------------
 3 files changed, 26 insertions(+), 25 deletions(-)

diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index 415537ce45b4..66a0d7c34311 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -47,12 +47,10 @@ struct kvm_page_track_notifier_node {
 enum pg_level kvm_page_track_max_mapping_level(struct kvm *kvm, gfn_t gfn,
 					       enum pg_level max_level);
 
-void
-kvm_page_track_register_notifier(struct kvm *kvm,
-				 struct kvm_page_track_notifier_node *n);
-void
-kvm_page_track_unregister_notifier(struct kvm *kvm,
-				   struct kvm_page_track_notifier_node *n);
+int kvm_page_track_register_notifier(struct kvm *kvm,
+				     struct kvm_page_track_notifier_node *n);
+void kvm_page_track_unregister_notifier(struct kvm *kvm,
+					struct kvm_page_track_notifier_node *n);
 
 int kvm_write_track_add_gfn(struct kvm *kvm, gfn_t gfn);
 int kvm_write_track_remove_gfn(struct kvm *kvm, gfn_t gfn);
diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index bc54afc1919c..1af431a41f71 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -157,17 +157,22 @@ int kvm_page_track_init(struct kvm *kvm)
  * register the notifier so that event interception for the tracked guest
  * pages can be received.
  */
-void
-kvm_page_track_register_notifier(struct kvm *kvm,
-				 struct kvm_page_track_notifier_node *n)
+int kvm_page_track_register_notifier(struct kvm *kvm,
+				     struct kvm_page_track_notifier_node *n)
 {
 	struct kvm_page_track_notifier_head *head;
 
+	if (!kvm || kvm->mm != current->mm)
+		return -ESRCH;
+
+	kvm_get_kvm(kvm);
+
 	head = &kvm->arch.track_notifier_head;
 
 	write_lock(&kvm->mmu_lock);
 	hlist_add_head_rcu(&n->node, &head->track_notifier_list);
 	write_unlock(&kvm->mmu_lock);
+	return 0;
 }
 EXPORT_SYMBOL_GPL(kvm_page_track_register_notifier);
 
@@ -175,9 +180,8 @@ EXPORT_SYMBOL_GPL(kvm_page_track_register_notifier);
  * stop receiving the event interception. It is the opposed operation of
  * kvm_page_track_register_notifier().
  */
-void
-kvm_page_track_unregister_notifier(struct kvm *kvm,
-				   struct kvm_page_track_notifier_node *n)
+void kvm_page_track_unregister_notifier(struct kvm *kvm,
+					struct kvm_page_track_notifier_node *n)
 {
 	struct kvm_page_track_notifier_head *head;
 
@@ -187,6 +191,8 @@ kvm_page_track_unregister_notifier(struct kvm *kvm,
 	hlist_del_rcu(&n->node);
 	write_unlock(&kvm->mmu_lock);
 	synchronize_srcu(&head->track_srcu);
+
+	kvm_put_kvm(kvm);
 }
 EXPORT_SYMBOL_GPL(kvm_page_track_unregister_notifier);
 
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index f9d21d29f533..e4227ac6ab58 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -670,30 +670,28 @@ static bool __kvmgt_vgpu_exist(struct intel_vgpu *vgpu)
 static int intel_vgpu_open_device(struct vfio_device *vfio_dev)
 {
 	struct intel_vgpu *vgpu = vfio_dev_to_vgpu(vfio_dev);
+	int ret;
 
 	if (vgpu->attached)
 		return -EEXIST;
 
-	if (!vgpu->vfio_device.kvm ||
-	    vgpu->vfio_device.kvm->mm != current->mm) {
-		gvt_vgpu_err("KVM is required to use Intel vGPU\n");
-		return -ESRCH;
-	}
-
 	if (__kvmgt_vgpu_exist(vgpu))
 		return -EEXIST;
 
+	vgpu->track_node.track_write = kvmgt_page_track_write;
+	vgpu->track_node.track_remove_region = kvmgt_page_track_remove_region;
+	ret = kvm_page_track_register_notifier(vgpu->vfio_device.kvm,
+					       &vgpu->track_node);
+	if (ret) {
+		gvt_vgpu_err("KVM is required to use Intel vGPU\n");
+		return ret;
+	}
+
 	vgpu->attached = true;
 
 	kvmgt_protect_table_init(vgpu);
 	gvt_cache_init(vgpu);
 
-	vgpu->track_node.track_write = kvmgt_page_track_write;
-	vgpu->track_node.track_remove_region = kvmgt_page_track_remove_region;
-	kvm_get_kvm(vgpu->vfio_device.kvm);
-	kvm_page_track_register_notifier(vgpu->vfio_device.kvm,
-					 &vgpu->track_node);
-
 	debugfs_create_ulong(KVMGT_DEBUGFS_FILENAME, 0444, vgpu->debugfs,
 			     &vgpu->nr_cache_entries);
 
@@ -730,7 +728,6 @@ static void intel_vgpu_close_device(struct vfio_device *vfio_dev)
 
 	kvm_page_track_unregister_notifier(vgpu->vfio_device.kvm,
 					   &vgpu->track_node);
-	kvm_put_kvm(vgpu->vfio_device.kvm);
 
 	kvmgt_protect_table_destroy(vgpu);
 	gvt_cache_destroy(vgpu);
-- 
2.39.0.314.g84b9a713c41-goog



* [PATCH 26/27] KVM: x86/mmu: Add page-track API to query if a gfn is valid
  2022-12-23  0:57 [PATCH 00/27] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups Sean Christopherson
                   ` (24 preceding siblings ...)
  2022-12-23  0:57 ` [PATCH 25/27] KVM: x86/mmu: Handle KVM bookkeeping in page-track APIs, not callers Sean Christopherson
@ 2022-12-23  0:57 ` Sean Christopherson
  2022-12-28  7:57   ` Yan Zhao
  2022-12-23  0:57 ` [PATCH 27/27] drm/i915/gvt: Drop final dependencies on KVM internal details Sean Christopherson
  2022-12-23  9:05 ` [PATCH 00/27] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups Yan Zhao
  27 siblings, 1 reply; 61+ messages in thread
From: Sean Christopherson @ 2022-12-23  0:57 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

Add a page-track API to query if a gfn is "valid", i.e. is backed by a
memslot and is visible to the guest.  This is one more step toward
removing KVM internal details from the page-track APIs.

Add a FIXME to call out that intel_gvt_is_valid_gfn() is broken with
respect to 2MiB (or larger) guest entries, e.g. if the starting gfn is
valid but a 2MiB page starting at the gfn covers "invalid" memory due
to running beyond the memslot.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
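Not part of the patch: a userspace sketch of the 2MiB problem called out
in the FIXME.  It assumes a single contiguous memslot and 4KiB base pages;
struct demo_memslot and the demo_* helpers are hypothetical and exist only
to show why checking the starting gfn is not sufficient.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t gfn_t;

#define DEMO_PAGES_PER_2M 512	/* 2MiB / 4KiB */

/* Hypothetical, simplified stand-in for a KVM memslot. */
struct demo_memslot {
	gfn_t base_gfn;
	uint64_t npages;
};

static bool demo_gfn_in_slot(const struct demo_memslot *slot, gfn_t gfn)
{
	assert(slot != NULL);
	return gfn >= slot->base_gfn && gfn - slot->base_gfn < slot->npages;
}

/* What the existing check effectively asks: is the first gfn valid? */
static bool demo_is_valid_gfn(const struct demo_memslot *slot, gfn_t gfn)
{
	return demo_gfn_in_slot(slot, gfn);
}

/* What a 2MiB entry needs: the first and last covered gfns are in the slot. */
static bool demo_is_valid_2m_range(const struct demo_memslot *slot, gfn_t gfn)
{
	return demo_gfn_in_slot(slot, gfn) &&
	       demo_gfn_in_slot(slot, gfn + DEMO_PAGES_PER_2M - 1);
}
```

With a 768-page slot, a 2MiB entry starting 512 pages into the slot passes
the single-gfn check but runs 256 pages past the end of the slot.
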
 arch/x86/include/asm/kvm_page_track.h |  1 +
 arch/x86/kvm/mmu/page_track.c         | 13 +++++++++++++
 drivers/gpu/drm/i915/gvt/gtt.c        | 11 ++---------
 3 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index 66a0d7c34311..99e1d6eeb0fb 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -52,6 +52,7 @@ int kvm_page_track_register_notifier(struct kvm *kvm,
 void kvm_page_track_unregister_notifier(struct kvm *kvm,
 					struct kvm_page_track_notifier_node *n);
 
+bool kvm_page_track_is_valid_gfn(struct kvm *kvm, gfn_t gfn);
 int kvm_write_track_add_gfn(struct kvm *kvm, gfn_t gfn);
 int kvm_write_track_remove_gfn(struct kvm *kvm, gfn_t gfn);
 #endif /* CONFIG_KVM_EXTERNAL_WRITE_TRACKING */
diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index 1af431a41f71..9da071a514b3 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -264,6 +264,19 @@ enum pg_level kvm_page_track_max_mapping_level(struct kvm *kvm, gfn_t gfn,
 }
 EXPORT_SYMBOL_GPL(kvm_page_track_max_mapping_level);
 
+bool kvm_page_track_is_valid_gfn(struct kvm *kvm, gfn_t gfn)
+{
+	bool ret;
+	int idx;
+
+	idx = srcu_read_lock(&kvm->srcu);
+	ret = kvm_is_visible_gfn(kvm, gfn);
+	srcu_read_unlock(&kvm->srcu, idx);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(kvm_page_track_is_valid_gfn);
+
 /*
  * add guest page to the tracking pool so that corresponding access on that
  * page will be intercepted.
diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c
index 59ba6639e622..43c4fc23205d 100644
--- a/drivers/gpu/drm/i915/gvt/gtt.c
+++ b/drivers/gpu/drm/i915/gvt/gtt.c
@@ -51,18 +51,11 @@ static int preallocated_oos_pages = 8192;
 
 static bool intel_gvt_is_valid_gfn(struct intel_vgpu *vgpu, unsigned long gfn)
 {
-	struct kvm *kvm = vgpu->vfio_device.kvm;
-	int idx;
-	bool ret;
-
 	if (!vgpu->attached)
 		return false;
 
-	idx = srcu_read_lock(&kvm->srcu);
-	ret = kvm_is_visible_gfn(kvm, gfn);
-	srcu_read_unlock(&kvm->srcu, idx);
-
-	return ret;
+	/* FIXME: This doesn't properly handle guest entries larger than 4K. */
+	return kvm_page_track_is_valid_gfn(vgpu->vfio_device.kvm, gfn);
 }
 
 /*
-- 
2.39.0.314.g84b9a713c41-goog



* [PATCH 27/27] drm/i915/gvt: Drop final dependencies on KVM internal details
  2022-12-23  0:57 [PATCH 00/27] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups Sean Christopherson
                   ` (25 preceding siblings ...)
  2022-12-23  0:57 ` [PATCH 26/27] KVM: x86/mmu: Add page-track API to query if a gfn is valid Sean Christopherson
@ 2022-12-23  0:57 ` Sean Christopherson
  2022-12-23  9:05 ` [PATCH 00/27] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups Yan Zhao
  27 siblings, 0 replies; 61+ messages in thread
From: Sean Christopherson @ 2022-12-23  0:57 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Zhenyu Wang, Zhi Wang
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao, Ben Gardon

Open code gpa_to_gfn() in kvmgt_page_track_write() and drop KVMGT's
dependency on kvm_host.h, i.e. include only kvm_page_track.h.
KVMGT assumes "gfn == gpa >> PAGE_SHIFT" all over the place, including
a few lines below in the same function with the same gpa, i.e. there's
no reason to use KVM's helper for this one case.
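
As an illustration of the equivalence described above, here is a minimal
userspace sketch.  The typedefs and the SKETCH_PAGE_SHIFT value of 12 (4KiB
pages) are assumptions made for the example, and sketch_gpa_to_gfn() is a
hypothetical stand-in for KVM's gpa_to_gfn(), not the kernel code itself.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-ins for the kernel's gpa_t, gfn_t and PAGE_SHIFT. */
typedef uint64_t sketch_gpa_t;
typedef uint64_t sketch_gfn_t;
#define SKETCH_PAGE_SHIFT 12	/* assumes 4KiB pages */

/* Models the conversion that KVM's gpa_to_gfn() helper performs. */
static inline sketch_gfn_t sketch_gpa_to_gfn(sketch_gpa_t gpa)
{
	return (sketch_gfn_t)(gpa >> SKETCH_PAGE_SHIFT);
}
```

Under these assumptions, open coding the shift at the call site yields the
same gfn as calling the helper.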

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 drivers/gpu/drm/i915/gvt/gvt.h   | 3 ++-
 drivers/gpu/drm/i915/gvt/kvmgt.c | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gvt/gvt.h b/drivers/gpu/drm/i915/gvt/gvt.h
index fbfd7eafec14..4fb94b19ffde 100644
--- a/drivers/gpu/drm/i915/gvt/gvt.h
+++ b/drivers/gpu/drm/i915/gvt/gvt.h
@@ -34,10 +34,11 @@
 #define _GVT_H_
 
 #include <uapi/linux/pci_regs.h>
-#include <linux/kvm_host.h>
 #include <linux/vfio.h>
 #include <linux/mdev.h>
 
+#include <asm/kvm_page_track.h>
+
 #include "i915_drv.h"
 #include "intel_gvt.h"
 
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index e4227ac6ab58..a1647177d1c8 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -1607,7 +1607,7 @@ static void kvmgt_page_track_write(gpa_t gpa, const u8 *val, int len,
 	mutex_lock(&info->vgpu_lock);
 	mutex_lock(&info->gfn_lock);
 
-	if (kvmgt_gfn_is_write_protected(info, gpa_to_gfn(gpa)))
+	if (kvmgt_gfn_is_write_protected(info, gpa >> PAGE_SHIFT))
 		intel_vgpu_page_track_handler(info, gpa,
 						     (void *)val, len);
 
-- 
2.39.0.314.g84b9a713c41-goog



* Re: [PATCH 00/27] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups
  2022-12-23  0:57 [PATCH 00/27] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups Sean Christopherson
                   ` (26 preceding siblings ...)
  2022-12-23  0:57 ` [PATCH 27/27] drm/i915/gvt: Drop final dependencies on KVM internal details Sean Christopherson
@ 2022-12-23  9:05 ` Yan Zhao
  2023-01-04  1:01   ` Sean Christopherson
  27 siblings, 1 reply; 61+ messages in thread
From: Yan Zhao @ 2022-12-23  9:05 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Zhenyu Wang, Zhi Wang, kvm, intel-gvt-dev,
	intel-gfx, linux-kernel, Ben Gardon

On Fri, Dec 23, 2022 at 12:57:12AM +0000, Sean Christopherson wrote:
> Fix a variety of found-by-inspection bugs in KVMGT, and overhaul KVM's
> page-track APIs to provide a leaner and cleaner interface.  The motivation
> for this series is to (significantly) reduce the number of KVM APIs that
> KVMGT uses, with a long-term goal of making all kvm_host.h headers
> KVM-internal.  That said, I think the cleanup itself is worthwhile,
> e.g. KVMGT really shouldn't be touching kvm->mmu_lock.
> 
> Note!  The KVMGT changes are compile tested only as I don't have the
> necessary hardware (AFAIK).  Testing, and lots of it, on the KVMGT side
> of things is needed and any help on that front would be much appreciated.
Hi Sean,
Thanks for the patch!
Could you also provide the commit id that this series is based on?
I applied the patches on top of the latest master branch (6.1.0+,
8395ae05cb5a2e31d36106e8c85efa11cda849be) in repo
https://github.com/torvalds/linux.git, but hit some conflicts (patch 11
and patch 25), which I fixed manually.

A rough test shows that the mutex_init() below is missing.
But even with this fix, the guest still hangs during boot.
I will look into it and do a detailed review next week.

diff --git a/drivers/gpu/drm/i915/gvt/vgpu.c b/drivers/gpu/drm/i915/gvt/vgpu.c
index a7ac2ec00196..c274b6a05555 100644
--- a/drivers/gpu/drm/i915/gvt/vgpu.c
+++ b/drivers/gpu/drm/i915/gvt/vgpu.c
@@ -331,6 +331,7 @@ int intel_gvt_create_vgpu(struct intel_vgpu *vgpu,
        vgpu->id = ret;
        vgpu->sched_ctl.weight = conf->weight;
        mutex_init(&vgpu->vgpu_lock);
+       mutex_init(&vgpu->gfn_lock);
        mutex_init(&vgpu->dmabuf_lock);
        INIT_LIST_HEAD(&vgpu->dmabuf_obj_list_head);
        INIT_RADIX_TREE(&vgpu->page_track_tree, GFP_KERNEL);


Thanks
Yan


* Re: [PATCH 09/27] drm/i915/gvt: Protect gfn hash table with dedicated mutex
  2022-12-23  0:57 ` [PATCH 09/27] drm/i915/gvt: Protect gfn hash table with dedicated mutex Sean Christopherson
@ 2022-12-28  5:03   ` Yan Zhao
  2023-01-03 20:43     ` Sean Christopherson
  0 siblings, 1 reply; 61+ messages in thread
From: Yan Zhao @ 2022-12-28  5:03 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Zhenyu Wang, Zhi Wang, kvm, intel-gvt-dev,
	intel-gfx, linux-kernel, Ben Gardon

On Fri, Dec 23, 2022 at 12:57:21AM +0000, Sean Christopherson wrote:
> Add and use a new mutex, gfn_lock, to protect accesses to the hash table
> used to track which gfns are write-protected when shadowing the guest's
> GTT.  This fixes a bug where kvmgt_page_track_write(), which doesn't hold
> kvm->mmu_lock, could race with intel_gvt_page_track_remove() and trigger
> a use-after-free.
> 
> Fixing kvmgt_page_track_write() by taking kvm->mmu_lock is not an option
> as mmu_lock is a r/w spinlock, and intel_vgpu_page_track_handler() might
> sleep when acquiring vgpu->cache_lock deep down the callstack:
> 
>   intel_vgpu_page_track_handler()
>   |
>   |->  page_track->handler / ppgtt_write_protection_handler()
>        |
>        |-> ppgtt_handle_guest_write_page_table_bytes()
>            |
>            |->  ppgtt_handle_guest_write_page_table()
>                 |
>                 |-> ppgtt_handle_guest_entry_removal()
>                     |
>                     |-> ppgtt_invalidate_pte()
>                         |
>                         |-> intel_gvt_dma_unmap_guest_page()
>                             |
>                             |-> mutex_lock(&vgpu->cache_lock);
> 
This gfn_lock could lead to a deadlock in the sequence below.

(1) kvm_write_track_add_gfn() to GFN 1
(2) kvmgt_page_track_write() for GFN 1
kvmgt_page_track_write()
|
|->mutex_lock(&info->vgpu_lock)
|->intel_vgpu_page_track_handler (as is kvmgt_gfn_is_write_protected)
   |
   |->page_track->handler() (ppgtt_write_protection_handler())
      |	
      |->ppgtt_handle_guest_write_page_table_bytes()
         |
         |->ppgtt_handle_guest_write_page_table()
	    |
	    |->ppgtt_handle_guest_entry_add() --> new_present
	       |
	       |->ppgtt_populate_spt_by_guest_entry()
	          |
		  |->intel_vgpu_enable_page_track() --> for GFN 2
		     |
		     |->intel_gvt_page_track_add()
		        |
			|->mutex_lock(&info->gfn_lock) ===>deadlock
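
The nested acquisition in the trace above can be modeled in userspace.  This
is only a sketch: the toy mutex is single-threaded and reports -EDEADLK
instead of blocking, and every sketch_* name is hypothetical rather than
taken from KVMGT.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Toy, single-threaded model of a non-recursive mutex.  Instead of blocking
 * forever on a second acquisition, it returns -EDEADLK so that the problem
 * is observable.
 */
struct sketch_mutex {
	bool held;
};

static inline int sketch_mutex_lock(struct sketch_mutex *m)
{
	if (m->held)
		return -EDEADLK;	/* a real mutex_lock() would hang here */
	m->held = true;
	return 0;
}

static inline void sketch_mutex_unlock(struct sketch_mutex *m)
{
	m->held = false;
}

struct sketch_vgpu {
	struct sketch_mutex vgpu_lock;
	struct sketch_mutex gfn_lock;
};

/* Models intel_gvt_page_track_add(): takes gfn_lock around the table update. */
static inline int sketch_page_track_add(struct sketch_vgpu *info)
{
	int ret = sketch_mutex_lock(&info->gfn_lock);

	if (ret)
		return ret;
	/* ... write-protect the gfn and add it to the hash table ... */
	sketch_mutex_unlock(&info->gfn_lock);
	return 0;
}

/*
 * Models kvmgt_page_track_write() from the patch: the handler runs with
 * gfn_lock already held and, when it shadows a new page table, re-enters
 * the add path for a different gfn.
 */
static inline int sketch_page_track_write(struct sketch_vgpu *info)
{
	int ret;

	sketch_mutex_lock(&info->vgpu_lock);
	sketch_mutex_lock(&info->gfn_lock);

	ret = sketch_page_track_add(info);	/* nested gfn_lock acquisition */

	sketch_mutex_unlock(&info->gfn_lock);
	sketch_mutex_unlock(&info->vgpu_lock);
	return ret;
}
```

In this model the write path fails with -EDEADLK, while calling the add path
on its own succeeds, which is the asymmetry the trace points out.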


The fix below, based on this patch, reuses vgpu_lock to protect the hash
table info->ptable.
Please check if it looks good.


diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index b924ed079ad4..526bd973e784 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -364,7 +364,7 @@ __kvmgt_protect_table_find(struct intel_vgpu *info, gfn_t gfn)
 {
        struct kvmgt_pgfn *p, *res = NULL;

-       lockdep_assert_held(&info->gfn_lock);
+       lockdep_assert_held(&info->vgpu_lock);

        hash_for_each_possible(info->ptable, p, hnode, gfn) {
                if (gfn == p->gfn) {
@@ -388,7 +388,7 @@ static void kvmgt_protect_table_add(struct intel_vgpu *info, gfn_t gfn)
 {
        struct kvmgt_pgfn *p;

-       lockdep_assert_held(&info->gfn_lock);
+       lockdep_assert_held(&info->vgpu_lock);

        if (kvmgt_gfn_is_write_protected(info, gfn))
                return;
@@ -1572,7 +1572,7 @@ int intel_gvt_page_track_add(struct intel_vgpu *info, u64 gfn)
        if (!info->attached)
                return -ESRCH;

-       mutex_lock(&info->gfn_lock);
+       lockdep_assert_held(&info->vgpu_lock);

        if (kvmgt_gfn_is_write_protected(info, gfn))
                goto out;
@@ -1581,7 +1581,6 @@ int intel_gvt_page_track_add(struct intel_vgpu *info, u64 gfn)
        if (!ret)
                kvmgt_protect_table_add(info, gfn);
 out:
-       mutex_unlock(&info->gfn_lock);
        return ret;
 }

@@ -1592,7 +1591,7 @@ int intel_gvt_page_track_remove(struct intel_vgpu *info, u64 gfn)
        if (!info->attached)
                return 0;

-       mutex_lock(&info->gfn_lock);
+       lockdep_assert_held(&info->vgpu_lock);

        if (!kvmgt_gfn_is_write_protected(info, gfn))
                goto out;
@@ -1601,7 +1600,6 @@ int intel_gvt_page_track_remove(struct intel_vgpu *info, u64 gfn)
        if (!ret)
                kvmgt_protect_table_del(info, gfn);
 out:
-       mutex_unlock(&info->gfn_lock);
        return ret;
 }

@@ -1612,13 +1610,15 @@ static void kvmgt_page_track_write(gpa_t gpa, const u8 *val, int len,
                container_of(node, struct intel_vgpu, track_node);

        mutex_lock(&info->vgpu_lock);
-       mutex_lock(&info->gfn_lock);

        if (kvmgt_gfn_is_write_protected(info, gpa >> PAGE_SHIFT))
                intel_vgpu_page_track_handler(info, gpa,
                                                     (void *)val, len);
        }

-       mutex_unlock(&info->gfn_lock);
        mutex_unlock(&info->vgpu_lock);
 }
@@ -1629,12 +1629,11 @@ static void kvmgt_page_track_remove_region(gfn_t gfn, unsigned long nr_pages,
        struct intel_vgpu *info =
                container_of(node, struct intel_vgpu, track_node);
 
-       mutex_lock(&info->gfn_lock);
+       lockdep_assert_held(&info->vgpu_lock);
        for (i = 0; i < nr_pages; i++) {
                if (kvmgt_gfn_is_write_protected(info, gfn + i))
                        kvmgt_protect_table_del(info, gfn + i);
        }
-       mutex_unlock(&info->gfn_lock);
 }


Thanks
Yan


* Re: [PATCH 03/27] drm/i915/gvt: Incorporate KVM memslot info into check for 2MiB GTT entry
  2022-12-23  0:57 ` [PATCH 03/27] drm/i915/gvt: Incorporate KVM memslot info into check for 2MiB GTT entry Sean Christopherson
@ 2022-12-28  5:42   ` Yan Zhao
  2023-01-03 21:13     ` Sean Christopherson
  0 siblings, 1 reply; 61+ messages in thread
From: Yan Zhao @ 2022-12-28  5:42 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Zhenyu Wang, Zhi Wang, kvm, intel-gvt-dev,
	intel-gfx, linux-kernel, Ben Gardon

On Fri, Dec 23, 2022 at 12:57:15AM +0000, Sean Christopherson wrote:
> Honor KVM's max allowed page size when determining whether or not a 2MiB
> GTT shadow page can be created for the guest.  Querying KVM's max allowed
> size is somewhat odd as there's no strict requirement that KVM's memslots
> and VFIO's mappings are configured with the same gfn=>hva mapping, but
Without vIOMMU, VFIO's mapping is configured the same as KVM's
memslots, i.e. with the same gfn==>HVA mapping.


> the check will be accurate if userspace wants to have a functional guest,
> and at the very least checking KVM's memslots guarantees that the entire
> 2MiB range has been exposed to the guest.

I think just checking that the entire 2MiB GFN range is within a KVM memslot
is enough.
If for some reason KVM maps a 2MiB range in 4K sizes, KVMGT can still map
it in the IOMMU in 2MiB size as long as the PFNs are contiguous and the
whole range is exposed to the guest.
Actually, normal device passthrough with VFIO-PCI also maps GFNs in a
similar way, i.e. it maps a guest-visible range in as large a size as
possible as long as the PFNs are contiguous.
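
A rough userspace sketch of the PFN-contiguity condition described above
follows.  The pfns[] input, the function name and the extra requirement that
the first pfn be 2MiB aligned are assumptions added for illustration; none of
this is KVMGT or VFIO code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGES_PER_2M 512	/* 2MiB / 4KiB */

/*
 * Returns true if the pfns backing one 2MiB guest range are physically
 * contiguous and start on a 2MiB boundary.  pfns[] holds one entry per
 * 4KiB page, as a pinning interface might report them (hypothetical input).
 */
static inline bool sketch_pfns_allow_2m(const uint64_t *pfns, size_t nr)
{
	size_t i;

	if (nr != SKETCH_PAGES_PER_2M)
		return false;
	if (pfns[0] % SKETCH_PAGES_PER_2M)
		return false;
	for (i = 1; i < nr; i++) {
		if (pfns[i] != pfns[0] + i)
			return false;
	}
	return true;
}
```

The check deliberately ignores how KVM itself maps the range, which is the
point being argued: only the backing PFNs matter for the IOMMU mapping size.
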
> 
> Note, KVM may also restrict the mapping size for reasons that aren't
> relevant to KVMGT, e.g. for KVM's iTLB multi-hit workaround or if the gfn
Will iTLB multi-hit affect DMA?
AFAIK, IOMMU mappings currently never set the exec bit (and I'm told this
bit is under discussion to be removed).


> is write-tracked (KVM's write-tracking only handles writes from vCPUs).
> However, such scenarios are unlikely to occur with a well-behaved guest,
> and at worst will result in sub-optimal performance.
> Fixes: b901b252b6cf ("drm/i915/gvt: Add 2M huge gtt support")
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/include/asm/kvm_page_track.h |  2 ++
>  arch/x86/kvm/mmu/page_track.c         | 18 ++++++++++++++++++
>  drivers/gpu/drm/i915/gvt/gtt.c        | 10 +++++++++-
>  3 files changed, 29 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
> index eb186bc57f6a..3f72c7a172fc 100644
> --- a/arch/x86/include/asm/kvm_page_track.h
> +++ b/arch/x86/include/asm/kvm_page_track.h
> @@ -51,6 +51,8 @@ void kvm_page_track_cleanup(struct kvm *kvm);
>  
>  bool kvm_page_track_write_tracking_enabled(struct kvm *kvm);
>  int kvm_page_track_write_tracking_alloc(struct kvm_memory_slot *slot);
> +enum pg_level kvm_page_track_max_mapping_level(struct kvm *kvm, gfn_t gfn,
> +					       enum pg_level max_level);
>  
>  void kvm_page_track_free_memslot(struct kvm_memory_slot *slot);
>  int kvm_page_track_create_memslot(struct kvm *kvm,
> diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
> index 2e09d1b6249f..69ea16c31859 100644
> --- a/arch/x86/kvm/mmu/page_track.c
> +++ b/arch/x86/kvm/mmu/page_track.c
> @@ -300,3 +300,21 @@ void kvm_page_track_flush_slot(struct kvm *kvm, struct kvm_memory_slot *slot)
>  			n->track_flush_slot(kvm, slot, n);
>  	srcu_read_unlock(&head->track_srcu, idx);
>  }
> +
> +enum pg_level kvm_page_track_max_mapping_level(struct kvm *kvm, gfn_t gfn,
> +					       enum pg_level max_level)
> +{
> +	struct kvm_memory_slot *slot;
> +	int idx;
> +
> +	idx = srcu_read_lock(&kvm->srcu);
> +	slot = gfn_to_memslot(kvm, gfn);
> +	if (!slot || slot->flags & KVM_MEMSLOT_INVALID)
> +		max_level = PG_LEVEL_4K;
> +	else
> +		max_level = kvm_mmu_max_slot_mapping_level(slot, gfn, max_level);
> +	srcu_read_unlock(&kvm->srcu, idx);
> +
> +	return max_level;
> +}
> +EXPORT_SYMBOL_GPL(kvm_page_track_max_mapping_level);
> diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c
> index d0fca53a3563..6736d7bd94ea 100644
> --- a/drivers/gpu/drm/i915/gvt/gtt.c
> +++ b/drivers/gpu/drm/i915/gvt/gtt.c
> @@ -1178,14 +1178,22 @@ static int is_2MB_gtt_possible(struct intel_vgpu *vgpu,
>  	struct intel_gvt_gtt_entry *entry)
>  {
>  	const struct intel_gvt_gtt_pte_ops *ops = vgpu->gvt->gtt.pte_ops;
> +	unsigned long gfn = ops->get_pfn(entry);
>  	kvm_pfn_t pfn;
> +	int max_level;
>  
>  	if (!HAS_PAGE_SIZES(vgpu->gvt->gt->i915, I915_GTT_PAGE_SIZE_2M))
>  		return 0;
>  
>  	if (!vgpu->attached)
>  		return -EINVAL;
> -	pfn = gfn_to_pfn(vgpu->vfio_device.kvm, ops->get_pfn(entry));
> +
> +	max_level = kvm_page_track_max_mapping_level(vgpu->vfio_device.kvm,
> +						     gfn, PG_LEVEL_2M);
> +	if (max_level < PG_LEVEL_2M)
> +		return 0;
> +
> +	pfn = gfn_to_pfn(vgpu->vfio_device.kvm, gfn);
>  	if (is_error_noslot_pfn(pfn))
>  		return -EINVAL;
>  
> -- 
> 2.39.0.314.g84b9a713c41-goog
> 


* Re: [PATCH 19/27] KVM: x86/mmu: Use page-track notifiers iff there are external users
  2022-12-23  0:57 ` [PATCH 19/27] KVM: x86/mmu: Use page-track notifiers iff there are external users Sean Christopherson
@ 2022-12-28  6:56   ` Yan Zhao
  2023-01-04  0:50     ` Sean Christopherson
  2023-08-07 12:01   ` Like Xu
  1 sibling, 1 reply; 61+ messages in thread
From: Yan Zhao @ 2022-12-28  6:56 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Zhenyu Wang, Zhi Wang, kvm, intel-gvt-dev,
	intel-gfx, linux-kernel, Ben Gardon

On Fri, Dec 23, 2022 at 12:57:31AM +0000, Sean Christopherson wrote:
> Disable the page-track notifier code at compile time if there are no
> external users, i.e. if CONFIG_KVM_EXTERNAL_WRITE_TRACKING=n.  KVM itself
> now hooks emulated writes directly instead of relying on the page-track
> mechanism.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/include/asm/kvm_host.h       |  2 ++
>  arch/x86/include/asm/kvm_page_track.h |  2 ++
>  arch/x86/kvm/mmu/page_track.c         |  9 ++++----
>  arch/x86/kvm/mmu/page_track.h         | 30 +++++++++++++++++++++++----
>  4 files changed, 35 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index eec424fac0ba..e8f8e1bd96c7 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1223,7 +1223,9 @@ struct kvm_arch {
>  	 * create an NX huge page (without hanging the guest).
>  	 */
>  	struct list_head possible_nx_huge_pages;
> +#ifdef CONFIG_KVM_EXTERNAL_WRITE_TRACKING
>  	struct kvm_page_track_notifier_head track_notifier_head;
> +#endif
>  	/*
>  	 * Protects marking pages unsync during page faults, as TDP MMU page
>  	 * faults only take mmu_lock for read.  For simplicity, the unsync
> diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
> index deece45936a5..53c2adb25a07 100644
> --- a/arch/x86/include/asm/kvm_page_track.h
> +++ b/arch/x86/include/asm/kvm_page_track.h
> @@ -55,6 +55,7 @@ void kvm_slot_page_track_remove_page(struct kvm *kvm,
>  				     struct kvm_memory_slot *slot, gfn_t gfn,
>  				     enum kvm_page_track_mode mode);
>  
> +#ifdef CONFIG_KVM_EXTERNAL_WRITE_TRACKING
>  enum pg_level kvm_page_track_max_mapping_level(struct kvm *kvm, gfn_t gfn,
>  					       enum pg_level max_level);
>  
> @@ -64,5 +65,6 @@ kvm_page_track_register_notifier(struct kvm *kvm,
>  void
>  kvm_page_track_unregister_notifier(struct kvm *kvm,
>  				   struct kvm_page_track_notifier_node *n);
> +#endif /* CONFIG_KVM_EXTERNAL_WRITE_TRACKING */
>  
>  #endif
> diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
> index 2b302fd2c5dd..f932909aa9b5 100644
> --- a/arch/x86/kvm/mmu/page_track.c
> +++ b/arch/x86/kvm/mmu/page_track.c
> @@ -193,6 +193,7 @@ bool kvm_slot_page_track_is_active(struct kvm *kvm,
>  	return !!READ_ONCE(slot->arch.gfn_track[mode][index]);
>  }
>  
> +#ifdef CONFIG_KVM_EXTERNAL_WRITE_TRACKING
>  void kvm_page_track_cleanup(struct kvm *kvm)
>  {
>  	struct kvm_page_track_notifier_head *head;
> @@ -208,6 +209,7 @@ int kvm_page_track_init(struct kvm *kvm)
>  	head = &kvm->arch.track_notifier_head;
>  	INIT_HLIST_HEAD(&head->track_notifier_list);
>  	return init_srcu_struct(&head->track_srcu);
> +	return 0;
Double "return"s.


>  }
>  
>  /*
> @@ -254,8 +256,8 @@ EXPORT_SYMBOL_GPL(kvm_page_track_unregister_notifier);
>   * The node should figure out if the written page is the one that node is
>   * interested in by itself.
>   */
> -void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
> -			  int bytes)
> +void __kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
> +			    int bytes)
>  {
>  	struct kvm_page_track_notifier_head *head;
>  	struct kvm_page_track_notifier_node *n;
> @@ -272,8 +274,6 @@ void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
>  		if (n->track_write)
>  			n->track_write(gpa, new, bytes, n);
>  	srcu_read_unlock(&head->track_srcu, idx);
> -
> -	kvm_mmu_track_write(vcpu, gpa, new, bytes);
>  }
>  
>  /*
> @@ -316,3 +316,4 @@ enum pg_level kvm_page_track_max_mapping_level(struct kvm *kvm, gfn_t gfn,
>  	return max_level;
>  }
>  EXPORT_SYMBOL_GPL(kvm_page_track_max_mapping_level);
> +#endif
> diff --git a/arch/x86/kvm/mmu/page_track.h b/arch/x86/kvm/mmu/page_track.h
> index 89712f123ad3..1b363784aa4a 100644
> --- a/arch/x86/kvm/mmu/page_track.h
> +++ b/arch/x86/kvm/mmu/page_track.h
> @@ -6,8 +6,6 @@
>  
>  #include <asm/kvm_page_track.h>
>  
> -int kvm_page_track_init(struct kvm *kvm);
> -void kvm_page_track_cleanup(struct kvm *kvm);
>  
>  bool kvm_page_track_write_tracking_enabled(struct kvm *kvm);
>  int kvm_page_track_write_tracking_alloc(struct kvm_memory_slot *slot);
> @@ -21,13 +19,37 @@ bool kvm_slot_page_track_is_active(struct kvm *kvm,
>  				   const struct kvm_memory_slot *slot,
>  				   gfn_t gfn, enum kvm_page_track_mode mode);
>  
> -void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
> -			  int bytes);
> +#ifdef CONFIG_KVM_EXTERNAL_WRITE_TRACKING
> +int kvm_page_track_init(struct kvm *kvm);
> +void kvm_page_track_cleanup(struct kvm *kvm);
> +
> +void __kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
> +			    int bytes);
>  void kvm_page_track_delete_slot(struct kvm *kvm, struct kvm_memory_slot *slot);
>  
>  static inline bool kvm_page_track_has_external_user(struct kvm *kvm)
>  {
>  	return hlist_empty(&kvm->arch.track_notifier_head.track_notifier_list);
>  }
> +#else
> +static inline int kvm_page_track_init(struct kvm *kvm) { return 0; }
> +static inline void kvm_page_track_cleanup(struct kvm *kvm) { }
> +
> +static inline void __kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa,
> +					  const u8 *new, int bytes) { }
> +static inline void kvm_page_track_delete_slot(struct kvm *kvm,
> +					      struct kvm_memory_slot *slot) { }
> +
> +static inline bool kvm_page_track_has_external_user(struct kvm *kvm) { return false; }
> +
> +#endif /* CONFIG_KVM_EXTERNAL_WRITE_TRACKING */
> +
> +static inline void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa,
> +					const u8 *new, int bytes)
> +{
> +	__kvm_page_track_write(vcpu, gpa, new, bytes);
> +
Why not convert "vcpu" to "kvm" in __kvm_page_track_write() ?
i.e.
void __kvm_page_track_write(struct kvm *kvm, gpa_t gpa, const u8 *new, int bytes);


Thanks
Yan

> +	kvm_mmu_track_write(vcpu, gpa, new, bytes);
> +}
>  
>  #endif /* __KVM_X86_PAGE_TRACK_H */
> -- 
> 2.39.0.314.g84b9a713c41-goog
> 


* Re: [PATCH 26/27] KVM: x86/mmu: Add page-track API to query if a gfn is valid
  2022-12-23  0:57 ` [PATCH 26/27] KVM: x86/mmu: Add page-track API to query if a gfn is valid Sean Christopherson
@ 2022-12-28  7:57   ` Yan Zhao
  2023-01-03 21:19     ` Sean Christopherson
  0 siblings, 1 reply; 61+ messages in thread
From: Yan Zhao @ 2022-12-28  7:57 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Zhenyu Wang, Zhi Wang, kvm, intel-gvt-dev,
	intel-gfx, linux-kernel, Ben Gardon

On Fri, Dec 23, 2022 at 12:57:38AM +0000, Sean Christopherson wrote:
> Add a page-track API to query if a gfn is "valid", i.e. is backed by a
> memslot and is visible to the guest.  This is one more step toward
> removing KVM internal details from the page-track APIs.
> 
> Add a FIXME to call out that intel_gvt_is_valid_gfn() is broken with
> respect to 2MiB (or larger) guest entries, e.g. if the starting gfn is
> valid but a 2MiB page starting at the gfn covers "invalid" memory due
> to running beyond the memslot.
> 
> No functional change intended.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/include/asm/kvm_page_track.h |  1 +
>  arch/x86/kvm/mmu/page_track.c         | 13 +++++++++++++
>  drivers/gpu/drm/i915/gvt/gtt.c        | 11 ++---------
>  3 files changed, 16 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
> index 66a0d7c34311..99e1d6eeb0fb 100644
> --- a/arch/x86/include/asm/kvm_page_track.h
> +++ b/arch/x86/include/asm/kvm_page_track.h
> @@ -52,6 +52,7 @@ int kvm_page_track_register_notifier(struct kvm *kvm,
>  void kvm_page_track_unregister_notifier(struct kvm *kvm,
>  					struct kvm_page_track_notifier_node *n);
>  
> +bool kvm_page_track_is_valid_gfn(struct kvm *kvm, gfn_t gfn);
>  int kvm_write_track_add_gfn(struct kvm *kvm, gfn_t gfn);
>  int kvm_write_track_remove_gfn(struct kvm *kvm, gfn_t gfn);
>  #endif /* CONFIG_KVM_EXTERNAL_WRITE_TRACKING */
> diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
> index 1af431a41f71..9da071a514b3 100644
> --- a/arch/x86/kvm/mmu/page_track.c
> +++ b/arch/x86/kvm/mmu/page_track.c
> @@ -264,6 +264,19 @@ enum pg_level kvm_page_track_max_mapping_level(struct kvm *kvm, gfn_t gfn,
>  }
>  EXPORT_SYMBOL_GPL(kvm_page_track_max_mapping_level);
>  
> +bool kvm_page_track_is_valid_gfn(struct kvm *kvm, gfn_t gfn)
> +{
> +	bool ret;
> +	int idx;
> +
> +	idx = srcu_read_lock(&kvm->srcu);
> +	ret = kvm_is_visible_gfn(kvm, gfn);
> +	srcu_read_unlock(&kvm->srcu, idx);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(kvm_page_track_is_valid_gfn);
This implementation only checks whether a GFN is within a visible
KVM memslot. So why is this helper function named kvm_page_track_xxx()?
I don't think it's related to page tracking, and not all of its callers
in KVMGT are for page tracking.

Thanks
Yan

> +
>  /*
>   * add guest page to the tracking pool so that corresponding access on that
>   * page will be intercepted.
> diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c
> index 59ba6639e622..43c4fc23205d 100644
> --- a/drivers/gpu/drm/i915/gvt/gtt.c
> +++ b/drivers/gpu/drm/i915/gvt/gtt.c
> @@ -51,18 +51,11 @@ static int preallocated_oos_pages = 8192;
>  
>  static bool intel_gvt_is_valid_gfn(struct intel_vgpu *vgpu, unsigned long gfn)
>  {
> -	struct kvm *kvm = vgpu->vfio_device.kvm;
> -	int idx;
> -	bool ret;
> -
>  	if (!vgpu->attached)
>  		return false;
>  
> -	idx = srcu_read_lock(&kvm->srcu);
> -	ret = kvm_is_visible_gfn(kvm, gfn);
> -	srcu_read_unlock(&kvm->srcu, idx);
> -
> -	return ret;
> +	/* FIXME: This doesn't properly handle guest entries larger than 4K. */
> +	return kvm_page_track_is_valid_gfn(vgpu->vfio_device.kvm, gfn);
>  }
>  
>  /*
> -- 
> 2.39.0.314.g84b9a713c41-goog
> 


* Re: [PATCH 09/27] drm/i915/gvt: Protect gfn hash table with dedicated mutex
  2022-12-28  5:03   ` Yan Zhao
@ 2023-01-03 20:43     ` Sean Christopherson
  2023-01-05  0:51       ` Yan Zhao
  0 siblings, 1 reply; 61+ messages in thread
From: Sean Christopherson @ 2023-01-03 20:43 UTC (permalink / raw)
  To: Yan Zhao
  Cc: Paolo Bonzini, Zhenyu Wang, Zhi Wang, kvm, intel-gvt-dev,
	intel-gfx, linux-kernel, Ben Gardon

On Wed, Dec 28, 2022, Yan Zhao wrote:
> On Fri, Dec 23, 2022 at 12:57:21AM +0000, Sean Christopherson wrote:
> > Add and use a new mutex, gfn_lock, to protect accesses to the hash table
> > used to track which gfns are write-protected when shadowing the guest's
> > GTT.  This fixes a bug where kvmgt_page_track_write(), which doesn't hold
> > kvm->mmu_lock, could race with intel_gvt_page_track_remove() and trigger
> > a use-after-free.
> > 
> > Fixing kvmgt_page_track_write() by taking kvm->mmu_lock is not an option
> > as mmu_lock is a r/w spinlock, and intel_vgpu_page_track_handler() might
> > sleep when acquiring vgpu->cache_lock deep down the callstack:
> > 
> >   intel_vgpu_page_track_handler()
> >   |
> >   |->  page_track->handler / ppgtt_write_protection_handler()
> >        |
> >        |-> ppgtt_handle_guest_write_page_table_bytes()
> >            |
> >            |->  ppgtt_handle_guest_write_page_table()
> >                 |
> >                 |-> ppgtt_handle_guest_entry_removal()
> >                     |
> >                     |-> ppgtt_invalidate_pte()
> >                         |
> >                         |-> intel_gvt_dma_unmap_guest_page()
> >                             |
> >                             |-> mutex_lock(&vgpu->cache_lock);
> > 
> This gfn_lock could lead to a deadlock in the sequence below.
> 
> (1) kvm_write_track_add_gfn() to GFN 1
> (2) kvmgt_page_track_write() for GFN 1
> kvmgt_page_track_write()
> |
> |->mutex_lock(&info->vgpu_lock)
> |->intel_vgpu_page_track_handler (as is kvmgt_gfn_is_write_protected)
>    |
>    |->page_track->handler() (ppgtt_write_protection_handler())
>       |	
>       |->ppgtt_handle_guest_write_page_table_bytes()
>          |
>          |->ppgtt_handle_guest_write_page_table()
> 	    |
> 	    |->ppgtt_handle_guest_entry_add() --> new_present
> 	       |
> 	       |->ppgtt_populate_spt_by_guest_entry()
> 	          |
> 		  |->intel_vgpu_enable_page_track() --> for GFN 2
> 		     |
> 		     |->intel_gvt_page_track_add()
> 		        |
> 			|->mutex_lock(&info->gfn_lock) ===>deadlock

Or even more simply, 

  kvmgt_page_track_write()
  |
  -> intel_vgpu_page_track_handler()
     |
     -> intel_gvt_page_track_remove()

> 
> The fix below, based on this patch, reuses vgpu_lock to protect the hash
> table info->ptable.
> Please check if it looks good.
> 
> 
> diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
> index b924ed079ad4..526bd973e784 100644
> --- a/drivers/gpu/drm/i915/gvt/kvmgt.c
> +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
> @@ -364,7 +364,7 @@ __kvmgt_protect_table_find(struct intel_vgpu *info, gfn_t gfn)
>  {
>         struct kvmgt_pgfn *p, *res = NULL;
> 
> -       lockdep_assert_held(&info->gfn_lock);
> +       lockdep_assert_held(&info->vgpu_lock);
> 
>         hash_for_each_possible(info->ptable, p, hnode, gfn) {
>                 if (gfn == p->gfn) {
> @@ -388,7 +388,7 @@ static void kvmgt_protect_table_add(struct intel_vgpu *info, gfn_t gfn)
>  {
>         struct kvmgt_pgfn *p;
> 
> -       lockdep_assert_held(&info->gfn_lock);
> +       lockdep_assert_held(&info->vgpu_lock);

I'll just delete these assertions, the one in __kvmgt_protect_table_find() should
cover everything and is ultimately the assert that matters.

> @@ -1629,12 +1629,11 @@ static void kvmgt_page_track_remove_region(gfn_t gfn, unsigned long nr_pages,
>         struct intel_vgpu *info =
>                 container_of(node, struct intel_vgpu, track_node);
>  
> -       mutex_lock(&info->gfn_lock);
> +       lockdep_assert_held(&info->vgpu_lock);

This path needs to manually take vgpu_lock as it's called from KVM.  IIRC, this
is the main reason I tried adding a new lock.  That and I had a hell of a time
figuring out whether or not vgpu_lock would actually be held.
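
The locking split described above (internal helpers only assert that the lock
is held, while the entry point called from KVM takes it) can be sketched in
userspace as follows.  The toy lock is a single-threaded model, and all toy_*
names and the counter standing in for the hash table are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy lock that only records whether it is held (single-threaded model). */
struct toy_lock {
	bool held;
};

struct toy_vgpu {
	struct toy_lock vgpu_lock;
	unsigned int nr_protected;	/* stands in for the gfn hash table */
};

/* Innermost helper: never takes the lock, only checks that the caller did. */
static inline void toy_protect_table_del(struct toy_vgpu *info)
{
	assert(info->vgpu_lock.held);	/* role of lockdep_assert_held() */
	if (info->nr_protected)
		info->nr_protected--;
}

/* Reached from the write handler, which already holds vgpu_lock. */
static inline void toy_page_track_remove(struct toy_vgpu *info)
{
	toy_protect_table_del(info);
}

/* Called directly by "KVM", so it must acquire vgpu_lock itself. */
static inline void toy_remove_region(struct toy_vgpu *info,
				     unsigned int nr_pages)
{
	unsigned int i;

	assert(!info->vgpu_lock.held);
	info->vgpu_lock.held = true;
	for (i = 0; i < nr_pages; i++)
		toy_protect_table_del(info);
	info->vgpu_lock.held = false;
}
```

Keeping the assertion in the innermost helper means any new caller that
forgets the lock trips the check, whichever path it arrives on.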

Looking at this with fresh eyes, AFAICT intel_vgpu_reset_gtt() is the only other
path that can reach __kvmgt_protect_table_find() without holding vgpu_lock, by
way of intel_gvt_page_track_remove().  But unless there's magic I'm missing, that's
dead code and can simply be deleted.


* Re: [PATCH 03/27] drm/i915/gvt: Incorporate KVM memslot info into check for 2MiB GTT entry
  2022-12-28  5:42   ` Yan Zhao
@ 2023-01-03 21:13     ` Sean Christopherson
  2023-01-05  3:07       ` Yan Zhao
  0 siblings, 1 reply; 61+ messages in thread
From: Sean Christopherson @ 2023-01-03 21:13 UTC (permalink / raw)
  To: Yan Zhao
  Cc: Paolo Bonzini, Zhenyu Wang, Zhi Wang, kvm, intel-gvt-dev,
	intel-gfx, linux-kernel, Ben Gardon

On Wed, Dec 28, 2022, Yan Zhao wrote:
> On Fri, Dec 23, 2022 at 12:57:15AM +0000, Sean Christopherson wrote:
> > Honor KVM's max allowed page size when determining whether or not a 2MiB
> > GTT shadow page can be created for the guest.  Querying KVM's max allowed
> > size is somewhat odd as there's no strict requirement that KVM's memslots
> > and VFIO's mappings are configured with the same gfn=>hva mapping, but
> Without vIOMMU, VFIO's mapping is configured the same as KVM's
> memslots, i.e. with the same gfn==>HVA mapping.

But that's controlled by userspace, correct?

> > the check will be accurate if userspace wants to have a functional guest,
> > and at the very least checking KVM's memslots guarantees that the entire
> > 2MiB range has been exposed to the guest.
> 
> I think just checking that the entire 2MiB GFN range is within a KVM memslot
> is enough.

Strictly speaking, no.  E.g. if a 2MiB region is covered with multiple memslots
and the memslots have different properties.

> If for some reason KVM maps a 2MiB range in 4K sizes, KVMGT can still map
> it in the IOMMU in 2MiB size as long as the PFNs are contiguous and the
> whole range is exposed to the guest.

I agree that practically speaking this will hold true, but if KVMGT wants to honor
KVM's memslots then checking that KVM allows a hugepage is correct.  Hrm, but on
the flip side, KVMGT ignores read-only memslot flags, so KVMGT is already ignoring
pieces of KVM's memslots.

I have no objection to KVMGT defining its ABI such that KVMGT is allowed to create
2MiB so long as (a) the GFN is contiguous according to VFIO, and (b) that the entire
2MiB range is exposed to the guest.

That said, being fully permissive also seems wasteful, e.g. KVM would need to
explicitly support straddling multiple memslots.

As a middle ground, what about tweaking kvm_page_track_is_valid_gfn() to take a
range, and then checking that the range is contained in a single memslot?

E.g. something like:

bool kvm_page_track_is_contiguous_gfn_range(struct kvm *kvm, gfn_t gfn,
					    unsigned long nr_pages)
{
	struct kvm_memory_slot *memslot;
	bool ret;
	int idx;

	idx = srcu_read_lock(&kvm->srcu);
	memslot = gfn_to_memslot(kvm, gfn);
	ret = kvm_is_visible_memslot(memslot) &&
	      gfn + nr_pages <= memslot->base_gfn + memslot->npages;
	srcu_read_unlock(&kvm->srcu, idx);

	return ret;
}
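
To make the single-memslot rule above concrete, here is a minimal userspace
sketch.  Everything prefixed with toy_ is hypothetical and only models the
kernel helpers (gfn_to_memslot(), kvm_is_visible_memslot()); there is no SRCU
and no real memslot lookup, just the containment arithmetic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t toy_gfn_t;

/* Hypothetical stand-in for struct kvm_memory_slot (only the fields used). */
struct toy_memslot {
	toy_gfn_t base_gfn;
	unsigned long npages;
	bool visible;		/* models kvm_is_visible_memslot() */
};

/* Two adjacent visible slots followed by one slot hidden from the guest. */
static const struct toy_memslot toy_slots[] = {
	{ 0x000,  0x100, true  },
	{ 0x100,  0x300, true  },
	{ 0x1000, 0x200, false },
};

/* Linear search standing in for gfn_to_memslot(). */
static const struct toy_memslot *toy_gfn_to_memslot(toy_gfn_t gfn)
{
	size_t i;

	for (i = 0; i < sizeof(toy_slots) / sizeof(toy_slots[0]); i++) {
		const struct toy_memslot *s = &toy_slots[i];

		if (gfn >= s->base_gfn && gfn < s->base_gfn + s->npages)
			return s;
	}
	return NULL;
}

/* True iff [gfn, gfn + nr_pages) sits inside a single visible slot. */
static bool toy_is_contiguous_gfn_range(toy_gfn_t gfn, unsigned long nr_pages)
{
	const struct toy_memslot *s = toy_gfn_to_memslot(gfn);

	return s && s->visible && gfn + nr_pages <= s->base_gfn + s->npages;
}

/* Example checks for a 2MiB (512 x 4KiB) range. */
static void toy_range_example(void)
{
	assert(toy_is_contiguous_gfn_range(0x200, 512));   /* fits in slot 1 */
	assert(!toy_is_contiguous_gfn_range(0x000, 512));  /* straddles slots 0 and 1 */
	assert(!toy_is_contiguous_gfn_range(0x1000, 512)); /* slot not visible */
}
```

The second check is the interesting one: the whole range is exposed to the
guest, but it is rejected because it spans two memslots.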

> Actually normal device passthrough with VFIO-PCI also maps GFNs in a
> similar way, i.e. maps a guest visible range in as large size as
> possible as long as the PFN is continous. 
> > 
> > Note, KVM may also restrict the mapping size for reasons that aren't
> > relevant to KVMGT, e.g. for KVM's iTLB multi-hit workaround or if the gfn
> Will iTLB multi-hit affect DMA?

I highly doubt it, I can't imagine an IOMMU would have a dedicated instruction
TLB :-)

> AFAIK, IOMMU mappings currently never sets exec bit (and I'm told this bit is
> under discussion to be removed).

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 26/27] KVM: x86/mmu: Add page-track API to query if a gfn is valid
  2022-12-28  7:57   ` Yan Zhao
@ 2023-01-03 21:19     ` Sean Christopherson
  2023-01-05  3:12       ` Yan Zhao
  0 siblings, 1 reply; 61+ messages in thread
From: Sean Christopherson @ 2023-01-03 21:19 UTC (permalink / raw)
  To: Yan Zhao
  Cc: Paolo Bonzini, Zhenyu Wang, Zhi Wang, kvm, intel-gvt-dev,
	intel-gfx, linux-kernel, Ben Gardon

On Wed, Dec 28, 2022, Yan Zhao wrote:
> On Fri, Dec 23, 2022 at 12:57:38AM +0000, Sean Christopherson wrote:
> > +bool kvm_page_track_is_valid_gfn(struct kvm *kvm, gfn_t gfn)
> > +{
> > +	bool ret;
> > +	int idx;
> > +
> > +	idx = srcu_read_lock(&kvm->srcu);
> > +	ret = kvm_is_visible_gfn(kvm, gfn);
> > +	srcu_read_unlock(&kvm->srcu, idx);
> > +
> > +	return ret;
> > +}
> > +EXPORT_SYMBOL_GPL(kvm_page_track_is_valid_gfn);
> This implementation is only to check whether a GFN is within a visible
> kvm memslot. So, why this helper function is named kvm_page_track_xxx()?
> Don't think it's anything related to page track, and not all of its callers
> in KVMGT are for page tracking.

KVMGT is the only user of kvm_page_track_is_valid_gfn().  kvm_is_visible_gfn()
has other users, just not in x86.  And long term, my goal is to allow building
KVM x86 without any exports.  Killing off KVM's "internal" (for vendor modules)
exports for select Kconfigs is easy enough, and adding a dedicated page-track API
solves the KVMGT angle.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 19/27] KVM: x86/mmu: Use page-track notifiers iff there are external users
  2022-12-28  6:56   ` Yan Zhao
@ 2023-01-04  0:50     ` Sean Christopherson
  0 siblings, 0 replies; 61+ messages in thread
From: Sean Christopherson @ 2023-01-04  0:50 UTC (permalink / raw)
  To: Yan Zhao
  Cc: Paolo Bonzini, Zhenyu Wang, Zhi Wang, kvm, intel-gvt-dev,
	intel-gfx, linux-kernel, Ben Gardon

On Wed, Dec 28, 2022, Yan Zhao wrote:
> On Fri, Dec 23, 2022 at 12:57:31AM +0000, Sean Christopherson wrote:
> > diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
> > index 2b302fd2c5dd..f932909aa9b5 100644
> > --- a/arch/x86/kvm/mmu/page_track.c
> > +++ b/arch/x86/kvm/mmu/page_track.c
> > @@ -193,6 +193,7 @@ bool kvm_slot_page_track_is_active(struct kvm *kvm,
> >  	return !!READ_ONCE(slot->arch.gfn_track[mode][index]);
> >  }
> >  
> > +#ifdef CONFIG_KVM_EXTERNAL_WRITE_TRACKING
> >  void kvm_page_track_cleanup(struct kvm *kvm)
> >  {
> >  	struct kvm_page_track_notifier_head *head;
> > @@ -208,6 +209,7 @@ int kvm_page_track_init(struct kvm *kvm)
> >  	head = &kvm->arch.track_notifier_head;
> >  	INIT_HLIST_HEAD(&head->track_notifier_list);
> >  	return init_srcu_struct(&head->track_srcu);
> > +	return 0;
> Double "return"s.

Huh, I'm surprised this didn't throw a warning.  I'm pretty sure I screwed up a
refactoring, I originally had the "return 0" in an #else branch.
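
For reference, the #else shape mentioned above would look roughly like the
sketch below.  This is a standalone illustration, not the actual patch:
TOY_EXTERNAL_WRITE_TRACKING and the toy_ names are hypothetical stand-ins for
the Kconfig symbol and kvm_page_track_init().

```c
#include <assert.h>

/* Define this to model a build with external write tracking enabled. */
/* #define TOY_EXTERNAL_WRITE_TRACKING */

struct toy_notifier_head {
	int initialized;
};

#ifdef TOY_EXTERNAL_WRITE_TRACKING
/* Real implementation: set up the notifier state. */
static int toy_page_track_init(struct toy_notifier_head *head)
{
	head->initialized = 1;
	return 0;
}
#else
/* Stub: nothing to set up, so exactly one "return 0" per build. */
static inline int toy_page_track_init(struct toy_notifier_head *head)
{
	(void)head;
	return 0;
}
#endif

/* With the macro left undefined, the stub runs and leaves head untouched. */
static void toy_init_example(void)
{
	struct toy_notifier_head head = { 0 };

	assert(toy_page_track_init(&head) == 0);
	assert(head.initialized == 0);
}
```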

> > +#endif /* CONFIG_KVM_EXTERNAL_WRITE_TRACKING */
> > +
> > +static inline void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa,
> > +					const u8 *new, int bytes)
> > +{
> > +	__kvm_page_track_write(vcpu, gpa, new, bytes);
> > +
> Why not convert "vcpu" to "kvm" in __kvm_page_track_write() ?

No reason, I just overlooked the opportunistic cleanup.  I'll do this in the next
version.

Thanks much for the reviews!

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 00/27] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups
  2022-12-23  9:05 ` [PATCH 00/27] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups Yan Zhao
@ 2023-01-04  1:01   ` Sean Christopherson
  2023-01-05  3:13     ` Yan Zhao
  0 siblings, 1 reply; 61+ messages in thread
From: Sean Christopherson @ 2023-01-04  1:01 UTC (permalink / raw)
  To: Yan Zhao
  Cc: Paolo Bonzini, Zhenyu Wang, Zhi Wang, kvm, intel-gvt-dev,
	intel-gfx, linux-kernel, Ben Gardon

On Fri, Dec 23, 2022, Yan Zhao wrote:
> On Fri, Dec 23, 2022 at 12:57:12AM +0000, Sean Christopherson wrote:
> > Fix a variety of found-by-inspection bugs in KVMGT, and overhaul KVM's
> > page-track APIs to provide a leaner and cleaner interface.  The motivation
> > for this series is to (significantly) reduce the number of KVM APIs that
> > KVMGT uses, with a long-term goal of making all kvm_host.h headers
> > KVM-internal.  That said, I think the cleanup itself is worthwhile,
> > e.g. KVMGT really shouldn't be touching kvm->mmu_lock.
> > 
> > Note!  The KVMGT changes are compile tested only as I don't have the
> > necessary hardware (AFAIK).  Testing, and lots of it, on the KVMGT side
> > of things is needed and any help on that front would be much appreciated.
> hi Sean,
> Thanks for the patch!
> Could you also provide the commit id that this series is based on?

The commit ID is provided in the cover letter:

  base-commit: 9d75a3251adfbcf444681474511b58042a364863

Though you might have a hard time finding that commit as it's from an old
version of kvm/queue that's probably since been force pushed.

> I applied them on top of latest master branch (6.1.0+,
> 8395ae05cb5a2e31d36106e8c85efa11cda849be) in repo
> https://github.com/torvalds/linux.git, yet met some conflicts and I
> fixed them manually. (patch 11 and patch 25).
> 
> A rough test shows that below mutex_init is missing.
> But even with this fix, I still met guest hang during guest boots up.
> Will look into it and have a detailed review next week.

Thanks again for the reviews and testing!  I'll get a v2 out in the next week or
so (catching up from holidays) and will be more explicit in documenting the base
version.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 09/27] drm/i915/gvt: Protect gfn hash table with dedicated mutex
  2023-01-03 20:43     ` Sean Christopherson
@ 2023-01-05  0:51       ` Yan Zhao
  0 siblings, 0 replies; 61+ messages in thread
From: Yan Zhao @ 2023-01-05  0:51 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Zhenyu Wang, Zhi Wang, kvm, intel-gvt-dev,
	intel-gfx, linux-kernel, Ben Gardon

On Tue, Jan 03, 2023 at 08:43:17PM +0000, Sean Christopherson wrote:
> On Wed, Dec 28, 2022, Yan Zhao wrote:
> > On Fri, Dec 23, 2022 at 12:57:21AM +0000, Sean Christopherson wrote:
> > > Add and use a new mutex, gfn_lock, to protect accesses to the hash table
> > > used to track which gfns are write-protected when shadowing the guest's
> > > GTT.  This fixes a bug where kvmgt_page_track_write(), which doesn't hold
> > > kvm->mmu_lock, could race with intel_gvt_page_track_remove() and trigger
> > > a use-after-free.
> > > 
> > > Fixing kvmgt_page_track_write() by taking kvm->mmu_lock is not an option
> > > as mmu_lock is a r/w spinlock, and intel_vgpu_page_track_handler() might
> > > sleep when acquiring vgpu->cache_lock deep down the callstack:
> > > 
> > >   intel_vgpu_page_track_handler()
> > >   |
> > >   |->  page_track->handler / ppgtt_write_protection_handler()
> > >        |
> > >        |-> ppgtt_handle_guest_write_page_table_bytes()
> > >            |
> > >            |->  ppgtt_handle_guest_write_page_table()
> > >                 |
> > >                 |-> ppgtt_handle_guest_entry_removal()
> > >                     |
> > >                     |-> ppgtt_invalidate_pte()
> > >                         |
> > >                         |-> intel_gvt_dma_unmap_guest_page()
> > >                             |
> > >                             |-> mutex_lock(&vgpu->cache_lock);
> > > 
> > This gfn_lock could lead to deadlock in below sequence.
> > 
> > (1) kvm_write_track_add_gfn() to GFN 1
> > (2) kvmgt_page_track_write() for GFN 1
> > kvmgt_page_track_write()
> > |
> > |->mutex_lock(&info->vgpu_lock)
> > |->intel_vgpu_page_track_handler (as is kvmgt_gfn_is_write_protected)
> >    |
> >    |->page_track->handler() (ppgtt_write_protection_handler())
> >       |	
> >       |->ppgtt_handle_guest_write_page_table_bytes()
> >          |
> >          |->ppgtt_handle_guest_write_page_table()
> > 	    |
> > 	    |->ppgtt_handle_guest_entry_add() --> new_present
> > 	       |
> > 	       |->ppgtt_populate_spt_by_guest_entry()
> > 	          |
> > 		  |->intel_vgpu_enable_page_track() --> for GFN 2
> > 		     |
> > 		     |->intel_gvt_page_track_add()
> > 		        |
> > 			|->mutex_lock(&info->gfn_lock) ===>deadlock
> 
> Or even more simply, 
> 
>   kvmgt_page_track_write()
>   |
>   -> intel_vgpu_page_track_handler()
>      |
>      -> intel_gvt_page_track_remove()
>
yes.
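
A minimal single-threaded model of that shorter chain, for illustration only.
The toy_ lock below is hypothetical: it reports re-acquisition by returning
false at the point where a real (non-recursive) kernel mutex would simply
block forever.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical non-recursive lock; "held" is all the state we model. */
struct toy_mutex {
	bool held;
};

/* Returns false where a real mutex_lock() would deadlock on itself. */
static bool toy_mutex_lock(struct toy_mutex *m)
{
	if (m->held)
		return false;
	m->held = true;
	return true;
}

static void toy_mutex_unlock(struct toy_mutex *m)
{
	m->held = false;
}

/* Models intel_gvt_page_track_remove() taking gfn_lock itself. */
static bool toy_page_track_remove(struct toy_mutex *gfn_lock)
{
	if (!toy_mutex_lock(gfn_lock))
		return false;
	/* ... hash table removal would happen here ... */
	toy_mutex_unlock(gfn_lock);
	return true;
}

/* Models kvmgt_page_track_write() holding gfn_lock across the handler. */
static bool toy_page_track_write(struct toy_mutex *gfn_lock)
{
	bool ok;

	if (!toy_mutex_lock(gfn_lock))
		return false;
	ok = toy_page_track_remove(gfn_lock);	/* handler re-enters the lock */
	toy_mutex_unlock(gfn_lock);
	return ok;
}

static void toy_deadlock_example(void)
{
	struct toy_mutex gfn_lock = { false };

	assert(toy_page_track_remove(&gfn_lock));	/* fine on its own */
	assert(!toy_page_track_write(&gfn_lock));	/* nested acquire fails */
}
```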

> > 
> > Below fix based on this patch is to reuse vgpu_lock to protect the hash table
> > info->ptable.
> > Please check if it's good.
> > 
> > 
> > diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
> > index b924ed079ad4..526bd973e784 100644
> > --- a/drivers/gpu/drm/i915/gvt/kvmgt.c
> > +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
> > @@ -364,7 +364,7 @@ __kvmgt_protect_table_find(struct intel_vgpu *info, gfn_t gfn)
> >  {
> >         struct kvmgt_pgfn *p, *res = NULL;
> > 
> > -       lockdep_assert_held(&info->gfn_lock);
> > +       lockdep_assert_held(&info->vgpu_lock);
> > 
> >         hash_for_each_possible(info->ptable, p, hnode, gfn) {
> >                 if (gfn == p->gfn) {
> > @@ -388,7 +388,7 @@ static void kvmgt_protect_table_add(struct intel_vgpu *info, gfn_t gfn)
> >  {
> >         struct kvmgt_pgfn *p;
> > 
> > -       lockdep_assert_held(&info->gfn_lock);
> > +       lockdep_assert_held(&info->vgpu_lock);
> 
> I'll just delete these assertions, the one in __kvmgt_protect_table_find() should
> cover everything and is ultimately the assert that matters.
> 
> > @@ -1629,12 +1629,11 @@ static void kvmgt_page_track_remove_region(gfn_t gfn, unsigned long nr_pages,
> >         struct intel_vgpu *info =
> >                 container_of(node, struct intel_vgpu, track_node);
> >  
> > -       mutex_lock(&info->gfn_lock);
> > +       lockdep_assert_held(&info->vgpu_lock);
> 
> This path needs to manually take vgpu_lock as it's called from KVM.  IIRC, this
> is the main reason I tried adding a new lock.  That and I had a hell of a time
> figuring out whether or not vgpu_lock would actually be held.
Right. In the path of kvmgt_page_track_remove_region(),
mutex_lock(&info->vgpu_lock) and  mutex_unlock(&info->vgpu_lock) are
required.

static void kvmgt_page_track_remove_region(gfn_t gfn, unsigned long nr_pages,
                                           struct kvm_page_track_notifier_node *node)
{
        unsigned long i;
        struct intel_vgpu *info =
                container_of(node, struct intel_vgpu, track_node);

        mutex_lock(&info->vgpu_lock);
        for (i = 0; i < nr_pages; i++) {
                if (kvmgt_gfn_is_write_protected(info, gfn + i))
                        kvmgt_protect_table_del(info, gfn + i);
        }
        mutex_unlock(&info->vgpu_lock);
}

The reason lockdep_assert_held(&info->vgpu_lock) passed for me earlier is
that I didn't have LOCKDEP configured, so the assertion was basically a no-op.
(Sorry.  I did also call mutex_is_locked(&info->vgpu_lock) in some paths to
check that lockdep_assert_held() worked properly, but it's my fault for not
double-checking that it was actually compiled in.)
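
To illustrate why the assertion could not fire: without lockdep built in,
lockdep_assert_held() only evaluates its argument and checks nothing.  The
sketch below models that with hypothetical toy_ names (TOY_LOCKDEP stands in
for the lockdep Kconfig option); it is not the kernel's definition.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_lock {
	bool held;
};

/* Counts assertion failures instead of splatting to the kernel log. */
static int toy_lockdep_warnings;

#ifdef TOY_LOCKDEP
#define toy_lockdep_assert_held(l)		\
	do {					\
		if (!(l)->held)			\
			toy_lockdep_warnings++;	\
	} while (0)
#else
/* Checking disabled: the assertion compiles away to nothing useful. */
#define toy_lockdep_assert_held(l) do { (void)(l); } while (0)
#endif

/* Models __kvmgt_protect_table_find() asserting that the lock is held. */
static int toy_table_find(struct toy_lock *lock)
{
	toy_lockdep_assert_held(lock);
	return toy_lockdep_warnings;
}

/* The lock is NOT held here, yet no warning is recorded. */
static void toy_lockdep_example(void)
{
	struct toy_lock vgpu_lock = { false };

	assert(toy_table_find(&vgpu_lock) == 0);
}
```

Building the same file with -DTOY_LOCKDEP makes the unlocked call bump the
warning counter, which is the behavior the earlier testing was missing.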


> 
> Looking at this with fresh eyes, AFAICT intel_vgpu_reset_gtt() is the only other
> path that can reach __kvmgt_protect_table_find() without holding vgpu_lock, by
> way of intel_gvt_page_track_remove().  But unless there's magic I'm missing, that's
> dead code and can simply be deleted.
Yes, I found intel_vgpu_reset_gtt() has not been called since
ba25d977571e1551b7032d6104e49efd6f88f8ad.




^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 03/27] drm/i915/gvt: Incorporate KVM memslot info into check for 2MiB GTT entry
  2023-01-03 21:13     ` Sean Christopherson
@ 2023-01-05  3:07       ` Yan Zhao
  2023-01-05 17:40         ` Sean Christopherson
  2023-01-12  8:31         ` Yan Zhao
  0 siblings, 2 replies; 61+ messages in thread
From: Yan Zhao @ 2023-01-05  3:07 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Zhenyu Wang, Zhi Wang, kvm, intel-gvt-dev,
	intel-gfx, linux-kernel, Ben Gardon

On Tue, Jan 03, 2023 at 09:13:54PM +0000, Sean Christopherson wrote:
> On Wed, Dec 28, 2022, Yan Zhao wrote:
> > On Fri, Dec 23, 2022 at 12:57:15AM +0000, Sean Christopherson wrote:
> > > Honor KVM's max allowed page size when determining whether or not a 2MiB
> > > GTT shadow page can be created for the guest.  Querying KVM's max allowed
> > > size is somewhat odd as there's no strict requirement that KVM's memslots
> > > and VFIO's mappings are configured with the same gfn=>hva mapping, but
> > Without vIOMMU, VFIO's mapping is configured with the same as KVM's
> > memslots, i.e. with the same gfn==>HVA mapping
> 
> But that's controlled by userspace, correct?

Yes, controlled by QEMU.
VFIO in the kernel has no idea whether vIOMMU is enabled or not.
KVMGT is currently known not to work with vIOMMU when shadow mode is on
(in this mode, VFIO maps gIOVA ==> HVA ==> HPA).

> 
> > > the check will be accurate if userspace wants to have a functional guest,
> > > and at the very least checking KVM's memslots guarantees that the entire
> > > 2MiB range has been exposed to the guest.
> > 
> > I think just check the entrie 2MiB GFN range are all within KVM memslot is
> > enough.
> 
> Strictly speaking, no.  E.g. if a 2MiB region is covered with multiple memslots
> and the memslots have different properties.
> 
> > If for some reason, KVM maps a 2MiB range in 4K sizes, KVMGT can still map
> > it in IOMMU size in 2MiB size as long as the PFNs are continous and the
> > whole range is all exposed to guest.
> 
> I agree that practically speaking this will hold true, but if KVMGT wants to honor
> KVM's memslots then checking that KVM allows a hugepage is correct.  Hrm, but on
> the flip side, KVMGT ignores read-only memslot flags, so KVMGT is already ignoring
> pieces of KVM's memslots.
KVMGT calls dma_map_page() with DMA_BIDIRECTIONAL after gvt_pin_guest_page() succeeds.
For a read-only memslot, DMA_TO_DEVICE should be used instead
(see dma_info_to_prot()).
However, gvt_pin_guest_page() checks for (IOMMU_READ | IOMMU_WRITE) permission on each page,
so it actually ensures that the pinned GFN is not in a read-only memslot.
So, it should be fine.

> 
> I have no objection to KVMGT defining its ABI such that KVMGT is allowed to create
> 2MiB so long as (a) the GFN is contiguous according to VFIO, and (b) that the entire
> 2MiB range is exposed to the guest.
> 
Sorry, I may not have put it clearly enough.
For normal device pass-through via VFIO-PCI, VFIO sets up IOMMU mappings in this way:

(a) fault in PFNs in a GFN range within the same memslot (VFIO saves dma_list, which is
the same as the memslot list when vIOMMU is not on or not in shadow mode).
(b) map contiguous PFNs into the IOMMU driver (honoring the ro attribute; can be > 2MiB as
long as the PFNs are contiguous).
(c) the IOMMU driver decides to map in 2MiB or in 4KiB according to its settings.

For KVMGT, gvt_dma_map_page() first calls gvt_pin_guest_page(), which
(a) calls vfio_pin_pages() to check that each GFN is within the allowed dma_list with
(IOMMU_READ | IOMMU_WRITE) permission, and to fault in the page.
(b) checks that the PFNs are contiguous across the 2MiB range.

Though checking kvm_page_track_max_mapping_level() is also fine, it makes the DMA
mapping size unnecessarily smaller.
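
The PFN contiguity check in step (b) for KVMGT can be sketched in plain C as
below.  This is a hypothetical userspace model, not the code in
gvt_pin_guest_page(): the toy_ names are made up and the PFNs are just array
entries rather than pinned pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGE_SHIFT		12
/* Number of 4KiB pages backing one 2MiB mapping: 2^21 / 2^12 = 512. */
#define TOY_PAGES_PER_2M	((size_t)1 << (21 - TOY_PAGE_SHIFT))

/* True iff pfns[0..nr) are physically consecutive. */
static bool toy_pfns_contiguous(const unsigned long *pfns, size_t nr)
{
	size_t i;

	for (i = 1; i < nr; i++) {
		if (pfns[i] != pfns[0] + i)
			return false;
	}
	return true;
}

static void toy_contiguity_example(void)
{
	unsigned long pfns[TOY_PAGES_PER_2M];
	size_t i;

	for (i = 0; i < TOY_PAGES_PER_2M; i++)
		pfns[i] = 0x1000 + i;
	assert(toy_pfns_contiguous(pfns, TOY_PAGES_PER_2M));

	pfns[300] = 0x9999;	/* one stray page breaks the 2MiB mapping */
	assert(!toy_pfns_contiguous(pfns, TOY_PAGES_PER_2M));
}
```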

> That said, being fully permissive also seems wasteful, e.g. KVM would need to
> explicitly support straddling multiple memslots.
> 
> As a middle ground, what about tweaking kvm_page_track_is_valid_gfn() to take a
> range, and then checking that the range is contained in a single memslot?
> 
> E.g. something like:
> 
> bool kvm_page_track_is_contiguous_gfn_range(struct kvm *kvm, gfn_t gfn,
> 					    unsigned long nr_pages)
> {
> 	struct kvm_memory_slot *memslot;
> 	bool ret;
> 	int idx;
> 
> 	idx = srcu_read_lock(&kvm->srcu);
> 	memslot = gfn_to_memslot(kvm, gfn);
> 	ret = kvm_is_visible_memslot(memslot) &&
> 	      gfn + nr_pages <= memslot->base_gfn + memslot->npages;
> 	srcu_read_unlock(&kvm->srcu, idx);
> 
> 	return ret;
> }

Yes, it's good.
But as explained above, gvt_dma_map_page() checks in an equivalent way.
Maybe checking kvm_page_track_is_contiguous_gfn_range() is also not
required?
> 
> > Actually normal device passthrough with VFIO-PCI also maps GFNs in a
> > similar way, i.e. maps a guest visible range in as large size as
> > possible as long as the PFN is continous. 
> > > 
> > > Note, KVM may also restrict the mapping size for reasons that aren't
> > > relevant to KVMGT, e.g. for KVM's iTLB multi-hit workaround or if the gfn
> > Will iTLB multi-hit affect DMA?
> 
> I highly doubt it, I can't imagine an IOMMU would have a dedicated instruction
> TLB :-)
I can double check it with IOMMU hardware experts.
But if DMA could tamper with the instruction TLB, it should already have been
reported as an issue with normal VFIO pass-through?

> > AFAIK, IOMMU mappings currently never sets exec bit (and I'm told this bit is
> > under discussion to be removed).

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 26/27] KVM: x86/mmu: Add page-track API to query if a gfn is valid
  2023-01-03 21:19     ` Sean Christopherson
@ 2023-01-05  3:12       ` Yan Zhao
  2023-01-05 17:53         ` Sean Christopherson
  0 siblings, 1 reply; 61+ messages in thread
From: Yan Zhao @ 2023-01-05  3:12 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Zhenyu Wang, Zhi Wang, kvm, intel-gvt-dev,
	intel-gfx, linux-kernel, Ben Gardon

On Tue, Jan 03, 2023 at 09:19:01PM +0000, Sean Christopherson wrote:
> On Wed, Dec 28, 2022, Yan Zhao wrote:
> > On Fri, Dec 23, 2022 at 12:57:38AM +0000, Sean Christopherson wrote:
> > > +bool kvm_page_track_is_valid_gfn(struct kvm *kvm, gfn_t gfn)
> > > +{
> > > +	bool ret;
> > > +	int idx;
> > > +
> > > +	idx = srcu_read_lock(&kvm->srcu);
> > > +	ret = kvm_is_visible_gfn(kvm, gfn);
> > > +	srcu_read_unlock(&kvm->srcu, idx);
> > > +
> > > +	return ret;
> > > +}
> > > +EXPORT_SYMBOL_GPL(kvm_page_track_is_valid_gfn);
> > This implementation is only to check whether a GFN is within a visible
> > kvm memslot. So, why this helper function is named kvm_page_track_xxx()?
> > Don't think it's anything related to page track, and not all of its callers
> > in KVMGT are for page tracking.
> 
> KVMGT is the only user of kvm_page_track_is_valid_gfn().  kvm_is_visible_gfn()
> has other users, just not in x86.  And long term, my goal is to allow building
> KVM x86 without any exports.  Killing off KVM's "internal" (for vendor modules)
> exports for select Kconfigs is easy enough, add adding a dedicated page-track API
> solves the KVMGT angle.
Understood!
But personally, I don't like merging this API into the page-track API, as
it obviously has nothing to do with page tracking, and KVMGT also calls it for
non-page-track purposes.


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 00/27] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups
  2023-01-04  1:01   ` Sean Christopherson
@ 2023-01-05  3:13     ` Yan Zhao
  0 siblings, 0 replies; 61+ messages in thread
From: Yan Zhao @ 2023-01-05  3:13 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Zhenyu Wang, Zhi Wang, kvm, intel-gvt-dev,
	intel-gfx, linux-kernel, Ben Gardon

On Wed, Jan 04, 2023 at 01:01:13AM +0000, Sean Christopherson wrote:
> On Fri, Dec 23, 2022, Yan Zhao wrote:
> > On Fri, Dec 23, 2022 at 12:57:12AM +0000, Sean Christopherson wrote:
> > > Fix a variety of found-by-inspection bugs in KVMGT, and overhaul KVM's
> > > page-track APIs to provide a leaner and cleaner interface.  The motivation
> > > for this series is to (significantly) reduce the number of KVM APIs that
> > > KVMGT uses, with a long-term goal of making all kvm_host.h headers
> > > KVM-internal.  That said, I think the cleanup itself is worthwhile,
> > > e.g. KVMGT really shouldn't be touching kvm->mmu_lock.
> > > 
> > > Note!  The KVMGT changes are compile tested only as I don't have the
> > > necessary hardware (AFAIK).  Testing, and lots of it, on the KVMGT side
> > > of things is needed and any help on that front would be much appreciated.
> > hi Sean,
> > Thanks for the patch!
> > Could you also provide the commit id that this series is based on?
> 
> The commit ID is provided in the cover letter:
> 
>   base-commit: 9d75a3251adfbcf444681474511b58042a364863
> 
> Though you might have a hard time finding that commit as it's from an old
> version of kvm/queue that's probably since been force pushed.
> 
> > I applied them on top of latest master branch (6.1.0+,
> > 8395ae05cb5a2e31d36106e8c85efa11cda849be) in repo
> > https://github.com/torvalds/linux.git, yet met some conflicts and I
> > fixed them manually. (patch 11 and patch 25).
> > 
> > A rough test shows that below mutex_init is missing.
> > But even with this fix, I still met guest hang during guest boots up.
> > Will look into it and have a detailed review next week.
> 
> Thanks again for the reviews and testing!  I'll get a v2 out in the next week or
> so (catching up from holidays) and will be more explicit in documenting the base
> version.
That's fine, and it's my pleasure :)

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 03/27] drm/i915/gvt: Incorporate KVM memslot info into check for 2MiB GTT entry
  2023-01-05  3:07       ` Yan Zhao
@ 2023-01-05 17:40         ` Sean Christopherson
  2023-01-06  5:56           ` Yan Zhao
  2023-01-12  8:31         ` Yan Zhao
  1 sibling, 1 reply; 61+ messages in thread
From: Sean Christopherson @ 2023-01-05 17:40 UTC (permalink / raw)
  To: Yan Zhao
  Cc: Paolo Bonzini, Zhenyu Wang, Zhi Wang, kvm, intel-gvt-dev,
	intel-gfx, linux-kernel, Ben Gardon

On Thu, Jan 05, 2023, Yan Zhao wrote:
> On Tue, Jan 03, 2023 at 09:13:54PM +0000, Sean Christopherson wrote:
> > On Wed, Dec 28, 2022, Yan Zhao wrote:
> > > On Fri, Dec 23, 2022 at 12:57:15AM +0000, Sean Christopherson wrote:
> > > > Honor KVM's max allowed page size when determining whether or not a 2MiB
> > > > GTT shadow page can be created for the guest.  Querying KVM's max allowed
> > > > size is somewhat odd as there's no strict requirement that KVM's memslots
> > > > and VFIO's mappings are configured with the same gfn=>hva mapping, but
> > > Without vIOMMU, VFIO's mapping is configured with the same as KVM's
> > > memslots, i.e. with the same gfn==>HVA mapping
> > 
> > But that's controlled by userspace, correct?
> 
> Yes, controlled by QEMU.

...

> > Strictly speaking, no.  E.g. if a 2MiB region is covered with multiple memslots
> > and the memslots have different properties.
> > 
> > > If for some reason, KVM maps a 2MiB range in 4K sizes, KVMGT can still map
> > > it in IOMMU size in 2MiB size as long as the PFNs are continous and the
> > > whole range is all exposed to guest.
> > 
> > I agree that practically speaking this will hold true, but if KVMGT wants to honor
> > KVM's memslots then checking that KVM allows a hugepage is correct.  Hrm, but on
> > the flip side, KVMGT ignores read-only memslot flags, so KVMGT is already ignoring
> > pieces of KVM's memslots.
> KVMGT calls dma_map_page() with DMA_BIDIRECTIONAL after checking gvt_pin_guest_page().
> Though for a read-only memslot, DMA_TO_DEVICE should be used instead
> (see dma_info_to_prot()),
> as gvt_pin_guest_page() checks (IOMMU_READ | IOMMU_WRITE) permission for each page,
> it actually ensures that the pinned GFN is not in a read-only memslot.
> So, it should be fine.
> 
> > 
> > I have no objection to KVMGT defining its ABI such that KVMGT is allowed to create
> > 2MiB so long as (a) the GFN is contiguous according to VFIO, and (b) that the entire
> > 2MiB range is exposed to the guest.
> > 
> sorry. I may not put it clearly enough.
> for a normal device pass-through via VFIO-PCI, VFIO maps IOMMU mappings in this way:
> 
> (a) fault in PFNs in a GFN range within the same memslot (VFIO saves dma_list, which is
> the same as memslot list when vIOMMU is not on or not in shadow mode).
> (b) map continuous PFNs into iommu driver (honour ro attribute and can > 2MiB as long as
> PFNs are continuous).
> (c) IOMMU driver decides to map in 2MiB or in 4KiB according to its setting.
> 
> For KVMGT, gvt_dma_map_page() first calls gvt_pin_guest_page() which
> (a) calls vfio_pin_pages() to check each GFN is within allowed dma_list with
> (IOMMU_READ | IOMMU_WRITE) permission and fault-in page. 
> (b) checks PFNs are continuous in 2MiB,
> 
> Though checking kvm_page_track_max_mapping_level() is also fine, it makes DMA
> mapping size unnecessarily smaller.

Yeah, I got all that.  What I'm trying to say, and why I asked about whether or
not userspace controls the mappings, is that AFAIK there is nothing in the kernel
that coordinates mappings between VFIO and KVM.  So, very technically, userspace
could map a 2MiB range contiguous in VFIO but not in KVM, or RW in VFIO but RO in KVM.

I can't imagine there's a real use case for doing so, and arguably there's no
requirement that KVMGT honor KVM's memslot.  But because KVMGT taps into KVM's
page-tracking, KVMGT _does_ honor KVM's memslots to some extent because KVMGT
needs to know whether or not a given GFN can be write-protected.

I'm totally fine if KVMGT's ABI is that VFIO is the source of truth for mappings
and permissions, and that the only requirement for KVM memslots is that GTT page
tables need to be visible in KVM's memslots.  But if that's the ABI, then
intel_gvt_is_valid_gfn() should be probing VFIO, not KVM (commit cc753fbe1ac4
("drm/i915/gvt: validate gfn before set shadow page entry")).

In other words, pick either VFIO or KVM.  Checking that X is valid according to
KVM and then mapping X through VFIO is confusing and makes assumptions about how
userspace configures KVM and VFIO.  It works because QEMU always configures KVM
and VFIO as expected, but IMO it's unnecessarily fragile and again confusing for
unaware readers because the code is technically flawed.

On a related topic, ppgtt_populate_shadow_entry() should check the validity of the
gfn.  If I'm reading the code correctly, checking only in ppgtt_populate_spt() fails
to handle the case where the guest creates a bogus mapping when writing an existing
GTT PT.

Combining all my trains of thought, what about this as an end state for this series?
(completely untested at this point).  Get rid of the KVM mapping size checks,
verify the validity of the entire range being mapped, and add a FIXME to complain
about using KVM instead of VFIO to determine the validity of ranges.

static bool intel_gvt_is_valid_gfn(struct intel_vgpu *vgpu, unsigned long gfn,
				   enum intel_gvt_gtt_type type)
{
	unsigned long nr_pages;

	if (!vgpu->attached)
		return false;

	if (type == GTT_TYPE_PPGTT_PTE_64K_ENTRY)
		nr_pages = I915_GTT_PAGE_SIZE_64K >> PAGE_SHIFT;
	else if (type == GTT_TYPE_PPGTT_PTE_2M_ENTRY)
		nr_pages = I915_GTT_PAGE_SIZE_2M >> PAGE_SHIFT;
	else
		nr_pages = 1;

	/*
	 * FIXME: Probe VFIO, not KVM.  VFIO is the source of truth for KVMGT
	 * mappings and permissions, KVM's involvement is purely to handle
	 * write-tracking of GTT page tables.
	 */
	return kvm_page_track_is_contiguous_gfn_range(vgpu->vfio_device.kvm,
						      gfn, nr_pages);
}

static int try_map_2MB_gtt_entry(struct intel_vgpu *vgpu, unsigned long gfn,
				 dma_addr_t *dma_addr)
{
	if (!HAS_PAGE_SIZES(vgpu->gvt->gt->i915, I915_GTT_PAGE_SIZE_2M))
		return 0;

	return intel_gvt_dma_map_guest_page(vgpu, gfn,
					    I915_GTT_PAGE_SIZE_2M, dma_addr);
}

static int ppgtt_populate_shadow_entry(struct intel_vgpu *vgpu,
	struct intel_vgpu_ppgtt_spt *spt, unsigned long index,
	struct intel_gvt_gtt_entry *ge)
{
	const struct intel_gvt_gtt_pte_ops *pte_ops = vgpu->gvt->gtt.pte_ops;
	dma_addr_t dma_addr = vgpu->gvt->gtt.scratch_mfn << PAGE_SHIFT;
	struct intel_gvt_gtt_entry se = *ge;
	unsigned long gfn;
	int ret;

	if (!pte_ops->test_present(ge))
		goto set_shadow_entry;

	gfn = pte_ops->get_pfn(ge);
	if (!intel_gvt_is_valid_gfn(vgpu, gfn, ge->type))
		goto set_shadow_entry;

	...


set_shadow_entry:
	pte_ops->set_pfn(&se, dma_addr >> PAGE_SHIFT);
	ppgtt_set_shadow_entry(spt, &se, index);
	return 0;
}
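
As a sanity check on the nr_pages arithmetic in intel_gvt_is_valid_gfn()
above, a small standalone sketch.  It assumes 4KiB host pages (PAGE_SHIFT of
12, as on x86) and uses hypothetical toy_ names in place of the i915
definitions.

```c
#include <assert.h>

#define TOY_PAGE_SHIFT		12		/* assumed: 4KiB host pages */
#define TOY_GTT_PAGE_SIZE_64K	(1ULL << 16)
#define TOY_GTT_PAGE_SIZE_2M	(1ULL << 21)

enum toy_gtt_type {
	TOY_PTE_4K_ENTRY,
	TOY_PTE_64K_ENTRY,
	TOY_PTE_2M_ENTRY,
};

/* Number of host pages that one GTT entry of the given type covers. */
static unsigned long toy_nr_pages(enum toy_gtt_type type)
{
	switch (type) {
	case TOY_PTE_64K_ENTRY:
		return TOY_GTT_PAGE_SIZE_64K >> TOY_PAGE_SHIFT;
	case TOY_PTE_2M_ENTRY:
		return TOY_GTT_PAGE_SIZE_2M >> TOY_PAGE_SHIFT;
	default:
		return 1;
	}
}

static void toy_nr_pages_example(void)
{
	assert(toy_nr_pages(TOY_PTE_4K_ENTRY) == 1);
	assert(toy_nr_pages(TOY_PTE_64K_ENTRY) == 16);	/* 64KiB / 4KiB */
	assert(toy_nr_pages(TOY_PTE_2M_ENTRY) == 512);	/* 2MiB / 4KiB */
}
```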

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 26/27] KVM: x86/mmu: Add page-track API to query if a gfn is valid
  2023-01-05  3:12       ` Yan Zhao
@ 2023-01-05 17:53         ` Sean Christopherson
  0 siblings, 0 replies; 61+ messages in thread
From: Sean Christopherson @ 2023-01-05 17:53 UTC (permalink / raw)
  To: Yan Zhao
  Cc: Paolo Bonzini, Zhenyu Wang, Zhi Wang, kvm, intel-gvt-dev,
	intel-gfx, linux-kernel, Ben Gardon

On Thu, Jan 05, 2023, Yan Zhao wrote:
> On Tue, Jan 03, 2023 at 09:19:01PM +0000, Sean Christopherson wrote:
> > On Wed, Dec 28, 2022, Yan Zhao wrote:
> > > On Fri, Dec 23, 2022 at 12:57:38AM +0000, Sean Christopherson wrote:
> > > > +bool kvm_page_track_is_valid_gfn(struct kvm *kvm, gfn_t gfn)
> > > > +{
> > > > +	bool ret;
> > > > +	int idx;
> > > > +
> > > > +	idx = srcu_read_lock(&kvm->srcu);
> > > > +	ret = kvm_is_visible_gfn(kvm, gfn);
> > > > +	srcu_read_unlock(&kvm->srcu, idx);
> > > > +
> > > > +	return ret;
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(kvm_page_track_is_valid_gfn);
> > > This implementation is only to check whether a GFN is within a visible
> > > kvm memslot. So, why this helper function is named kvm_page_track_xxx()?
> > > Don't think it's anything related to page track, and not all of its callers
> > > in KVMGT are for page tracking.
> > 
> > KVMGT is the only user of kvm_page_track_is_valid_gfn().  kvm_is_visible_gfn()
> > has other users, just not in x86.  And long term, my goal is to allow building
> > KVM x86 without any exports.  Killing off KVM's "internal" (for vendor modules)
> > exports for select Kconfigs is easy enough, add adding a dedicated page-track API
> > solves the KVMGT angle.
> Understand!
> But personally, I don't like merging this API into page-track API as
> it obviously has nothing to do with page-track stuffs, and KVMGT also calls it for
> non-page-track purpuse.

100% agreed, but as discussed in the other patch[*], IMO the real issue is that
KVMGT is abusing KVM APIs to check the validity of GFNs that are ultimately mapped
via VFIO.  Once that issue is fixed, kvm_page_track_is_valid_gfn() can go away
entirely.  I view this as a short/medium term hack-a-fix to limit and encapsulate
KVM's API surface that is "needed" by KVMGT.

[*] https://lore.kernel.org/all/Y7cLkLUMCy+XLRwm@google.com

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 03/27] drm/i915/gvt: Incorporate KVM memslot info into check for 2MiB GTT entry
  2023-01-05 17:40         ` Sean Christopherson
@ 2023-01-06  5:56           ` Yan Zhao
  2023-01-06 23:01             ` Sean Christopherson
  0 siblings, 1 reply; 61+ messages in thread
From: Yan Zhao @ 2023-01-06  5:56 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Zhenyu Wang, Zhi Wang, kvm, intel-gvt-dev,
	intel-gfx, linux-kernel, Ben Gardon

On Thu, Jan 05, 2023 at 05:40:32PM +0000, Sean Christopherson wrote:
> On Thu, Jan 05, 2023, Yan Zhao wrote:
> > On Tue, Jan 03, 2023 at 09:13:54PM +0000, Sean Christopherson wrote:
> > > On Wed, Dec 28, 2022, Yan Zhao wrote:
> > > > On Fri, Dec 23, 2022 at 12:57:15AM +0000, Sean Christopherson wrote:
> > > > > Honor KVM's max allowed page size when determining whether or not a 2MiB
> > > > > GTT shadow page can be created for the guest.  Querying KVM's max allowed
> > > > > size is somewhat odd as there's no strict requirement that KVM's memslots
> > > > > and VFIO's mappings are configured with the same gfn=>hva mapping, but
> > > > Without vIOMMU, VFIO's mapping is configured the same as KVM's
> > > > memslots, i.e. with the same gfn==>HVA mapping
> > > 
> > > But that's controlled by userspace, correct?
> > 
> > Yes, controlled by QEMU.
> 
> ...
> 
> > > Strictly speaking, no.  E.g. if a 2MiB region is covered with multiple memslots
> > > and the memslots have different properties.
> > > 
> > > > If for some reason, KVM maps a 2MiB range in 4K sizes, KVMGT can still map
> > > > it in the IOMMU at 2MiB size as long as the PFNs are contiguous and the
> > > > whole range is exposed to the guest.
> > > 
> > > I agree that practically speaking this will hold true, but if KVMGT wants to honor
> > > KVM's memslots then checking that KVM allows a hugepage is correct.  Hrm, but on
> > > the flip side, KVMGT ignores read-only memslot flags, so KVMGT is already ignoring
> > > pieces of KVM's memslots.
> > KVMGT calls dma_map_page() with DMA_BIDIRECTIONAL after checking gvt_pin_guest_page().
> > Though for a read-only memslot, DMA_TO_DEVICE should be used instead
> > (see dma_info_to_prot()),
> > as gvt_pin_guest_page() checks (IOMMU_READ | IOMMU_WRITE) permission for each page,
> > it actually ensures that the pinned GFN is not in a read-only memslot.
> > So, it should be fine.
> > 
> > > 
> > > I have no objection to KVMGT defining its ABI such that KVMGT is allowed to create
> > > 2MiB so long as (a) the GFN is contiguous according to VFIO, and (b) that the entire
> > > 2MiB range is exposed to the guest.
> > > 
> > Sorry, I may not have put it clearly enough.
> > For a normal device pass-through via VFIO-PCI, VFIO maps IOMMU mappings in this way:
> > 
> > (a) fault in PFNs in a GFN range within the same memslot (VFIO saves dma_list, which is
> > the same as memslot list when vIOMMU is not on or not in shadow mode).
> > (b) map continuous PFNs into iommu driver (honour ro attribute and can > 2MiB as long as
> > PFNs are continuous).
> > (c) IOMMU driver decides to map in 2MiB or in 4KiB according to its setting.
> > 
> > For KVMGT, gvt_dma_map_page() first calls gvt_pin_guest_page() which
> > (a) calls vfio_pin_pages() to check each GFN is within allowed dma_list with
> > (IOMMU_READ | IOMMU_WRITE) permission and fault-in page. 
> > (b) checks PFNs are continuous in 2MiB,
> > 
> > Though checking kvm_page_track_max_mapping_level() is also fine, it makes DMA
> > mapping size unnecessarily smaller.
> 
> Yeah, I got all that.  What I'm trying to say, and why I asked about whether or
> not userspace controls the mappings, is that AFAIK there is nothing in the kernel
> that coordinates mappings between VFIO and KVM.  So, very technically, userspace
> could map a 2MiB range contiguous in VFIO but not in KVM, or RW in VFIO but RO in KVM.
> 
> I can't imagine there's a real use case for doing so, and arguably there's no
> requirement that KVMGT honor KVM's memslot.  But because KVMGT taps into KVM's
> page-tracking, KVMGT _does_ honor KVM's memslots to some extent because KVMGT
> needs to know whether or not a given GFN can be write-protected.
> 
> I'm totally fine if KVMGT's ABI is that VFIO is the source of truth for mappings
> and permissions, and that the only requirement for KVM memslots is that GTT page
> tables need to be visible in KVM's memslots.  But if that's the ABI, then
> intel_gvt_is_valid_gfn() should be probing VFIO, not KVM (commit cc753fbe1ac4
> ("drm/i915/gvt: validate gfn before set shadow page entry")).
> 
> In other words, pick either VFIO or KVM.  Checking that X is valid according to
> KVM and then mapping X through VFIO is confusing and makes assumptions about how
> userspace configures KVM and VFIO.  It works because QEMU always configures KVM
> and VFIO as expected, but IMO it's unnecessarily fragile and again confusing for
> unaware readers because the code is technically flawed.
>
Agreed. 
Then after some further thought, I think maybe we can just remove
intel_gvt_is_valid_gfn() in KVMGT, because

(1) both intel_gvt_is_valid_gfn() in emulate_ggtt_mmio_write() and
ppgtt_populate_spt() are not for page track purpose, but to validate bogus
GFN.
(2) gvt_pin_guest_page() with gfn and size can do the validity checking,
which is called in intel_gvt_dma_map_guest_page(). So, we can move the
mapping of scratch page to the error path after intel_gvt_dma_map_guest_page().


As below,

diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c
index 54b32ab843eb..5a85936df6d4 100644
--- a/drivers/gpu/drm/i915/gvt/gtt.c
+++ b/drivers/gpu/drm/i915/gvt/gtt.c
@@ -49,15 +49,6 @@
 static bool enable_out_of_sync = false;
 static int preallocated_oos_pages = 8192;

-static bool intel_gvt_is_valid_gfn(struct intel_vgpu *vgpu, unsigned long gfn)
-{
-       if (!vgpu->attached)
-               return false;
-
-       /* FIXME: This doesn't properly handle guest entries larger than 4K. */
-       return kvm_page_track_is_valid_gfn(vgpu->vfio_device.kvm, gfn);
-}
-
 /*
  * validate a gm address and related range size,
  * translate it to host gm address
@@ -1340,16 +1331,12 @@ static int ppgtt_populate_spt(struct intel_vgpu_ppgtt_spt *spt)
                        ppgtt_generate_shadow_entry(&se, s, &ge);
                        ppgtt_set_shadow_entry(spt, &se, i);
                } else {
-                       gfn = ops->get_pfn(&ge);
-                       if (!intel_gvt_is_valid_gfn(vgpu, gfn)) {
+                       ret = ppgtt_populate_shadow_entry(vgpu, spt, i, &ge);
+                       if (ret) {
                                ops->set_pfn(&se, gvt->gtt.scratch_mfn);
                                ppgtt_set_shadow_entry(spt, &se, i);
-                               continue;
-                       }
-
-                       ret = ppgtt_populate_shadow_entry(vgpu, spt, i, &ge);
-                       if (ret)
                                goto fail;
+                       }
                }
        }
        return 0;
@@ -2336,14 +2325,6 @@ static int emulate_ggtt_mmio_write(struct intel_vgpu *vgpu, unsigned int off,
                m.val64 = e.val64;
                m.type = e.type;

-               /* one PTE update may be issued in multiple writes and the
-                * first write may not construct a valid gfn
-                */
-               if (!intel_gvt_is_valid_gfn(vgpu, gfn)) {
-                       ops->set_pfn(&m, gvt->gtt.scratch_mfn);
-                       goto out;
-               }
-
                ret = intel_gvt_dma_map_guest_page(vgpu, gfn, PAGE_SIZE,
                                                   &dma_addr);
                if (ret) {


> On a related topic, ppgtt_populate_shadow_entry() should check the validity of the
> gfn.  If I'm reading the code correctly, checking only in ppgtt_populate_spt() fails
> to handle the case where the guest creates a bogus mapping when writing an existing
> GTT PT.
Don't get it here. Could you elaborate more?

> 
> Combining all my trains of thought, what about this as an end state for this series?
> (completely untested at this point).  Get rid of the KVM mapping size checks,
> verify the validity of the entire range being mapped, and add a FIXME to complain
> about using KVM instead of VFIO to determine the validity of ranges.
> 
> static bool intel_gvt_is_valid_gfn(struct intel_vgpu *vgpu, unsigned long gfn,
> 				   enum intel_gvt_gtt_type type)
> {
> 	unsigned long nr_pages;
> 
> 	if (!vgpu->attached)
> 		return false;
> 
> 	if (type == GTT_TYPE_PPGTT_PTE_64K_ENTRY)
> 		nr_pages = I915_GTT_PAGE_SIZE_64K >> PAGE_SHIFT;
> 	else if (type == GTT_TYPE_PPGTT_PTE_2M_ENTRY)
> 		nr_pages = I915_GTT_PAGE_SIZE_2M >> PAGE_SHIFT;
> 	else
> 		nr_pages = 1;
> 
> 	/*
> 	 * FIXME: Probe VFIO, not KVM.  VFIO is the source of truth for KVMGT
> 	 * mappings and permissions, KVM's involvement is purely to handle
> 	 * write-tracking of GTT page tables.
> 	 */
> 	return kvm_page_track_is_contiguous_gfn_range(vgpu->vfio_device.kvm,
> 						      gfn, nr_pages);
> }
> 
> static int try_map_2MB_gtt_entry(struct intel_vgpu *vgpu, unsigned long gfn,
> 				 dma_addr_t *dma_addr)
> {
> 	if (!HAS_PAGE_SIZES(vgpu->gvt->gt->i915, I915_GTT_PAGE_SIZE_2M))
> 		return 0;
> 
> 	return intel_gvt_dma_map_guest_page(vgpu, gfn,
> 					    I915_GTT_PAGE_SIZE_2M, dma_addr);
> }
> 
> static int ppgtt_populate_shadow_entry(struct intel_vgpu *vgpu,
> 	struct intel_vgpu_ppgtt_spt *spt, unsigned long index,
> 	struct intel_gvt_gtt_entry *ge)
> {
> 	const struct intel_gvt_gtt_pte_ops *pte_ops = vgpu->gvt->gtt.pte_ops;
> 	dma_addr_t dma_addr = vgpu->gvt->gtt.scratch_mfn << PAGE_SHIFT;
> 	struct intel_gvt_gtt_entry se = *ge;
> 	unsigned long gfn;
> 	int ret;
> 
> 	if (!pte_ops->test_present(ge))
> 		goto set_shadow_entry;
> 
> 	gfn = pte_ops->get_pfn(ge);
> 	if (!intel_gvt_is_valid_gfn(vgpu, gfn, ge->type))
> 		goto set_shadow_entry;
As KVMGT only tracks PPGTT page table pages, this check is not for page-tracking
purposes, but to catch bogus GFNs.
So it should be fine to leave the bogus GFN check to
intel_gvt_dma_map_guest_page(), which goes through VFIO.

On the other hand, for the GFN validity for page track purpose, we can
leave it to kvm_write_track_add_gfn().

Do you think it's ok?


> 	...
> 
> 
> set_shadow_entry:
> 	pte_ops->set_pfn(&se, dma_addr >> PAGE_SHIFT);
> 	ppgtt_set_shadow_entry(spt, &se, index);
> 	return 0;
> }

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* Re: [PATCH 03/27] drm/i915/gvt: Incorporate KVM memslot info into check for 2MiB GTT entry
  2023-01-06  5:56           ` Yan Zhao
@ 2023-01-06 23:01             ` Sean Christopherson
  2023-01-09  9:58               ` Yan Zhao
  0 siblings, 1 reply; 61+ messages in thread
From: Sean Christopherson @ 2023-01-06 23:01 UTC (permalink / raw)
  To: Yan Zhao
  Cc: Paolo Bonzini, Zhenyu Wang, Zhi Wang, kvm, intel-gvt-dev,
	intel-gfx, linux-kernel, Ben Gardon

On Fri, Jan 06, 2023, Yan Zhao wrote:
> On Thu, Jan 05, 2023 at 05:40:32PM +0000, Sean Christopherson wrote:
> > On Thu, Jan 05, 2023, Yan Zhao wrote:
> > I'm totally fine if KVMGT's ABI is that VFIO is the source of truth for mappings
> > and permissions, and that the only requirement for KVM memslots is that GTT page
> > tables need to be visible in KVM's memslots.  But if that's the ABI, then
> > intel_gvt_is_valid_gfn() should be probing VFIO, not KVM (commit cc753fbe1ac4
> > ("drm/i915/gvt: validate gfn before set shadow page entry")).
> > 
> > In other words, pick either VFIO or KVM.  Checking that X is valid according to
> > KVM and then mapping X through VFIO is confusing and makes assumptions about how
> > userspace configures KVM and VFIO.  It works because QEMU always configures KVM
> > and VFIO as expected, but IMO it's unnecessarily fragile and again confusing for
> > unaware readers because the code is technically flawed.
> >
> Agreed. 
> Then after some further thought, I think maybe we can just remove
> intel_gvt_is_valid_gfn() in KVMGT, because
> 
> (1) both intel_gvt_is_valid_gfn() in emulate_ggtt_mmio_write() and
> ppgtt_populate_spt() are not for page track purpose, but to validate bogus
> GFN.
> (2) gvt_pin_guest_page() with gfn and size can do the validity checking,
> which is called in intel_gvt_dma_map_guest_page(). So, we can move the
> mapping of scratch page to the error path after intel_gvt_dma_map_guest_page().

IIUC, that will re-introduce the problem commit cc753fbe1ac4 ("drm/i915/gvt: validate
gfn before set shadow page entry") solved by poking into KVM.  Lack of pre-validation
means that bogus GFNs will trigger error messages, e.g.

			gvt_vgpu_err("vfio_pin_pages failed for iova %pad, ret %d\n",
				     &cur_iova, ret);

and

			gvt_vgpu_err("fail to populate guest ggtt entry\n");

One thought would be to turn those printks into tracepoints to eliminate unwanted
noise, and to prevent the guest from spamming the host kernel log by programming
garbage into the GTT (gvt_vgpu_err() isn't ratelimited).

> > On a related topic, ppgtt_populate_shadow_entry() should check the validity of the
> > gfn.  If I'm reading the code correctly, checking only in ppgtt_populate_spt() fails
> > to handle the case where the guest creates a bogus mapping when writing an existing
> > GTT PT.
> Don't get it here. Could you elaborate more?

AFAICT, KVMGT only pre-validates the GFN on the initial setup, not when the guest
modifies a write-tracked entry.  I believe this is a moot point if the pre-validation
is removed entirely.

> > 	gfn = pte_ops->get_pfn(ge);
> > 	if (!intel_gvt_is_valid_gfn(vgpu, gfn, ge->type))
> > 		goto set_shadow_entry;
> As KVMGT only tracks PPGTT page table pages, this check here is not for page
> track purpose, but to check bogus GFN.
> So, Just leave the bogus GFN check to intel_gvt_dma_map_guest_page() through
> VFIO is all right.
> 
> On the other hand, for the GFN validity for page track purpose, we can
> leave it to kvm_write_track_add_gfn().
> 
> Do you think it's ok?

Yep, the only hiccup is the gvt_vgpu_err() calls that are guest-triggerable, and
converting those to a tracepoint seems like the right answer.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 03/27] drm/i915/gvt: Incorporate KVM memslot info into check for 2MiB GTT entry
  2023-01-06 23:01             ` Sean Christopherson
@ 2023-01-09  9:58               ` Yan Zhao
  2023-01-11 17:55                 ` Sean Christopherson
  0 siblings, 1 reply; 61+ messages in thread
From: Yan Zhao @ 2023-01-09  9:58 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Zhenyu Wang, Zhi Wang, kvm, intel-gvt-dev,
	intel-gfx, linux-kernel, Ben Gardon

On Fri, Jan 06, 2023 at 11:01:53PM +0000, Sean Christopherson wrote:
> On Fri, Jan 06, 2023, Yan Zhao wrote:
> > On Thu, Jan 05, 2023 at 05:40:32PM +0000, Sean Christopherson wrote:
> > > On Thu, Jan 05, 2023, Yan Zhao wrote:
> > > I'm totally fine if KVMGT's ABI is that VFIO is the source of truth for mappings
> > > and permissions, and that the only requirement for KVM memslots is that GTT page
> > > tables need to be visible in KVM's memslots.  But if that's the ABI, then
> > > intel_gvt_is_valid_gfn() should be probing VFIO, not KVM (commit cc753fbe1ac4
> > > ("drm/i915/gvt: validate gfn before set shadow page entry")).
> > > 
> > > In other words, pick either VFIO or KVM.  Checking that X is valid according to
> > > KVM and then mapping X through VFIO is confusing and makes assumptions about how
> > > userspace configures KVM and VFIO.  It works because QEMU always configures KVM
> > > and VFIO as expected, but IMO it's unnecessarily fragile and again confusing for
> > > unaware readers because the code is technically flawed.
> > >
> > Agreed. 
> > Then after some further thought, I think maybe we can just remove
> > intel_gvt_is_valid_gfn() in KVMGT, because
> > 
> > (1) both intel_gvt_is_valid_gfn() in emulate_ggtt_mmio_write() and
> > ppgtt_populate_spt() are not for page track purpose, but to validate bogus
> > GFN.
> > (2) gvt_pin_guest_page() with gfn and size can do the validity checking,
> > which is called in intel_gvt_dma_map_guest_page(). So, we can move the
> > mapping of scratch page to the error path after intel_gvt_dma_map_guest_page().
> 
> IIUC, that will re-introduce the problem commit cc753fbe1ac4 ("drm/i915/gvt: validate
> gfn before set shadow page entry") solved by poking into KVM.  Lack of pre-validation
> means that bogus GFNs will trigger error messages, e.g.
> 
> 			gvt_vgpu_err("vfio_pin_pages failed for iova %pad, ret %d\n",
> 				     &cur_iova, ret);
> 
> and
> 
> 			gvt_vgpu_err("fail to populate guest ggtt entry\n");

Thanks for pointing it out.
I checked the commit message and found the original intentions behind introducing
pre-validation:
   "GVT may receive partial write on one guest PTE update. Validate gfn
    not to translate incomplete gfn. This avoids some unnecessary error
    messages incurred by the incomplete gfn translating. Also fix the
    bug that the whole PPGTT shadow page update is aborted on any invalid
    gfn entry"

(1) first intention -- unnecessary error message came from GGTT partial write.
    For guest GGTT writes, the guest calls writeq to an MMIO GPA, which is
    8 bytes in length, while QEMU splits the MMIO write into 2 4-byte writes.
    The two split writes can cause an invalid GFN to be found.

    But this partial write issue has been fixed by the two follow-up commits:
        bc0686ff5fad drm/i915/gvt: support inconsecutive partial gtt entry write
        510fe10b6180 drm/i915/gvt: fix a bug of partially write ggtt enties

    so pre-validation to reduce noise is not necessary any more here.

(2) the second intention -- "the whole PPGTT shadow page update is aborted on any
    invalid gfn entry"
    As PPGTT resides in normal guest RAM and we only treat 8-byte writes
    as valid page table writes, any invalid GPA found is regarded as
    an error, either due to guest misbehavior/attack or bug in host
    shadow code. 
    So, direct abort looks good too. Like below:

@@ -1340,13 +1338,6 @@ static int ppgtt_populate_spt(struct intel_vgpu_ppgtt_spt *spt)
                        ppgtt_generate_shadow_entry(&se, s, &ge);
                        ppgtt_set_shadow_entry(spt, &se, i);
                } else {
-                       gfn = ops->get_pfn(&ge);
-                       if (!intel_gvt_is_valid_gfn(vgpu, gfn)) {
-                               ops->set_pfn(&se, gvt->gtt.scratch_mfn);
-                               ppgtt_set_shadow_entry(spt, &se, i);
-                               continue;
-                       }
-
                        ret = ppgtt_populate_shadow_entry(vgpu, spt, i, &ge);
                        if (ret)
                                goto fail;

(I actually found that the original code prints an "invalid entry type"
warning, which indicates it has been broken for a while due to a lack of
testing of this invalid gfn path)
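
To make the partial-write case in (1) concrete, here is a minimal userspace
sketch (not GVT code; the helper names and the assumption that the GFN sits in
bits 12 and up of the 8-byte entry are illustrative only).  It shows that after
only the first of two 4-byte writes lands, the entry decodes to a GFN that is
neither the old nor the new one:

```c
#include <stdint.h>

/* Assumption for illustration: the GFN occupies bits 12 and up of the entry. */
#define PTE_ADDR_SHIFT 12

/* Extract the guest frame number from an 8-byte entry. */
static uint64_t pte_to_gfn(uint64_t pte)
{
	return pte >> PTE_ADDR_SHIFT;
}

/*
 * Apply one 4-byte write at byte offset 0 (low half) or 4 (high half) of an
 * 8-byte entry, the way a split 8-byte MMIO write would arrive.
 */
static uint64_t write_half(uint64_t pte, unsigned int off, uint32_t val)
{
	unsigned int shift = off * 8;

	pte &= ~((uint64_t)0xffffffffu << shift);
	return pte | ((uint64_t)val << shift);
}
```

E.g. starting from 0x0000000123456003 and writing the low half 0xbcdef003
gives GFN 0x1bcdef, which matches neither the old GFN 0x123456 nor the
intended 0xabcdef; writing the high half 0x0000000a then completes the update.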


> One thought would be to turn those printks into tracepoints to eliminate unwanted
> noise, and to prevent the guest from spamming the host kernel log by programming
> garbage into the GTT (gvt_vgpu_err() isn't ratelimited).
As those printks would not happen under normal conditions, and printks may
help in discovering an attack or bug, could we just convert
gvt_vgpu_err() to be ratelimited?

Thanks
Yan

> 
> > > On a related topic, ppgtt_populate_shadow_entry() should check the validity of the
> > > gfn.  If I'm reading the code correctly, checking only in ppgtt_populate_spt() fails
> > > to handle the case where the guest creates a bogus mapping when writing an existing
> > > GTT PT.
> > Don't get it here. Could you elaborate more?
> 
> AFAICT, KVMGT only pre-validates the GFN on the initial setup, not when the guest
> modifies a write-tracked entry.  I believe this is a moot point if the pre-validation
> is removed entirely.
> 
> > > 	gfn = pte_ops->get_pfn(ge);
> > > 	if (!intel_gvt_is_valid_gfn(vgpu, gfn, ge->type))
> > > 		goto set_shadow_entry;
> > As KVMGT only tracks PPGTT page table pages, this check here is not for page
> > track purpose, but to check bogus GFN.
> > So, Just leave the bogus GFN check to intel_gvt_dma_map_guest_page() through
> > VFIO is all right.
> > 
> > On the other hand, for the GFN validity for page track purpose, we can
> > leave it to kvm_write_track_add_gfn().
> > 
> > Do you think it's ok?
> 
> Yep, the only hiccup is the gvt_vgpu_err() calls that are guest-triggerable, and
> converting those to a tracepoint seems like the right answer.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 03/27] drm/i915/gvt: Incorporate KVM memslot info into check for 2MiB GTT entry
  2023-01-09  9:58               ` Yan Zhao
@ 2023-01-11 17:55                 ` Sean Christopherson
  2023-01-19  2:58                   ` Zhenyu Wang
  0 siblings, 1 reply; 61+ messages in thread
From: Sean Christopherson @ 2023-01-11 17:55 UTC (permalink / raw)
  To: Yan Zhao
  Cc: Paolo Bonzini, Zhenyu Wang, Zhi Wang, kvm, intel-gvt-dev,
	intel-gfx, linux-kernel, Ben Gardon

On Mon, Jan 09, 2023, Yan Zhao wrote:
> On Fri, Jan 06, 2023 at 11:01:53PM +0000, Sean Christopherson wrote:
> > On Fri, Jan 06, 2023, Yan Zhao wrote:
> > > On Thu, Jan 05, 2023 at 05:40:32PM +0000, Sean Christopherson wrote:
> > > > On Thu, Jan 05, 2023, Yan Zhao wrote:
> > > > I'm totally fine if KVMGT's ABI is that VFIO is the source of truth for mappings
> > > > and permissions, and that the only requirement for KVM memslots is that GTT page
> > > > tables need to be visible in KVM's memslots.  But if that's the ABI, then
> > > > intel_gvt_is_valid_gfn() should be probing VFIO, not KVM (commit cc753fbe1ac4
> > > > ("drm/i915/gvt: validate gfn before set shadow page entry")).
> > > > 
> > > > In other words, pick either VFIO or KVM.  Checking that X is valid according to
> > > > KVM and then mapping X through VFIO is confusing and makes assumptions about how
> > > > userspace configures KVM and VFIO.  It works because QEMU always configures KVM
> > > > and VFIO as expected, but IMO it's unnecessarily fragile and again confusing for
> > > > unaware readers because the code is technically flawed.
> > > >
> > > Agreed. 
> > > Then after some further thought, I think maybe we can just remove
> > > intel_gvt_is_valid_gfn() in KVMGT, because
> > > 
> > > (1) both intel_gvt_is_valid_gfn() in emulate_ggtt_mmio_write() and
> > > ppgtt_populate_spt() are not for page track purpose, but to validate bogus
> > > GFN.
> > > (2) gvt_pin_guest_page() with gfn and size can do the validity checking,
> > > which is called in intel_gvt_dma_map_guest_page(). So, we can move the
> > > mapping of scratch page to the error path after intel_gvt_dma_map_guest_page().
> > 
> > IIUC, that will re-introduce the problem commit cc753fbe1ac4 ("drm/i915/gvt: validate
> > gfn before set shadow page entry") solved by poking into KVM.  Lack of pre-validation
> > means that bogus GFNs will trigger error messages, e.g.
> > 
> > 			gvt_vgpu_err("vfio_pin_pages failed for iova %pad, ret %d\n",
> > 				     &cur_iova, ret);
> > 
> > and
> > 
> > 			gvt_vgpu_err("fail to populate guest ggtt entry\n");
> 
> Thanks for pointing it out.
> I checked this commit message and found below original intentions to introduce
> pre-validation:

...

> (I actually found that the original code will print "invalid entry type"
> warning which indicates it's broken for a while due to lack of test in
> this invalid gfn path)
> 
> 
> > One thought would be to turn those printks into tracepoints to eliminate unwanted
> > noise, and to prevent the guest from spamming the host kernel log by programming
> > garbage into the GTT (gvt_vgpu_err() isn't ratelimited).
> As those printks would not happen in normal conditions and printks may have
> some advantages to discover the attack or bug, could we just convert
> gvt_vgpu_err() to be ratelimited ?

That's ultimately a decision that needs to be made by the GVT maintainers, as the
answer depends on the use case.  E.g. if most users of KVMGT run a single VM and
the guest user is also the host admin, then pr_err_ratelimited() is likely an
acceptable/preferable choice as there's a decent chance a human will see the errors
in the host kernel logs and be able to take action.

But if there's unlikely to be a human monitoring the host logs, and/or the guest
user is unrelated to the host admin, then a ratelimited printk() is less useful.
E.g. if there's no one monitoring the logs, then losing messages due to
ratelimiting provides a worse debug experience overall than having to manually
enable tracepoints.   And if there may be many tens of VMs (seems unlikely?), then
ratelimited printk() is even less useful because errors for a specific VM may be
lost, i.e. the printk() can't be relied upon in any way to detect issues.

FWIW, in KVM proper, use of printk() to capture guest "errors" is strongly discouraged
for exactly these reasons.
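
A rough userspace model of the lost-message concern (hypothetical names and
parameters, not the kernel's ratelimit code): one interval/burst limiter is
shared by all VMs, so a noisy VM can use up the budget before another VM's
error arrives.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a shared interval/burst rate limiter. */
struct ratelimit {
	uint64_t interval;	/* window length, arbitrary time units */
	unsigned int burst;	/* messages allowed per window */
	uint64_t begin;		/* start of the current window */
	unsigned int printed;	/* messages emitted in this window */
	unsigned int missed;	/* messages suppressed in this window */
};

/* Return true if a message arriving at time 'now' may be emitted. */
static bool ratelimit_allow(struct ratelimit *rs, uint64_t now)
{
	if (now - rs->begin >= rs->interval) {
		/* A new window starts: reset the per-window counters. */
		rs->begin = now;
		rs->printed = 0;
		rs->missed = 0;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	rs->missed++;
	return false;
}

/*
 * Feed 'n' error events through the one shared limiter and count how many
 * messages each VM actually got into the log.  vm_ids[i] indexes 'emitted'.
 */
static void replay(struct ratelimit *rs, const unsigned int *vm_ids,
		   const uint64_t *times, unsigned int n,
		   unsigned int *emitted)
{
	for (unsigned int i = 0; i < n; i++) {
		if (ratelimit_allow(rs, times[i]))
			emitted[vm_ids[i]]++;
	}
}
```

With interval 5 and burst 2, three errors from VM 0 at t=0,1,2 followed by one
from VM 1 at t=3 leave VM 1 with nothing in the log.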

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 03/27] drm/i915/gvt: Incorporate KVM memslot info into check for 2MiB GTT entry
  2023-01-05  3:07       ` Yan Zhao
  2023-01-05 17:40         ` Sean Christopherson
@ 2023-01-12  8:31         ` Yan Zhao
  1 sibling, 0 replies; 61+ messages in thread
From: Yan Zhao @ 2023-01-12  8:31 UTC (permalink / raw)
  To: Sean Christopherson, kvm, intel-gfx, linux-kernel, Zhenyu Wang,
	Ben Gardon, Paolo Bonzini, intel-gvt-dev, Zhi Wang

> > > > Note, KVM may also restrict the mapping size for reasons that aren't
> > > > relevant to KVMGT, e.g. for KVM's iTLB multi-hit workaround or if the gfn
> > > Will iTLB multi-hit affect DMA?
> > 
> > I highly doubt it, I can't imagine an IOMMU would have a dedicated instruction
> > TLB :-)
> I can double check it with IOMMU hardware experts.
> But if DMA would tamper instruction TLB, it should have been reported
> as an issue with normal VFIO pass-through?

hi Sean,
This is the feedback:

- CPU Instruction TLB is only filled when CPU fetches an instruction.
- IOMMU uses IOTLB to cache IOVA translation.
  A remapping hardware may implement multiple IOTLBs, and some of these may
  be for special purposes, e.g., only for instruction fetches.
  There is no way for software to be aware that multiple
  translations for smaller pages have been used for a large page. If software
  modifies the paging structures so that the page size used for a 4-KByte range
  of input-addresses changes, the IOTLBs may subsequently contain multiple
  translations for the address range (one for each page size).
  A reference to an input-address in the address range may use any of these
  translations. Which translation is used may vary from one execution to
  another, and the choice may be implementation-specific.
- There's no similar bug reported on the IOMMU side where DMA requests for
  instruction fetch hit multiple IOTLB entries.
  The X bit in IOMMU paging structure is to be removed in future and is
  currently always unset.

Thanks
Yan

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 03/27] drm/i915/gvt: Incorporate KVM memslot info into check for 2MiB GTT entry
  2023-01-11 17:55                 ` Sean Christopherson
@ 2023-01-19  2:58                   ` Zhenyu Wang
  2023-01-19  5:26                     ` Yan Zhao
  0 siblings, 1 reply; 61+ messages in thread
From: Zhenyu Wang @ 2023-01-19  2:58 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Yan Zhao, kvm, intel-gfx, linux-kernel, Zhenyu Wang, Ben Gardon,
	Paolo Bonzini, intel-gvt-dev, Zhi Wang

[-- Attachment #1: Type: text/plain, Size: 4517 bytes --]

On 2023.01.11 17:55:04 +0000, Sean Christopherson wrote:
> On Mon, Jan 09, 2023, Yan Zhao wrote:
> > On Fri, Jan 06, 2023 at 11:01:53PM +0000, Sean Christopherson wrote:
> > > On Fri, Jan 06, 2023, Yan Zhao wrote:
> > > > On Thu, Jan 05, 2023 at 05:40:32PM +0000, Sean Christopherson wrote:
> > > > > On Thu, Jan 05, 2023, Yan Zhao wrote:
> > > > > I'm totally fine if KVMGT's ABI is that VFIO is the source of truth for mappings
> > > > > and permissions, and that the only requirement for KVM memslots is that GTT page
> > > > > tables need to be visible in KVM's memslots.  But if that's the ABI, then
> > > > > intel_gvt_is_valid_gfn() should be probing VFIO, not KVM (commit cc753fbe1ac4
> > > > > ("drm/i915/gvt: validate gfn before set shadow page entry")).
> > > > > 
> > > > > In other words, pick either VFIO or KVM.  Checking that X is valid according to
> > > > > KVM and then mapping X through VFIO is confusing and makes assumptions about how
> > > > > userspace configures KVM and VFIO.  It works because QEMU always configures KVM
> > > > > and VFIO as expected, but IMO it's unnecessarily fragile and again confusing for
> > > > > unaware readers because the code is technically flawed.
> > > > >
> > > > Agreed. 
> > > > Then after some further thought, I think maybe we can just remove
> > > > intel_gvt_is_valid_gfn() in KVMGT, because
> > > > 
> > > > (1) both intel_gvt_is_valid_gfn() in emulate_ggtt_mmio_write() and
> > > > ppgtt_populate_spt() are not for page track purpose, but to validate bogus
> > > > GFN.
> > > > (2) gvt_pin_guest_page() with gfn and size can do the validity checking,
> > > > which is called in intel_gvt_dma_map_guest_page(). So, we can move the
> > > > mapping of scratch page to the error path after intel_gvt_dma_map_guest_page().
> > > 
> > > IIUC, that will re-introduce the problem commit cc753fbe1ac4 ("drm/i915/gvt: validate
> > > gfn before set shadow page entry") solved by poking into KVM.  Lack of pre-validation
> > > means that bogus GFNs will trigger error messages, e.g.
> > > 
> > > 			gvt_vgpu_err("vfio_pin_pages failed for iova %pad, ret %d\n",
> > > 				     &cur_iova, ret);
> > > 
> > > and
> > > 
> > > 			gvt_vgpu_err("fail to populate guest ggtt entry\n");
> > 
> > Thanks for pointing it out.
> > I checked this commit message and found below original intentions to introduce
> > pre-validation:
> 
> ...
> 
> > (I actually found that the original code will print "invalid entry type"
> > warning which indicates it's broken for a while due to lack of test in
> > this invalid gfn path)
> > 
> > 
> > > One thought would be to turn those printks into tracepoints to eliminate unwanted
> > > noise, and to prevent the guest from spamming the host kernel log by programming
> > > garbage into the GTT (gvt_vgpu_err() isn't ratelimited).
> > As those printks would not happen in normal conditions and printks may have
> > some advantages to discover the attack or bug, could we just convert
> > gvt_vgpu_err() to be ratelimited ?
> 
> That's ultimately a decision that needs to be made by the GVT maintainers, as the
> answer depends on the use case.  E.g. if most users of KVMGT run a single VM and
> the guest user is also the host admin, then pr_err_ratelimited() is likely an
> acceptable/preferable choice as there's a decent chance a human will see the errors
> in the host kernel logs and be able to take action.
> 
> But if there's unlikely to be a human monitoring the host logs, and/or the guest
> user is unrelated to the host admin, then a ratelimited printk() is less useful.
> E.g. if there's no one monitoring the logs, then losing messages due to
> ratelimiting provides a worse debug experience overall than having to manually
> enable tracepoints.   And if there may be many tens of VMs (seems unlikely?), then
> ratelimited printk() is even less useful because errors for a specific VM may be
> lost, i.e. the printk() can't be relied upon in any way to detect issues.
> 
> FWIW, in KVM proper, use of printk() to capture guest "errors" is strongly discouraged
> for exactly these reasons.

Current KVMGT usage is mostly in a controlled mode: either the user is their own
host admin, or the host admin pre-configures a specific, limited number of VMs
for KVMGT use.  I think printk on error should be fine; we don't need rate
limiting, and adding an extra trace monitor for the admin might not be
necessary.  So I lean toward keeping the current error messages.

thanks

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 03/27] drm/i915/gvt: Incorporate KVM memslot info into check for 2MiB GTT entry
  2023-01-19  2:58                   ` Zhenyu Wang
@ 2023-01-19  5:26                     ` Yan Zhao
  2023-02-23 20:41                       ` Sean Christopherson
  0 siblings, 1 reply; 61+ messages in thread
From: Yan Zhao @ 2023-01-19  5:26 UTC (permalink / raw)
  To: Zhenyu Wang
  Cc: Sean Christopherson, kvm, intel-gfx, linux-kernel, Ben Gardon,
	Paolo Bonzini, intel-gvt-dev, Zhi Wang

On Thu, Jan 19, 2023 at 10:58:42AM +0800, Zhenyu Wang wrote:
> On 2023.01.11 17:55:04 +0000, Sean Christopherson wrote:
> > On Mon, Jan 09, 2023, Yan Zhao wrote:
> > > On Fri, Jan 06, 2023 at 11:01:53PM +0000, Sean Christopherson wrote:
> > > > On Fri, Jan 06, 2023, Yan Zhao wrote:
> > > > > On Thu, Jan 05, 2023 at 05:40:32PM +0000, Sean Christopherson wrote:
> > > > > > On Thu, Jan 05, 2023, Yan Zhao wrote:
> > > > > > I'm totally fine if KVMGT's ABI is that VFIO is the source of truth for mappings
> > > > > > and permissions, and that the only requirement for KVM memslots is that GTT page
> > > > > > tables need to be visible in KVM's memslots.  But if that's the ABI, then
> > > > > > intel_gvt_is_valid_gfn() should be probing VFIO, not KVM (commit cc753fbe1ac4
> > > > > > ("drm/i915/gvt: validate gfn before set shadow page entry").
> > > > > > 
> > > > > > In other words, pick either VFIO or KVM.  Checking that X is valid according to
> > > > > > KVM and then mapping X through VFIO is confusing and makes assumptions about how
> > > > > > userspace configures KVM and VFIO.  It works because QEMU always configures KVM
> > > > > > and VFIO as expected, but IMO it's unnecessarily fragile and again confusing for
> > > > > > unaware readers because the code is technically flawed.
> > > > > >
> > > > > Agreed. 
> > > > > Then after some further thought, I think maybe we can just remove
> > > > > intel_gvt_is_valid_gfn() in KVMGT, because
> > > > > 
> > > > > (1) both intel_gvt_is_valid_gfn() in emulate_ggtt_mmio_write() and
> > > > > ppgtt_populate_spt() are not for page track purpose, but to validate bogus
> > > > > GFN.
> > > > > (2) gvt_pin_guest_page() with gfn and size can do the validity checking,
> > > > > which is called in intel_gvt_dma_map_guest_page(). So, we can move the
> > > > > mapping of scratch page to the error path after intel_gvt_dma_map_guest_page().
> > > > 
> > > > IIUC, that will re-introduce the problem commit cc753fbe1ac4 ("drm/i915/gvt: validate
> > > > gfn before set shadow page entry") solved by poking into KVM.  Lack of pre-validation
> > > > means that bogus GFNs will trigger error messages, e.g.
> > > > 
> > > > 			gvt_vgpu_err("vfio_pin_pages failed for iova %pad, ret %d\n",
> > > > 				     &cur_iova, ret);
> > > > 
> > > > and
> > > > 
> > > > 			gvt_vgpu_err("fail to populate guest ggtt entry\n");
> > > 
> > > Thanks for pointing it out.
> > > I checked this commit message and found below original intentions to introduce
> > > pre-validation:
> > 
> > ...
> > 
> > > > (I actually found that the original code prints an "invalid entry type"
> > > > warning, which indicates it has been broken for a while due to lack of
> > > > testing in this invalid gfn path)
> > > 
> > > 
> > > > One thought would be to turn those printks into tracepoints to eliminate unwanted
> > > > noise, and to prevent the guest from spamming the host kernel log by programming
> > > > garbage into the GTT (gvt_vgpu_err() isn't ratelimited).
> > > As those printks would not happen in normal conditions and printks may have
> > > some advantages to discover the attack or bug, could we just convert
> > > gvt_vgpu_err() to be ratelimited ?
> > 
> > That's ultimately a decision that needs to be made by the GVT maintainers, as the
> > answer depends on the use case.  E.g. if most users of KVMGT run a single VM and
> > the guest user is also the host admin, then pr_err_ratelimited() is likely an
> > acceptable/preferable choice as there's a decent chance a human will see the errors
> > in the host kernel logs and be able to take action.
> > 
> > But if there's unlikely to be a human monitoring the host logs, and/or the guest
> > user is unrelated to the host admin, then a ratelimited printk() is less useful.
> > E.g. if there's no one monitoring the logs, then losing messages due to
> > ratelimiting provides a worse debug experience overall than having to manually
> > enable tracepoints.   And if there may be many tens of VMs (seems unlikely?), then
> > ratelimited printk() is even less useful because errors for a specific VM may be
> > lost, i.e. the printk() can't be relied upon in any way to detect issues.
> > 
> > FWIW, in KVM proper, use of printk() to capture guest "errors" is strongly discouraged
> > for exactly these reasons.
> 
> Current KVMGT usage is mostly in a controlled mode: either the user is their own
> host admin, or the host admin pre-configures a specific, limited number of VMs for
> KVMGT use.  I think printk on error should be fine; we don't need a rate limit, and
> adding an extra trace monitor for the admin might not be necessary.  So I lean
> toward keeping the current error message.
> 

Thanks, Sean and Zhenyu.
So, could I just post the final fix as below?
And, Sean, would you like to include it in this series or should I send it out
first?

From dcc931011da3712333f61684ebb20765dbf2fb46 Mon Sep 17 00:00:00 2001
From: Yan Zhao <yan.y.zhao@intel.com>
Date: Thu, 19 Jan 2023 11:15:54 +0800
Subject: [PATCH] drm/i915/gvt: remove interface intel_gvt_is_valid_gfn

Currently intel_gvt_is_valid_gfn() is called in two places:
(1) shadowing guest GGTT entry
(2) shadowing guest PPGTT leaf entry,
which was introduced in commit cc753fbe1ac4
("drm/i915/gvt: validate gfn before set shadow page entry").

However, now it's not necessary to call this interface any more, because
a. GGTT partial write issue has been fixed by
   commit bc0686ff5fad
   ("drm/i915/gvt: support inconsecutive partial gtt entry write")
   commit 510fe10b6180
   ("drm/i915/gvt: fix a bug of partially write ggtt enties")
b. PPGTT resides in normal guest RAM and we only treat 8-byte writes
   as valid page table writes. Any invalid GPA found is regarded as
   an error, either due to guest misbehavior/attack or bug in host
   shadow code.
   So, rather than pre-checking GFNs, replacing invalid GFNs with the
   scratch GFN, and continuing silently, just remove the pre-checking and
   abort PPGTT shadowing when an error is detected.
c. GFN validity check is still performed in
   intel_gvt_dma_map_guest_page() --> gvt_pin_guest_page().
   It's more desirable to call VFIO interface to do both validity check
   and mapping.
   Calling intel_gvt_is_valid_gfn() to do GFN validity check from KVM side
   while later mapping the GFN through VFIO interface is unnecessarily
   fragile and confusing for unaware readers.

Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
 drivers/gpu/drm/i915/gvt/gtt.c | 31 -------------------------------
 1 file changed, 31 deletions(-)

diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c
index 445afecbe7ae..9b6c2ca1ee16 100644
--- a/drivers/gpu/drm/i915/gvt/gtt.c
+++ b/drivers/gpu/drm/i915/gvt/gtt.c
@@ -49,22 +49,6 @@
 static bool enable_out_of_sync = false;
 static int preallocated_oos_pages = 8192;

-static bool intel_gvt_is_valid_gfn(struct intel_vgpu *vgpu, unsigned long gfn)
-{
-       struct kvm *kvm = vgpu->vfio_device.kvm;
-       int idx;
-       bool ret;
-
-       if (!vgpu->attached)
-               return false;
-
-       idx = srcu_read_lock(&kvm->srcu);
-       ret = kvm_is_visible_gfn(kvm, gfn);
-       srcu_read_unlock(&kvm->srcu, idx);
-
-       return ret;
-}
-
 /*
  * validate a gm address and related range size,
  * translate it to host gm address

@@ -1345,13 +1329,6 @@ static int ppgtt_populate_spt(struct intel_vgpu_ppgtt_spt *spt)
                        ppgtt_generate_shadow_entry(&se, s, &ge);
                        ppgtt_set_shadow_entry(spt, &se, i);
                } else {
-                       gfn = ops->get_pfn(&ge);
-                       if (!intel_gvt_is_valid_gfn(vgpu, gfn)) {
-                               ops->set_pfn(&se, gvt->gtt.scratch_mfn);
-                               ppgtt_set_shadow_entry(spt, &se, i);
-                               continue;
-                       }
-
                        ret = ppgtt_populate_shadow_entry(vgpu, spt, i, &ge);
                        if (ret)
                                goto fail;
@@ -2326,14 +2303,6 @@ static int emulate_ggtt_mmio_write(struct intel_vgpu *vgpu, unsigned int off,
                m.val64 = e.val64;
                m.type = e.type;

-               /* one PTE update may be issued in multiple writes and the
-                * first write may not construct a valid gfn
-                */
-               if (!intel_gvt_is_valid_gfn(vgpu, gfn)) {
-                       ops->set_pfn(&m, gvt->gtt.scratch_mfn);
-                       goto out;
-               }
-
                ret = intel_gvt_dma_map_guest_page(vgpu, gfn, PAGE_SIZE,
                                                   &dma_addr);
                if (ret) {
--
2.17.1



^ permalink raw reply related	[flat|nested] 61+ messages in thread

* Re: [PATCH 03/27] drm/i915/gvt: Incorporate KVM memslot info into check for 2MiB GTT entry
  2023-01-19  5:26                     ` Yan Zhao
@ 2023-02-23 20:41                       ` Sean Christopherson
  2023-02-24  5:09                         ` Yan Zhao
  0 siblings, 1 reply; 61+ messages in thread
From: Sean Christopherson @ 2023-02-23 20:41 UTC (permalink / raw)
  To: Yan Zhao
  Cc: Zhenyu Wang, kvm, intel-gfx, linux-kernel, Ben Gardon,
	Paolo Bonzini, intel-gvt-dev, Zhi Wang

Apologies for the super slow reply, I put this series on the backburner while I
caught up on other stuff and completely missed your questions.

On Thu, Jan 19, 2023, Yan Zhao wrote:
> On Thu, Jan 19, 2023 at 10:58:42AM +0800, Zhenyu Wang wrote:
> > Current KVMGT usage is mostly in a controlled mode: either the user is their own
> > host admin, or the host admin pre-configures a specific, limited number of VMs for
> > KVMGT use.  I think printk on error should be fine; we don't need a rate limit, and
> > adding an extra trace monitor for the admin might not be necessary.  So I lean
> > toward keeping the current error message.
> > 
> 
> Thanks, Sean and Zhenyu.
> So, could I just post the final fix as below?

No objection here.

> And, Sean, would you like to include it in this series or should I send it out
> first?

I'd like to include it in this series as it's necessary (for some definitions of
necessary) to clean up KVM's APIs, and the main beneficiary is KVM, i.e. getting
the patch merged sooner rather than later doesn't really benefit KVMGT itself.

Thanks much!

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 03/27] drm/i915/gvt: Incorporate KVM memslot info into check for 2MiB GTT entry
  2023-02-23 20:41                       ` Sean Christopherson
@ 2023-02-24  5:09                         ` Yan Zhao
  0 siblings, 0 replies; 61+ messages in thread
From: Yan Zhao @ 2023-02-24  5:09 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, intel-gfx, linux-kernel, Zhenyu Wang, Ben Gardon,
	Paolo Bonzini, intel-gvt-dev, Zhi Wang

On Thu, Feb 23, 2023 at 12:41:28PM -0800, Sean Christopherson wrote:
> Apologies for the super slow reply, I put this series on the backburner while I
> caught up on other stuff and completely missed your questions.
>
Never mind :)

> On Thu, Jan 19, 2023, Yan Zhao wrote:
> > On Thu, Jan 19, 2023 at 10:58:42AM +0800, Zhenyu Wang wrote:
> > > Current KVMGT usage is mostly in a controlled mode: either the user is their own
> > > host admin, or the host admin pre-configures a specific, limited number of VMs for
> > > KVMGT use.  I think printk on error should be fine; we don't need a rate limit, and
> > > adding an extra trace monitor for the admin might not be necessary.  So I lean
> > > toward keeping the current error message.
> > > 
> > 
> > Thanks, Sean and Zhenyu.
> > So, could I just post the final fix as below?
> 
> No objection here.
> 
> > And, Sean, would you like to include it in this series or should I send it out
> > first?
> 
> I'd like to include it in this series as it's necessary (for some definitions of
> necessary) to clean up KVM's APIs, and the main beneficiary is KVM, i.e. getting
> the patch merged sooner rather than later doesn't really benefit KVMGT itself.
> 
> Thanks much!

Then please include it, and I can help test once you send out the next
version.

Thanks
Yan

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 19/27] KVM: x86/mmu: Use page-track notifiers iff there are external users
  2022-12-23  0:57 ` [PATCH 19/27] KVM: x86/mmu: Use page-track notifiers iff there are external users Sean Christopherson
  2022-12-28  6:56   ` Yan Zhao
@ 2023-08-07 12:01   ` Like Xu
  2023-08-07 17:19     ` Sean Christopherson
  1 sibling, 1 reply; 61+ messages in thread
From: Like Xu @ 2023-08-07 12:01 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao,
	Ben Gardon, Zhi Wang, Paolo Bonzini, Zhenyu Wang

On 23/12/2022 8:57 am, Sean Christopherson wrote:
> +static inline void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa,
> +					const u8 *new, int bytes)
> +{
> +	__kvm_page_track_write(vcpu, gpa, new, bytes);
> +
> +	kvm_mmu_track_write(vcpu, gpa, new, bytes);
> +}

The kvm_mmu_track_write() is only used for x86, where the incoming parameter
"u8 *new" has not been required since 0e0fee5c539b ("kvm: mmu: Fix race in
emulated page table writes"), please help confirm if it's still needed ? Thanks.
A minor clean up is proposed.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 19/27] KVM: x86/mmu: Use page-track notifiers iff there are external users
  2023-08-07 12:01   ` Like Xu
@ 2023-08-07 17:19     ` Sean Christopherson
  2023-08-09  1:02       ` Yan Zhao
  0 siblings, 1 reply; 61+ messages in thread
From: Sean Christopherson @ 2023-08-07 17:19 UTC (permalink / raw)
  To: Like Xu
  Cc: kvm, intel-gvt-dev, intel-gfx, linux-kernel, Yan Zhao,
	Ben Gardon, Zhi Wang, Paolo Bonzini, Zhenyu Wang

On Mon, Aug 07, 2023, Like Xu wrote:
> On 23/12/2022 8:57 am, Sean Christopherson wrote:
> > +static inline void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa,
> > +					const u8 *new, int bytes)
> > +{
> > +	__kvm_page_track_write(vcpu, gpa, new, bytes);
> > +
> > +	kvm_mmu_track_write(vcpu, gpa, new, bytes);
> > +}
> 
> The kvm_mmu_track_write() is only used for x86, where the incoming parameter
> "u8 *new" has not been required since 0e0fee5c539b ("kvm: mmu: Fix race in
> emulated page table writes"), please help confirm if it's still needed ? Thanks.
> A minor clean up is proposed.

Hmm, unless I'm misreading things, KVMGT ultimately doesn't consume @new either.
So I think we can remove @new from kvm_page_track_write() entirely.

Feel free to send a patch, otherwise I'll get to it later this week.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 19/27] KVM: x86/mmu: Use page-track notifiers iff there are external users
  2023-08-07 17:19     ` Sean Christopherson
@ 2023-08-09  1:02       ` Yan Zhao
  2023-08-09 14:33         ` Sean Christopherson
  0 siblings, 1 reply; 61+ messages in thread
From: Yan Zhao @ 2023-08-09  1:02 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Like Xu, kvm, intel-gfx, linux-kernel, Zhenyu Wang, Ben Gardon,
	Paolo Bonzini, intel-gvt-dev, Zhi Wang

On Mon, Aug 07, 2023 at 10:19:07AM -0700, Sean Christopherson wrote:
> On Mon, Aug 07, 2023, Like Xu wrote:
> > On 23/12/2022 8:57 am, Sean Christopherson wrote:
> > > +static inline void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa,
> > > +					const u8 *new, int bytes)
> > > +{
> > > +	__kvm_page_track_write(vcpu, gpa, new, bytes);
> > > +
> > > +	kvm_mmu_track_write(vcpu, gpa, new, bytes);
> > > +}
> > 
> > The kvm_mmu_track_write() is only used for x86, where the incoming parameter
> > "u8 *new" has not been required since 0e0fee5c539b ("kvm: mmu: Fix race in
> > emulated page table writes"), please help confirm if it's still needed ? Thanks.
> > A minor clean up is proposed.
> 
> Hmm, unless I'm misreading things, KVMGT ultimately doesn't consume @new either.
> So I think we can remove @new from kvm_page_track_write() entirely.
Sorry for the late reply.
Yes, KVMGT does not consume @new and it reads the guest PTE again in the
page track write handler.

But I have a couple of questions related to the mentioned commit, as
below:

(1) If "re-reading the current value of the guest PTE after the MMU lock has
been acquired", then should KVMGT also acquire the MMU lock too?
If so, could we move the MMU lock and unlock into kvm_page_track_write()
as it's common.

(2) Even if KVMGT consumes @new,
will kvm_page_track_write() be called once or twice if there are two
concurrent emulated writes?


commit 0e0fee5c539b61fdd098332e0e2cc375d9073706
Author: Junaid Shahid <junaids@google.com>
Date:   Wed Oct 31 14:53:57 2018 -0700

    kvm: mmu: Fix race in emulated page table writes
    
    When a guest page table is updated via an emulated write,
    kvm_mmu_pte_write() is called to update the shadow PTE using the just
    written guest PTE value. But if two emulated guest PTE writes happened
    concurrently, it is possible that the guest PTE and the shadow PTE end
    up being out of sync. Emulated writes do not mark the shadow page as
    unsync-ed, so this inconsistency will not be resolved even by a guest TLB
    flush (unless the page was marked as unsync-ed at some other point).
    
    This is fixed by re-reading the current value of the guest PTE after the
    MMU lock has been acquired instead of just using the value that was
    written prior to calling kvm_mmu_pte_write().







^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 19/27] KVM: x86/mmu: Use page-track notifiers iff there are external users
  2023-08-09  1:02       ` Yan Zhao
@ 2023-08-09 14:33         ` Sean Christopherson
  2023-08-09 23:21           ` Yan Zhao
  0 siblings, 1 reply; 61+ messages in thread
From: Sean Christopherson @ 2023-08-09 14:33 UTC (permalink / raw)
  To: Yan Zhao
  Cc: Like Xu, kvm, intel-gfx, linux-kernel, Zhenyu Wang, Ben Gardon,
	Paolo Bonzini, intel-gvt-dev, Zhi Wang

On Wed, Aug 09, 2023, Yan Zhao wrote:
> On Mon, Aug 07, 2023 at 10:19:07AM -0700, Sean Christopherson wrote:
> > On Mon, Aug 07, 2023, Like Xu wrote:
> > > On 23/12/2022 8:57 am, Sean Christopherson wrote:
> > > > +static inline void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa,
> > > > +					const u8 *new, int bytes)
> > > > +{
> > > > +	__kvm_page_track_write(vcpu, gpa, new, bytes);
> > > > +
> > > > +	kvm_mmu_track_write(vcpu, gpa, new, bytes);
> > > > +}
> > > 
> > > The kvm_mmu_track_write() is only used for x86, where the incoming parameter
> > > "u8 *new" has not been required since 0e0fee5c539b ("kvm: mmu: Fix race in
> > > emulated page table writes"), please help confirm if it's still needed ? Thanks.
> > > A minor clean up is proposed.
> > 
> > Hmm, unless I'm misreading things, KVMGT ultimately doesn't consume @new either.
> > So I think we can remove @new from kvm_page_track_write() entirely.
> Sorry for the late reply.
> Yes, KVMGT does not consume @new and it reads the guest PTE again in the
> page track write handler.
> 
> But I have a couple of questions related to the mentioned commit, as
> below:
> 
> (1) If "re-reading the current value of the guest PTE after the MMU lock has
> been acquired", then should KVMGT also acquire the MMU lock too?

No.  If applicable, KVMGT should read the new/current value after acquiring
whatever lock protects the generation (or update) of the shadow entries.  I
suspect KVMGT already does this, but I don't have time to confirm that at this
exact moment.

The race that was fixed in KVM was:

  vCPU0         vCPU1   
  write X
                 write Y
                 sync SPTE w/ Y
  sync SPTE w/ X

Reading the value after acquiring mmu_lock ensures that both vCPUs will see whatever
value "loses" the race, i.e. whatever written value is processed second ('Y' in the
above sequence).
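
As a rough illustration of the sequence above, here is a minimal userspace
sketch in C.  It is not KVM code: struct model, guest_write(),
sync_with_written_value(), sync_with_reread() and replay() are all
hypothetical names, and the serialization that mmu_lock would provide is
modeled simply by running the two sync steps one after the other.  It only
replays the single interleaving shown above and compares "sync with the
value captured at write time" against "sync with a re-read value".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model; none of these names exist in KVM. */
struct model {
	uint64_t guest_pte;	/* what guest memory currently holds */
	uint64_t shadow_pte;	/* what the last sync derived from it */
};

/* The store half of an emulated write. */
static void guest_write(struct model *m, uint64_t val)
{
	m->guest_pte = val;
}

/* Racy variant: sync using the value captured when the write was emulated. */
static void sync_with_written_value(struct model *m, uint64_t written)
{
	m->shadow_pte = written;
}

/* Fixed variant: ignore the captured value, re-read memory at sync time. */
static void sync_with_reread(struct model *m)
{
	m->shadow_pte = m->guest_pte;
}

/*
 * Replay "write X, write Y, sync Y, sync X".
 * Returns 1 if the shadow value matches guest memory afterwards, else 0.
 */
int replay(int reread)
{
	struct model m = { 0, 0 };
	const uint64_t x = 0x1111, y = 0x2222;

	guest_write(&m, x);		/* vCPU0: write X */
	guest_write(&m, y);		/* vCPU1: write Y */
	assert(m.guest_pte == y);	/* memory holds the last store */

	if (reread) {
		sync_with_reread(&m);			/* vCPU1 sync */
		sync_with_reread(&m);			/* vCPU0 sync */
	} else {
		sync_with_written_value(&m, y);		/* vCPU1 sync w/ Y */
		sync_with_written_value(&m, x);		/* vCPU0 sync w/ X */
	}

	return m.shadow_pte == m.guest_pte;
}
```

In this model replay(0) leaves the shadow value at X while memory holds Y,
whereas replay(1) ends with both equal to Y, which is the property described
above.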

> If so, could we move the MMU lock and unlock into kvm_page_track_write()
> as it's common.
> 
> (2) Even if KVMGT consumes @new,
> will kvm_page_track_write() be called once or twice if there are two
> concurrent emulated writes?

Twice, kvm_page_track_write() is wired up directly to the emulation of the write,
i.e. there is no batching.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 19/27] KVM: x86/mmu: Use page-track notifiers iff there are external users
  2023-08-09 14:33         ` Sean Christopherson
@ 2023-08-09 23:21           ` Yan Zhao
  2023-08-10  3:02             ` Yan Zhao
  0 siblings, 1 reply; 61+ messages in thread
From: Yan Zhao @ 2023-08-09 23:21 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Like Xu, kvm, intel-gfx, linux-kernel, Zhenyu Wang, Ben Gardon,
	Paolo Bonzini, intel-gvt-dev, Zhi Wang

On Wed, Aug 09, 2023 at 07:33:45AM -0700, Sean Christopherson wrote:
> On Wed, Aug 09, 2023, Yan Zhao wrote:
> > On Mon, Aug 07, 2023 at 10:19:07AM -0700, Sean Christopherson wrote:
> > > On Mon, Aug 07, 2023, Like Xu wrote:
> > > > On 23/12/2022 8:57 am, Sean Christopherson wrote:
> > > > > +static inline void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa,
> > > > > +					const u8 *new, int bytes)
> > > > > +{
> > > > > +	__kvm_page_track_write(vcpu, gpa, new, bytes);
> > > > > +
> > > > > +	kvm_mmu_track_write(vcpu, gpa, new, bytes);
> > > > > +}
> > > > 
> > > > The kvm_mmu_track_write() is only used for x86, where the incoming parameter
> > > > "u8 *new" has not been required since 0e0fee5c539b ("kvm: mmu: Fix race in
> > > > emulated page table writes"), please help confirm if it's still needed ? Thanks.
> > > > A minor clean up is proposed.
> > > 
> > > Hmm, unless I'm misreading things, KVMGT ultimately doesn't consume @new either.
> > > So I think we can remove @new from kvm_page_track_write() entirely.
> > Sorry for the late reply.
> > Yes, KVMGT does not consume @new and it reads the guest PTE again in the
> > page track write handler.
> > 
> > But I have a couple of questions related to the mentioned commit, as
> > below:
> > 
> > (1) If "re-reading the current value of the guest PTE after the MMU lock has
> > been acquired", then should KVMGT also acquire the MMU lock too?
> 
> No.  If applicable, KVMGT should read the new/current value after acquiring
> whatever lock protects the generation (or update) of the shadow entries.  I
> suspect KVMGT already does this, but I don't have time to confirm that at this
I think the mutex lock and unlock of info->vgpu_lock you added in
kvmgt_page_track_write() is the counterpart :)

> exact moment.
> 
> The race that was fixed in KVM was:
> 
>   vCPU0         vCPU1   
>   write X
>                  write Y
>                  sync SPTE w/ Y
>   sync SPTE w/ X
> 
> Reading the value after acquiring mmu_lock ensures that both vCPUs will see whatever
> value "loses" the race, i.e. whatever written value is processed second ('Y' in the
> above sequence).
I suspect that vCPU0 may still generate a wrong SPTE if vCPU1 wrote 4
bytes while vCPU0 wrote 8 bytes, though the chances are very low.


> 
> > If so, could we move the MMU lock and unlock into kvm_page_track_write()
> > as it's common.
> > 
> > (2) Even if KVMGT consumes @new,
> > will kvm_page_track_write() be called once or twice if there are two
> > concurrent emulated writes?
> 
> Twice, kvm_page_track_write() is wired up directly to the emulation of the write,
> i.e. there is no batching.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 19/27] KVM: x86/mmu: Use page-track notifiers iff there are external users
  2023-08-09 23:21           ` Yan Zhao
@ 2023-08-10  3:02             ` Yan Zhao
  2023-08-10 15:41               ` Sean Christopherson
  0 siblings, 1 reply; 61+ messages in thread
From: Yan Zhao @ 2023-08-10  3:02 UTC (permalink / raw)
  To: Sean Christopherson, Like Xu, kvm, intel-gfx, linux-kernel,
	Zhenyu Wang, Ben Gardon, Paolo Bonzini, intel-gvt-dev, Zhi Wang

On Thu, Aug 10, 2023 at 07:21:03AM +0800, Yan Zhao wrote:
> On Wed, Aug 09, 2023 at 07:33:45AM -0700, Sean Christopherson wrote:
> > On Wed, Aug 09, 2023, Yan Zhao wrote:
> > > On Mon, Aug 07, 2023 at 10:19:07AM -0700, Sean Christopherson wrote:
> > > > On Mon, Aug 07, 2023, Like Xu wrote:
> > > > > On 23/12/2022 8:57 am, Sean Christopherson wrote:
> > > > > > +static inline void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa,
> > > > > > +					const u8 *new, int bytes)
> > > > > > +{
> > > > > > +	__kvm_page_track_write(vcpu, gpa, new, bytes);
> > > > > > +
> > > > > > +	kvm_mmu_track_write(vcpu, gpa, new, bytes);
> > > > > > +}
> > > > > 
> > > > > The kvm_mmu_track_write() is only used for x86, where the incoming parameter
> > > > > "u8 *new" has not been required since 0e0fee5c539b ("kvm: mmu: Fix race in
> > > > > emulated page table writes"), please help confirm if it's still needed ? Thanks.
> > > > > A minor clean up is proposed.
> > > > 
> > > > Hmm, unless I'm misreading things, KVMGT ultimately doesn't consume @new either.
> > > > So I think we can remove @new from kvm_page_track_write() entirely.
> > > Sorry for the late reply.
> > > Yes, KVMGT does not consume @new and it reads the guest PTE again in the
> > > page track write handler.
> > > 
> > > But I have a couple of questions related to the mentioned commit, as
> > > below:
> > > 
> > > (1) If "re-reading the current value of the guest PTE after the MMU lock has
> > > been acquired", then should KVMGT also acquire the MMU lock too?
> > 
> > No.  If applicable, KVMGT should read the new/current value after acquiring
> > whatever lock protects the generation (or update) of the shadow entries.  I
> > suspect KVMGT already does this, but I don't have time to confirm that at this
> I think the mutex lock and unlock of info->vgpu_lock you added in
> kvmgt_page_track_write() is the counterpart :)
> 
> > exact moment.
> > 
> > The race that was fixed in KVM was:
> > 
> >   vCPU0         vCPU1   
> >   write X
> >                  write Y
> >                  sync SPTE w/ Y
> >   sync SPTE w/ X
> > 
> > Reading the value after acquiring mmu_lock ensures that both vCPUs will see whatever
> > value "loses" the race, i.e. whatever written value is processed second ('Y' in the
> > above sequence).
> I suspect that vCPU0 may still generate a wrong SPTE if vCPU1 wrote 4
> bytes while vCPU0 wrote 8 bytes, though the chances are very low.
> 
This could happen in the sequence below:
vCPU0 updates a PTE to AABBCCDD;
vCPU1 updates a PTE to EEFFGGHH in two writes.
(each character stands for a byte)

vCPU0                  vCPU1   
write AABBCCDD
                       write GGHH
                       detect 4 bytes write and hold on sync
sync SPTE w/ AABBGGHH
                       write EEFF
                       sync SPTE w/ EEFFGGHH


Do you think the serialization work below is worthwhile?

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index a915e23d61fa..51cd0ab73529 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1445,6 +1445,8 @@ struct kvm_arch {
         */
 #define SPLIT_DESC_CACHE_MIN_NR_OBJECTS (SPTE_ENT_PER_PAGE + 1)
        struct kvm_mmu_memory_cache split_desc_cache;
+
+       struct xarray track_writing_range;
 };

 struct kvm_vm_stat {
diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index fd04e618ad2d..4b271701dcf6 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -142,12 +142,14 @@ void kvm_page_track_cleanup(struct kvm *kvm)

        head = &kvm->arch.track_notifier_head;
        cleanup_srcu_struct(&head->track_srcu);
+       xa_destroy(&kvm->arch.track_writing_range);
 }

 int kvm_page_track_init(struct kvm *kvm)
 {
        struct kvm_page_track_notifier_head *head;

+       xa_init(&kvm->arch.track_writing_range);
        head = &kvm->arch.track_notifier_head;
        INIT_HLIST_HEAD(&head->track_notifier_list);
        return init_srcu_struct(&head->track_srcu);
diff --git a/arch/x86/kvm/mmu/page_track.h b/arch/x86/kvm/mmu/page_track.h
index 62f98c6c5af3..1829792b9892 100644
--- a/arch/x86/kvm/mmu/page_track.h
+++ b/arch/x86/kvm/mmu/page_track.h
@@ -47,12 +47,46 @@ static inline bool kvm_page_track_has_external_user(struct kvm *kvm) { return fa

 #endif /* CONFIG_KVM_EXTERNAL_WRITE_TRACKING */

-static inline void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa,
-                                       const u8 *new, int bytes)
+static inline void kvm_page_track_write_begin(struct kvm_vcpu *vcpu, gpa_t gpa,
+                                             int bytes)
 {
+       struct kvm *kvm = vcpu->kvm;
+       gfn_t gfn = gpa_to_gfn(gpa);
+
+       WARN_ON(gfn != gpa_to_gfn(gpa + bytes - 1));
+
+       if (!kvm_page_track_write_tracking_enabled(kvm))
+               return;
+
+retry:
+       if (xa_insert(&kvm->arch.track_writing_range, gfn, xa_mk_value(gfn),
+                     GFP_KERNEL_ACCOUNT)) {
+               cpu_relax();
+               goto retry;
+       }
+       return;
+}
+
+static inline void kvm_page_track_write_abort(struct kvm_vcpu *vcpu, gpa_t gpa,
+                                             int bytes)
+{
+       if (!kvm_page_track_write_tracking_enabled(vcpu->kvm))
+               return;
+
+       xa_erase(&vcpu->kvm->arch.track_writing_range, gpa_to_gfn(gpa));
+}
+
+static inline void kvm_page_track_write_end(struct kvm_vcpu *vcpu, gpa_t gpa,
+                                           const u8 *new, int bytes)
+{
+       if (!kvm_page_track_write_tracking_enabled(vcpu->kvm))
+               return;
+
        __kvm_page_track_write(vcpu->kvm, gpa, new, bytes);

        kvm_mmu_track_write(vcpu, gpa, new, bytes);
+
+       xa_erase(&vcpu->kvm->arch.track_writing_range, gpa_to_gfn(gpa));
 }

 #endif /* __KVM_X86_PAGE_TRACK_H */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 05a68d7d99fe..9b75829d5d7a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7544,10 +7544,13 @@ int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa,
 {
        int ret;

+       kvm_page_track_write_begin(vcpu, gpa, bytes);
        ret = kvm_vcpu_write_guest(vcpu, gpa, val, bytes);
-       if (ret < 0)
+       if (ret < 0) {
+               kvm_page_track_write_abort(vcpu, gpa, bytes);
                return 0;
-       kvm_page_track_write(vcpu, gpa, val, bytes);
+       }
+       kvm_page_track_write_end(vcpu, gpa, val, bytes);
        return 1;
 }

@@ -7792,6 +7795,7 @@ static int emulator_cmpxchg_emulated(struct x86_emulate_ctxt *ctxt,

        hva += offset_in_page(gpa);

+       kvm_page_track_write_begin(vcpu, gpa, bytes);
        switch (bytes) {
        case 1:
                r = emulator_try_cmpxchg_user(u8, hva, old, new);
@@ -7809,12 +7813,16 @@ static int emulator_cmpxchg_emulated(struct x86_emulate_ctxt *ctxt,
                BUG();
        }

-       if (r < 0)
+       if (r < 0) {
+               kvm_page_track_write_abort(vcpu, gpa, bytes);
                return X86EMUL_UNHANDLEABLE;
-       if (r)
+       }
+       if (r) {
+               kvm_page_track_write_abort(vcpu, gpa, bytes);
                return X86EMUL_CMPXCHG_FAILED;
+       }

-       kvm_page_track_write(vcpu, gpa, new, bytes);
+       kvm_page_track_write_end(vcpu, gpa, new, bytes);

        return X86EMUL_CONTINUE;


* Re: [PATCH 19/27] KVM: x86/mmu: Use page-track notifiers iff there are external users
  2023-08-10  3:02             ` Yan Zhao
@ 2023-08-10 15:41               ` Sean Christopherson
  2023-08-11  5:57                 ` Yan Zhao
  0 siblings, 1 reply; 61+ messages in thread
From: Sean Christopherson @ 2023-08-10 15:41 UTC (permalink / raw)
  To: Yan Zhao
  Cc: Like Xu, kvm, intel-gfx, linux-kernel, Zhenyu Wang, Ben Gardon,
	Paolo Bonzini, intel-gvt-dev, Zhi Wang

On Thu, Aug 10, 2023, Yan Zhao wrote:
> On Thu, Aug 10, 2023 at 07:21:03AM +0800, Yan Zhao wrote:
> > > Reading the value after acquiring mmu_lock ensures that both vCPUs will see whatever
> > > value "loses" the race, i.e. whatever written value is processed second ('Y' in the
> > > above sequence).
> > I suspect that vCPU0 may still generate a wrong SPTE if vCPU1 wrote 4
> > bytes while vCPU0 wrote 8 bytes, though the chances are very low.
> > 
> This could happen in the sequence below:
> vCPU0 updates a PTE to AABBCCDD;
> vCPU1 updates a PTE to EEFFGGHH in two writes.
> (each character stands for a byte)
> 
> vCPU0                  vCPU1   
> write AABBCCDD
>                        write GGHH
>                        detect 4 bytes write and hold on sync
> sync SPTE w/ AABBGGHH
>                        write EEFF
>                        sync SPTE w/ EEFFGGHH
> 
> 
> Do you think the serialization work below is worth it?

No, because I don't see any KVM bugs with the above sequence.  If the guest doesn't
ensure *all* writes from vCPU0 and vCPU1 are fully serialized, then it is completely
legal for hardware (KVM in this case) to consume AABBGGHH as a PTE.  The only thing
the guest shouldn't see is EEFFCCDD, but I don't see how that can happen.

* Re: [PATCH 19/27] KVM: x86/mmu: Use page-track notifiers iff there are external users
  2023-08-10 15:41               ` Sean Christopherson
@ 2023-08-11  5:57                 ` Yan Zhao
  0 siblings, 0 replies; 61+ messages in thread
From: Yan Zhao @ 2023-08-11  5:57 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Like Xu, kvm, intel-gfx, linux-kernel, Zhenyu Wang, Ben Gardon,
	Paolo Bonzini, intel-gvt-dev, Zhi Wang

On Thu, Aug 10, 2023 at 08:41:14AM -0700, Sean Christopherson wrote:
> On Thu, Aug 10, 2023, Yan Zhao wrote:
> > On Thu, Aug 10, 2023 at 07:21:03AM +0800, Yan Zhao wrote:
> > > > Reading the value after acquiring mmu_lock ensures that both vCPUs will see whatever
> > > > value "loses" the race, i.e. whatever written value is processed second ('Y' in the
> > > > above sequence).
> > > I suspect that vCPU0 may still generate a wrong SPTE if vCPU1 wrote 4
> > > bytes while vCPU0 wrote 8 bytes, though the chances are very low.
> > > 
> > This could happen in the sequence below:
> > vCPU0 updates a PTE to AABBCCDD;
> > vCPU1 updates a PTE to EEFFGGHH in two writes.
> > (each character stands for a byte)
> > 
> > vCPU0                  vCPU1   
> > write AABBCCDD
> >                        write GGHH
> >                        detect 4 bytes write and hold on sync
> > sync SPTE w/ AABBGGHH
> >                        write EEFF
> >                        sync SPTE w/ EEFFGGHH
> > 
> > 
> > Do you think the serialization work below is worth it?
> 
> No, because I don't see any KVM bugs with the above sequence.  If the guest doesn't
> ensure *all* writes from vCPU0 and vCPU1 are fully serialized, then it is completely
> legal for hardware (KVM in this case) to consume AABBGGHH as a PTE.  The only thing
> the guest shouldn't see is EEFFCCDD, but I don't see how that can happen.
OK, though it still feels a little odd that while a 1st cmpxchg instruction on a GPA is
still under emulation, a 2nd or 3rd... cmpxchg instruction to the same GPA may already
have returned, and they all succeeded :)

Thread overview: 61+ messages
2022-12-23  0:57 [PATCH 00/27] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups Sean Christopherson
2022-12-23  0:57 ` [PATCH 01/27] drm/i915/gvt: Verify pfn is "valid" before dereferencing "struct page" Sean Christopherson
2022-12-23  0:57 ` [PATCH 02/27] KVM: x86/mmu: Factor out helper to get max mapping size of a memslot Sean Christopherson
2022-12-23  0:57 ` [PATCH 03/27] drm/i915/gvt: Incorporate KVM memslot info into check for 2MiB GTT entry Sean Christopherson
2022-12-28  5:42   ` Yan Zhao
2023-01-03 21:13     ` Sean Christopherson
2023-01-05  3:07       ` Yan Zhao
2023-01-05 17:40         ` Sean Christopherson
2023-01-06  5:56           ` Yan Zhao
2023-01-06 23:01             ` Sean Christopherson
2023-01-09  9:58               ` Yan Zhao
2023-01-11 17:55                 ` Sean Christopherson
2023-01-19  2:58                   ` Zhenyu Wang
2023-01-19  5:26                     ` Yan Zhao
2023-02-23 20:41                       ` Sean Christopherson
2023-02-24  5:09                         ` Yan Zhao
2023-01-12  8:31         ` Yan Zhao
2022-12-23  0:57 ` [PATCH 04/27] drm/i915/gvt: Verify VFIO-pinned page is THP when shadowing 2M gtt entry Sean Christopherson
2022-12-23  0:57 ` [PATCH 05/27] drm/i915/gvt: Put the page reference obtained by KVM's gfn_to_pfn() Sean Christopherson
2022-12-23  0:57 ` [PATCH 06/27] drm/i915/gvt: Don't rely on KVM's gfn_to_pfn() to query possible 2M GTT Sean Christopherson
2022-12-23  0:57 ` [PATCH 07/27] drm/i915/gvt: Use an "unsigned long" to iterate over memslot gfns Sean Christopherson
2022-12-23  0:57 ` [PATCH 08/27] drm/i915/gvt: Hoist acquisition of vgpu_lock out to kvmgt_page_track_write() Sean Christopherson
2022-12-23  0:57 ` [PATCH 09/27] drm/i915/gvt: Protect gfn hash table with dedicated mutex Sean Christopherson
2022-12-28  5:03   ` Yan Zhao
2023-01-03 20:43     ` Sean Christopherson
2023-01-05  0:51       ` Yan Zhao
2022-12-23  0:57 ` [PATCH 10/27] KVM: x86/mmu: Don't rely on page-track mechanism to flush on memslot change Sean Christopherson
2022-12-23  0:57 ` [PATCH 11/27] KVM: x86/mmu: Don't bounce through page-track mechanism for guest PTEs Sean Christopherson
2022-12-23  0:57 ` [PATCH 12/27] KVM: drm/i915/gvt: Drop @vcpu from KVM's ->track_write() hook Sean Christopherson
2022-12-23  0:57 ` [PATCH 13/27] KVM: x86: Reject memslot MOVE operations if KVMGT is attached Sean Christopherson
2022-12-23  0:57 ` [PATCH 14/27] drm/i915/gvt: Don't bother removing write-protection on to-be-deleted slot Sean Christopherson
2022-12-23  0:57 ` [PATCH 15/27] KVM: x86: Add a new page-track hook to handle memslot deletion Sean Christopherson
2022-12-23  0:57 ` [PATCH 16/27] drm/i915/gvt: switch from ->track_flush_slot() to ->track_remove_region() Sean Christopherson
2022-12-23  0:57 ` [PATCH 17/27] KVM: x86: Remove the unused page-track hook track_flush_slot() Sean Christopherson
2022-12-23  0:57 ` [PATCH 18/27] KVM: x86/mmu: Move KVM-only page-track declarations to internal header Sean Christopherson
2022-12-23  0:57 ` [PATCH 19/27] KVM: x86/mmu: Use page-track notifiers iff there are external users Sean Christopherson
2022-12-28  6:56   ` Yan Zhao
2023-01-04  0:50     ` Sean Christopherson
2023-08-07 12:01   ` Like Xu
2023-08-07 17:19     ` Sean Christopherson
2023-08-09  1:02       ` Yan Zhao
2023-08-09 14:33         ` Sean Christopherson
2023-08-09 23:21           ` Yan Zhao
2023-08-10  3:02             ` Yan Zhao
2023-08-10 15:41               ` Sean Christopherson
2023-08-11  5:57                 ` Yan Zhao
2022-12-23  0:57 ` [PATCH 20/27] KVM: x86/mmu: Drop infrastructure for multiple page-track modes Sean Christopherson
2022-12-23  0:57 ` [PATCH 21/27] KVM: x86/mmu: Rename page-track APIs to reflect the new reality Sean Christopherson
2022-12-23  0:57 ` [PATCH 22/27] KVM: x86/mmu: Assert that correct locks are held for page write-tracking Sean Christopherson
2022-12-23  0:57 ` [PATCH 23/27] KVM: x86/mmu: Bug the VM if write-tracking is used but not enabled Sean Christopherson
2022-12-23  0:57 ` [PATCH 24/27] KVM: x86/mmu: Drop @slot param from exported/external page-track APIs Sean Christopherson
2022-12-23  0:57 ` [PATCH 25/27] KVM: x86/mmu: Handle KVM bookkeeping in page-track APIs, not callers Sean Christopherson
2022-12-23  0:57 ` [PATCH 26/27] KVM: x86/mmu: Add page-track API to query if a gfn is valid Sean Christopherson
2022-12-28  7:57   ` Yan Zhao
2023-01-03 21:19     ` Sean Christopherson
2023-01-05  3:12       ` Yan Zhao
2023-01-05 17:53         ` Sean Christopherson
2022-12-23  0:57 ` [PATCH 27/27] drm/i915/gvt: Drop final dependencies on KVM internal details Sean Christopherson
2022-12-23  9:05 ` [PATCH 00/27] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups Yan Zhao
2023-01-04  1:01   ` Sean Christopherson
2023-01-05  3:13     ` Yan Zhao
