* [PATCH v12 00/20] KVM: xen: update shared_info and vcpu_info handling
@ 2024-01-15 12:56 Paul Durrant
  2024-01-15 12:56 ` [PATCH v12 01/20] KVM: pfncache: Add a map helper function Paul Durrant
                   ` (20 more replies)
  0 siblings, 21 replies; 58+ messages in thread
From: Paul Durrant @ 2024-01-15 12:56 UTC (permalink / raw)
  To: Paolo Bonzini, Jonathan Corbet, Sean Christopherson,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, David Woodhouse, Paul Durrant, Shuah Khan, kvm,
	linux-doc, linux-kernel, linux-kselftest

From: Paul Durrant <pdurrant@amazon.com>

This series has one small fix to what was in v11 [1]:

* KVM: xen: re-initialize shared_info if guest (32/64-bit) mode is set

The v11 patch failed to set the return code of the ioctl if the mode
was not actually changed, leading to a spurious failure.

This version of the series also contains a new bug-fix from David
Woodhouse to the pfncache code.

[1] https://lore.kernel.org/kvm/20231219161109.1318-1-paul@xen.org/

David Woodhouse (1):
  KVM: pfncache: rework __kvm_gpc_refresh() to fix locking issues

Paul Durrant (19):
  KVM: pfncache: Add a map helper function
  KVM: pfncache: remove unnecessary exports
  KVM: xen: mark guest pages dirty with the pfncache lock held
  KVM: pfncache: add a mark-dirty helper
  KVM: pfncache: remove KVM_GUEST_USES_PFN usage
  KVM: pfncache: stop open-coding offset_in_page()
  KVM: pfncache: include page offset in uhva and use it consistently
  KVM: pfncache: allow a cache to be activated with a fixed (userspace)
    HVA
  KVM: xen: separate initialization of shared_info cache and content
  KVM: xen: re-initialize shared_info if guest (32/64-bit) mode is set
  KVM: xen: allow shared_info to be mapped by fixed HVA
  KVM: xen: allow vcpu_info to be mapped by fixed HVA
  KVM: selftests / xen: map shared_info using HVA rather than GFN
  KVM: selftests / xen: re-map vcpu_info using HVA rather than GPA
  KVM: xen: advertise the KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA capability
  KVM: xen: split up kvm_xen_set_evtchn_fast()
  KVM: xen: don't block on pfncache locks in kvm_xen_set_evtchn_fast()
  KVM: pfncache: check the need for invalidation under read lock first
  KVM: xen: allow vcpu_info content to be 'safely' copied

 Documentation/virt/kvm/api.rst                |  53 ++-
 arch/x86/kvm/x86.c                            |   7 +-
 arch/x86/kvm/xen.c                            | 356 +++++++++++-------
 include/linux/kvm_host.h                      |  40 +-
 include/linux/kvm_types.h                     |   8 -
 include/uapi/linux/kvm.h                      |   9 +-
 .../selftests/kvm/x86_64/xen_shinfo_test.c    |  59 ++-
 virt/kvm/pfncache.c                           | 340 +++++++++--------
 8 files changed, 535 insertions(+), 337 deletions(-)


base-commit: 1c6d984f523f67ecfad1083bb04c55d91977bb15
-- 
2.39.2


* [PATCH v12 01/20] KVM: pfncache: Add a map helper function
  2024-01-15 12:56 [PATCH v12 00/20] KVM: xen: update shared_info and vcpu_info handling Paul Durrant
@ 2024-01-15 12:56 ` Paul Durrant
  2024-01-15 12:56 ` [PATCH v12 02/20] KVM: pfncache: remove unnecessary exports Paul Durrant
                   ` (19 subsequent siblings)
  20 siblings, 0 replies; 58+ messages in thread
From: Paul Durrant @ 2024-01-15 12:56 UTC (permalink / raw)
  To: Paolo Bonzini, Jonathan Corbet, Sean Christopherson,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, David Woodhouse, Paul Durrant, Shuah Khan, kvm,
	linux-doc, linux-kernel, linux-kselftest

From: Paul Durrant <pdurrant@amazon.com>

There is a pfncache unmap helper but mapping is open-coded. Arguably this
is fine because mapping is done in only one place, hva_to_pfn_retry(), but
adding the helper does make that function more readable.

No functional change intended.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
---
Cc: Sean Christopherson <seanjc@google.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>

v8:
 - Re-work commit comment.
 - Fix CONFIG_HAS_IOMEM=n build.
---
 virt/kvm/pfncache.c | 47 ++++++++++++++++++++++++++++-----------------
 1 file changed, 29 insertions(+), 18 deletions(-)

diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index 2d6aba677830..10842f1eeeae 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -96,17 +96,32 @@ bool kvm_gpc_check(struct gfn_to_pfn_cache *gpc, unsigned long len)
 }
 EXPORT_SYMBOL_GPL(kvm_gpc_check);
 
-static void gpc_unmap_khva(kvm_pfn_t pfn, void *khva)
+static void *gpc_map(kvm_pfn_t pfn)
 {
-	/* Unmap the old pfn/page if it was mapped before. */
-	if (!is_error_noslot_pfn(pfn) && khva) {
-		if (pfn_valid(pfn))
-			kunmap(pfn_to_page(pfn));
+	if (pfn_valid(pfn))
+		return kmap(pfn_to_page(pfn));
+
 #ifdef CONFIG_HAS_IOMEM
-		else
-			memunmap(khva);
+	return memremap(pfn_to_hpa(pfn), PAGE_SIZE, MEMREMAP_WB);
+#else
+	return NULL;
 #endif
+}
+
+static void gpc_unmap(kvm_pfn_t pfn, void *khva)
+{
+	/* Unmap the old pfn/page if it was mapped before. */
+	if (is_error_noslot_pfn(pfn) || !khva)
+		return;
+
+	if (pfn_valid(pfn)) {
+		kunmap(pfn_to_page(pfn));
+		return;
 	}
+
+#ifdef CONFIG_HAS_IOMEM
+	memunmap(khva);
+#endif
 }
 
 static inline bool mmu_notifier_retry_cache(struct kvm *kvm, unsigned long mmu_seq)
@@ -175,7 +190,7 @@ static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
 			 * the existing mapping and didn't create a new one.
 			 */
 			if (new_khva != old_khva)
-				gpc_unmap_khva(new_pfn, new_khva);
+				gpc_unmap(new_pfn, new_khva);
 
 			kvm_release_pfn_clean(new_pfn);
 
@@ -193,15 +208,11 @@ static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
 		 * too must be done outside of gpc->lock!
 		 */
 		if (gpc->usage & KVM_HOST_USES_PFN) {
-			if (new_pfn == gpc->pfn) {
+			if (new_pfn == gpc->pfn)
 				new_khva = old_khva;
-			} else if (pfn_valid(new_pfn)) {
-				new_khva = kmap(pfn_to_page(new_pfn));
-#ifdef CONFIG_HAS_IOMEM
-			} else {
-				new_khva = memremap(pfn_to_hpa(new_pfn), PAGE_SIZE, MEMREMAP_WB);
-#endif
-			}
+			else
+				new_khva = gpc_map(new_pfn);
+
 			if (!new_khva) {
 				kvm_release_pfn_clean(new_pfn);
 				goto out_error;
@@ -326,7 +337,7 @@ static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa,
 	mutex_unlock(&gpc->refresh_lock);
 
 	if (unmap_old)
-		gpc_unmap_khva(old_pfn, old_khva);
+		gpc_unmap(old_pfn, old_khva);
 
 	return ret;
 }
@@ -412,7 +423,7 @@ void kvm_gpc_deactivate(struct gfn_to_pfn_cache *gpc)
 		list_del(&gpc->list);
 		spin_unlock(&kvm->gpc_lock);
 
-		gpc_unmap_khva(old_pfn, old_khva);
+		gpc_unmap(old_pfn, old_khva);
 	}
 }
 EXPORT_SYMBOL_GPL(kvm_gpc_deactivate);
-- 
2.39.2


* [PATCH v12 02/20] KVM: pfncache: remove unnecessary exports
  2024-01-15 12:56 [PATCH v12 00/20] KVM: xen: update shared_info and vcpu_info handling Paul Durrant
  2024-01-15 12:56 ` [PATCH v12 01/20] KVM: pfncache: Add a map helper function Paul Durrant
@ 2024-01-15 12:56 ` Paul Durrant
  2024-01-15 12:56 ` [PATCH v12 03/20] KVM: xen: mark guest pages dirty with the pfncache lock held Paul Durrant
                   ` (18 subsequent siblings)
  20 siblings, 0 replies; 58+ messages in thread
From: Paul Durrant @ 2024-01-15 12:56 UTC (permalink / raw)
  To: Paolo Bonzini, Jonathan Corbet, Sean Christopherson,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, David Woodhouse, Paul Durrant, Shuah Khan, kvm,
	linux-doc, linux-kernel, linux-kselftest

From: Paul Durrant <pdurrant@amazon.com>

There is no need for the existing kvm_gpc_XXX() functions to be exported.
Clean up now before additional functions are added in subsequent patches.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
---
Cc: Sean Christopherson <seanjc@google.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>

v8:
 - New in this version.
---
 virt/kvm/pfncache.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index 10842f1eeeae..f3571f44d9af 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -94,7 +94,6 @@ bool kvm_gpc_check(struct gfn_to_pfn_cache *gpc, unsigned long len)
 
 	return true;
 }
-EXPORT_SYMBOL_GPL(kvm_gpc_check);
 
 static void *gpc_map(kvm_pfn_t pfn)
 {
@@ -346,7 +345,6 @@ int kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, unsigned long len)
 {
 	return __kvm_gpc_refresh(gpc, gpc->gpa, len);
 }
-EXPORT_SYMBOL_GPL(kvm_gpc_refresh);
 
 void kvm_gpc_init(struct gfn_to_pfn_cache *gpc, struct kvm *kvm,
 		  struct kvm_vcpu *vcpu, enum pfn_cache_usage usage)
@@ -363,7 +361,6 @@ void kvm_gpc_init(struct gfn_to_pfn_cache *gpc, struct kvm *kvm,
 	gpc->pfn = KVM_PFN_ERR_FAULT;
 	gpc->uhva = KVM_HVA_ERR_BAD;
 }
-EXPORT_SYMBOL_GPL(kvm_gpc_init);
 
 int kvm_gpc_activate(struct gfn_to_pfn_cache *gpc, gpa_t gpa, unsigned long len)
 {
@@ -388,7 +385,6 @@ int kvm_gpc_activate(struct gfn_to_pfn_cache *gpc, gpa_t gpa, unsigned long len)
 	}
 	return __kvm_gpc_refresh(gpc, gpa, len);
 }
-EXPORT_SYMBOL_GPL(kvm_gpc_activate);
 
 void kvm_gpc_deactivate(struct gfn_to_pfn_cache *gpc)
 {
@@ -426,4 +422,3 @@ void kvm_gpc_deactivate(struct gfn_to_pfn_cache *gpc)
 		gpc_unmap(old_pfn, old_khva);
 	}
 }
-EXPORT_SYMBOL_GPL(kvm_gpc_deactivate);
-- 
2.39.2


* [PATCH v12 03/20] KVM: xen: mark guest pages dirty with the pfncache lock held
  2024-01-15 12:56 [PATCH v12 00/20] KVM: xen: update shared_info and vcpu_info handling Paul Durrant
  2024-01-15 12:56 ` [PATCH v12 01/20] KVM: pfncache: Add a map helper function Paul Durrant
  2024-01-15 12:56 ` [PATCH v12 02/20] KVM: pfncache: remove unnecessary exports Paul Durrant
@ 2024-01-15 12:56 ` Paul Durrant
  2024-02-07  3:17   ` Sean Christopherson
  2024-01-15 12:56 ` [PATCH v12 04/20] KVM: pfncache: add a mark-dirty helper Paul Durrant
                   ` (17 subsequent siblings)
  20 siblings, 1 reply; 58+ messages in thread
From: Paul Durrant @ 2024-01-15 12:56 UTC (permalink / raw)
  To: Paolo Bonzini, Jonathan Corbet, Sean Christopherson,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, David Woodhouse, Paul Durrant, Shuah Khan, kvm,
	linux-doc, linux-kernel, linux-kselftest

From: Paul Durrant <pdurrant@amazon.com>

Sampling the gpa and memslot from an unlocked pfncache may yield
inconsistent values. Since there is no problem with calling
mark_page_dirty_in_slot() with the pfncache lock held, relocate the
calls in kvm_xen_update_runstate_guest() and
kvm_xen_inject_pending_events() accordingly.
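
As an illustration, the resulting pattern looks roughly like the sketch
below (assuming 'v', 'gpc', 'flags' and 'user_len' as used in
kvm_xen_update_runstate_guest(); this is not part of the diff):

  read_lock_irqsave(&gpc->lock, flags);
  if (kvm_gpc_check(gpc, user_len)) {
          /* ... write guest-visible data through gpc->khva ... */

          /*
           * gpa and memslot are sampled under the same lock as the
           * write, so they are consistent with the mapping just used.
           */
          mark_page_dirty_in_slot(v->kvm, gpc->memslot,
                                  gpc->gpa >> PAGE_SHIFT);
  }
  read_unlock_irqrestore(&gpc->lock, flags);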

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
---
Cc: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: x86@kernel.org

v8:
 - New in this version.
---
 arch/x86/kvm/xen.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index e43948b87f94..b63bf54bb376 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -452,14 +452,13 @@ static void kvm_xen_update_runstate_guest(struct kvm_vcpu *v, bool atomic)
 		smp_wmb();
 	}
 
-	if (user_len2)
+	if (user_len2) {
+		mark_page_dirty_in_slot(v->kvm, gpc2->memslot, gpc2->gpa >> PAGE_SHIFT);
 		read_unlock(&gpc2->lock);
-
-	read_unlock_irqrestore(&gpc1->lock, flags);
+	}
 
 	mark_page_dirty_in_slot(v->kvm, gpc1->memslot, gpc1->gpa >> PAGE_SHIFT);
-	if (user_len2)
-		mark_page_dirty_in_slot(v->kvm, gpc2->memslot, gpc2->gpa >> PAGE_SHIFT);
+	read_unlock_irqrestore(&gpc1->lock, flags);
 }
 
 void kvm_xen_update_runstate(struct kvm_vcpu *v, int state)
@@ -565,13 +564,13 @@ void kvm_xen_inject_pending_events(struct kvm_vcpu *v)
 			     : "0" (evtchn_pending_sel32));
 		WRITE_ONCE(vi->evtchn_upcall_pending, 1);
 	}
+
+	mark_page_dirty_in_slot(v->kvm, gpc->memslot, gpc->gpa >> PAGE_SHIFT);
 	read_unlock_irqrestore(&gpc->lock, flags);
 
 	/* For the per-vCPU lapic vector, deliver it as MSI. */
 	if (v->arch.xen.upcall_vector)
 		kvm_xen_inject_vcpu_vector(v);
-
-	mark_page_dirty_in_slot(v->kvm, gpc->memslot, gpc->gpa >> PAGE_SHIFT);
 }
 
 int __kvm_xen_has_interrupt(struct kvm_vcpu *v)
-- 
2.39.2


* [PATCH v12 04/20] KVM: pfncache: add a mark-dirty helper
  2024-01-15 12:56 [PATCH v12 00/20] KVM: xen: update shared_info and vcpu_info handling Paul Durrant
                   ` (2 preceding siblings ...)
  2024-01-15 12:56 ` [PATCH v12 03/20] KVM: xen: mark guest pages dirty with the pfncache lock held Paul Durrant
@ 2024-01-15 12:56 ` Paul Durrant
  2024-02-07  3:20   ` Sean Christopherson
  2024-02-09 15:58   ` Sean Christopherson
  2024-01-15 12:56 ` [PATCH v12 05/20] KVM: pfncache: remove KVM_GUEST_USES_PFN usage Paul Durrant
                   ` (16 subsequent siblings)
  20 siblings, 2 replies; 58+ messages in thread
From: Paul Durrant @ 2024-01-15 12:56 UTC (permalink / raw)
  To: Paolo Bonzini, Jonathan Corbet, Sean Christopherson,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, David Woodhouse, Paul Durrant, Shuah Khan, kvm,
	linux-doc, linux-kernel, linux-kselftest

From: Paul Durrant <pdurrant@amazon.com>

At the moment pages are marked dirty by open-coded calls to
mark_page_dirty_in_slot(), directly dereferencing the gpa and memslot
from the cache. After a subsequent patch these may not always be set,
so add a helper now so that callers are protected from the need to know
about this detail.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
---
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org

v8:
 - Make the helper a static inline.
---
 arch/x86/kvm/x86.c       |  2 +-
 arch/x86/kvm/xen.c       |  6 +++---
 include/linux/kvm_host.h | 11 +++++++++++
 3 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 27e23714e960..0a0ac91a494f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3156,7 +3156,7 @@ static void kvm_setup_guest_pvclock(struct kvm_vcpu *v,
 
 	guest_hv_clock->version = ++vcpu->hv_clock.version;
 
-	mark_page_dirty_in_slot(v->kvm, gpc->memslot, gpc->gpa >> PAGE_SHIFT);
+	kvm_gpc_mark_dirty(gpc);
 	read_unlock_irqrestore(&gpc->lock, flags);
 
 	trace_kvm_pvclock_update(v->vcpu_id, &vcpu->hv_clock);
diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index b63bf54bb376..34c48d0029c1 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -453,11 +453,11 @@ static void kvm_xen_update_runstate_guest(struct kvm_vcpu *v, bool atomic)
 	}
 
 	if (user_len2) {
-		mark_page_dirty_in_slot(v->kvm, gpc2->memslot, gpc2->gpa >> PAGE_SHIFT);
+		kvm_gpc_mark_dirty(gpc2);
 		read_unlock(&gpc2->lock);
 	}
 
-	mark_page_dirty_in_slot(v->kvm, gpc1->memslot, gpc1->gpa >> PAGE_SHIFT);
+	kvm_gpc_mark_dirty(gpc1);
 	read_unlock_irqrestore(&gpc1->lock, flags);
 }
 
@@ -565,7 +565,7 @@ void kvm_xen_inject_pending_events(struct kvm_vcpu *v)
 		WRITE_ONCE(vi->evtchn_upcall_pending, 1);
 	}
 
-	mark_page_dirty_in_slot(v->kvm, gpc->memslot, gpc->gpa >> PAGE_SHIFT);
+	kvm_gpc_mark_dirty(gpc);
 	read_unlock_irqrestore(&gpc->lock, flags);
 
 	/* For the per-vCPU lapic vector, deliver it as MSI. */
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 7e7fd25b09b3..f3bb9e0a81fe 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1399,6 +1399,17 @@ int kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, unsigned long len);
  */
 void kvm_gpc_deactivate(struct gfn_to_pfn_cache *gpc);
 
+/**
+ * kvm_gpc_mark_dirty - mark a cached page as dirty.
+ *
+ * @gpc:	   struct gfn_to_pfn_cache object.
+ */
+static inline void kvm_gpc_mark_dirty(struct gfn_to_pfn_cache *gpc)
+{
+	lockdep_assert_held(&gpc->lock);
+	mark_page_dirty_in_slot(gpc->kvm, gpc->memslot, gpc->gpa >> PAGE_SHIFT);
+}
+
 void kvm_sigset_activate(struct kvm_vcpu *vcpu);
 void kvm_sigset_deactivate(struct kvm_vcpu *vcpu);
 
-- 
2.39.2


* [PATCH v12 05/20] KVM: pfncache: remove KVM_GUEST_USES_PFN usage
  2024-01-15 12:56 [PATCH v12 00/20] KVM: xen: update shared_info and vcpu_info handling Paul Durrant
                   ` (3 preceding siblings ...)
  2024-01-15 12:56 ` [PATCH v12 04/20] KVM: pfncache: add a mark-dirty helper Paul Durrant
@ 2024-01-15 12:56 ` Paul Durrant
  2024-01-15 12:56 ` [PATCH v12 06/20] KVM: pfncache: stop open-coding offset_in_page() Paul Durrant
                   ` (15 subsequent siblings)
  20 siblings, 0 replies; 58+ messages in thread
From: Paul Durrant @ 2024-01-15 12:56 UTC (permalink / raw)
  To: Paolo Bonzini, Jonathan Corbet, Sean Christopherson,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, David Woodhouse, Paul Durrant, Shuah Khan, kvm,
	linux-doc, linux-kernel, linux-kselftest

From: Paul Durrant <pdurrant@amazon.com>

As noted in [1] the KVM_GUEST_USES_PFN usage flag is never set by any
callers of kvm_gpc_init(), which also makes the 'vcpu' argument redundant.
Moreover, all existing callers specify KVM_HOST_USES_PFN, so the usage
check in hva_to_pfn_retry() and hence the 'usage' argument to
kvm_gpc_init() are also redundant.

Remove the pfn_cache_usage enumeration and the redundant arguments,
fields of struct gfn_to_pfn_cache, and all the related code.

[1] https://lore.kernel.org/all/ZQiR8IpqOZrOpzHC@google.com/

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
---
Cc: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: x86@kernel.org

v8:
 - New in this version.
---
 arch/x86/kvm/x86.c        |  2 +-
 arch/x86/kvm/xen.c        | 14 ++++-----
 include/linux/kvm_host.h  | 11 +------
 include/linux/kvm_types.h |  8 -----
 virt/kvm/pfncache.c       | 61 ++++++---------------------------------
 5 files changed, 16 insertions(+), 80 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0a0ac91a494f..19e6cc1dadfe 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12049,7 +12049,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
 	vcpu->arch.regs_avail = ~0;
 	vcpu->arch.regs_dirty = ~0;
 
-	kvm_gpc_init(&vcpu->arch.pv_time, vcpu->kvm, vcpu, KVM_HOST_USES_PFN);
+	kvm_gpc_init(&vcpu->arch.pv_time, vcpu->kvm);
 
 	if (!irqchip_in_kernel(vcpu->kvm) || kvm_vcpu_is_reset_bsp(vcpu))
 		vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index 34c48d0029c1..f9b1e494c430 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -2108,14 +2108,10 @@ void kvm_xen_init_vcpu(struct kvm_vcpu *vcpu)
 
 	timer_setup(&vcpu->arch.xen.poll_timer, cancel_evtchn_poll, 0);
 
-	kvm_gpc_init(&vcpu->arch.xen.runstate_cache, vcpu->kvm, NULL,
-		     KVM_HOST_USES_PFN);
-	kvm_gpc_init(&vcpu->arch.xen.runstate2_cache, vcpu->kvm, NULL,
-		     KVM_HOST_USES_PFN);
-	kvm_gpc_init(&vcpu->arch.xen.vcpu_info_cache, vcpu->kvm, NULL,
-		     KVM_HOST_USES_PFN);
-	kvm_gpc_init(&vcpu->arch.xen.vcpu_time_info_cache, vcpu->kvm, NULL,
-		     KVM_HOST_USES_PFN);
+	kvm_gpc_init(&vcpu->arch.xen.runstate_cache, vcpu->kvm);
+	kvm_gpc_init(&vcpu->arch.xen.runstate2_cache, vcpu->kvm);
+	kvm_gpc_init(&vcpu->arch.xen.vcpu_info_cache, vcpu->kvm);
+	kvm_gpc_init(&vcpu->arch.xen.vcpu_time_info_cache, vcpu->kvm);
 }
 
 void kvm_xen_destroy_vcpu(struct kvm_vcpu *vcpu)
@@ -2158,7 +2154,7 @@ void kvm_xen_init_vm(struct kvm *kvm)
 {
 	mutex_init(&kvm->arch.xen.xen_lock);
 	idr_init(&kvm->arch.xen.evtchn_ports);
-	kvm_gpc_init(&kvm->arch.xen.shinfo_cache, kvm, NULL, KVM_HOST_USES_PFN);
+	kvm_gpc_init(&kvm->arch.xen.shinfo_cache, kvm);
 }
 
 void kvm_xen_destroy_vm(struct kvm *kvm)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index f3bb9e0a81fe..f2354f808d04 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1319,21 +1319,12 @@ void kvm_vcpu_mark_page_dirty(struct kvm_vcpu *vcpu, gfn_t gfn);
  *
  * @gpc:	   struct gfn_to_pfn_cache object.
  * @kvm:	   pointer to kvm instance.
- * @vcpu:	   vCPU to be used for marking pages dirty and to be woken on
- *		   invalidation.
- * @usage:	   indicates if the resulting host physical PFN is used while
- *		   the @vcpu is IN_GUEST_MODE (in which case invalidation of 
- *		   the cache from MMU notifiers---but not for KVM memslot
- *		   changes!---will also force @vcpu to exit the guest and
- *		   refresh the cache); and/or if the PFN used directly
- *		   by KVM (and thus needs a kernel virtual mapping).
  *
  * This sets up a gfn_to_pfn_cache by initializing locks and assigning the
  * immutable attributes.  Note, the cache must be zero-allocated (or zeroed by
  * the caller before init).
  */
-void kvm_gpc_init(struct gfn_to_pfn_cache *gpc, struct kvm *kvm,
-		  struct kvm_vcpu *vcpu, enum pfn_cache_usage usage);
+void kvm_gpc_init(struct gfn_to_pfn_cache *gpc, struct kvm *kvm);
 
 /**
  * kvm_gpc_activate - prepare a cached kernel mapping and HPA for a given guest
diff --git a/include/linux/kvm_types.h b/include/linux/kvm_types.h
index 9d1f7835d8c1..d93f6522b2c3 100644
--- a/include/linux/kvm_types.h
+++ b/include/linux/kvm_types.h
@@ -49,12 +49,6 @@ typedef u64            hfn_t;
 
 typedef hfn_t kvm_pfn_t;
 
-enum pfn_cache_usage {
-	KVM_GUEST_USES_PFN = BIT(0),
-	KVM_HOST_USES_PFN  = BIT(1),
-	KVM_GUEST_AND_HOST_USE_PFN = KVM_GUEST_USES_PFN | KVM_HOST_USES_PFN,
-};
-
 struct gfn_to_hva_cache {
 	u64 generation;
 	gpa_t gpa;
@@ -69,13 +63,11 @@ struct gfn_to_pfn_cache {
 	unsigned long uhva;
 	struct kvm_memory_slot *memslot;
 	struct kvm *kvm;
-	struct kvm_vcpu *vcpu;
 	struct list_head list;
 	rwlock_t lock;
 	struct mutex refresh_lock;
 	void *khva;
 	kvm_pfn_t pfn;
-	enum pfn_cache_usage usage;
 	bool active;
 	bool valid;
 };
diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index f3571f44d9af..6f4b537eb25b 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -25,9 +25,7 @@
 void gfn_to_pfn_cache_invalidate_start(struct kvm *kvm, unsigned long start,
 				       unsigned long end, bool may_block)
 {
-	DECLARE_BITMAP(vcpu_bitmap, KVM_MAX_VCPUS);
 	struct gfn_to_pfn_cache *gpc;
-	bool evict_vcpus = false;
 
 	spin_lock(&kvm->gpc_lock);
 	list_for_each_entry(gpc, &kvm->gpc_list, list) {
@@ -37,43 +35,10 @@ void gfn_to_pfn_cache_invalidate_start(struct kvm *kvm, unsigned long start,
 		if (gpc->valid && !is_error_noslot_pfn(gpc->pfn) &&
 		    gpc->uhva >= start && gpc->uhva < end) {
 			gpc->valid = false;
-
-			/*
-			 * If a guest vCPU could be using the physical address,
-			 * it needs to be forced out of guest mode.
-			 */
-			if (gpc->usage & KVM_GUEST_USES_PFN) {
-				if (!evict_vcpus) {
-					evict_vcpus = true;
-					bitmap_zero(vcpu_bitmap, KVM_MAX_VCPUS);
-				}
-				__set_bit(gpc->vcpu->vcpu_idx, vcpu_bitmap);
-			}
 		}
 		write_unlock_irq(&gpc->lock);
 	}
 	spin_unlock(&kvm->gpc_lock);
-
-	if (evict_vcpus) {
-		/*
-		 * KVM needs to ensure the vCPU is fully out of guest context
-		 * before allowing the invalidation to continue.
-		 */
-		unsigned int req = KVM_REQ_OUTSIDE_GUEST_MODE;
-		bool called;
-
-		/*
-		 * If the OOM reaper is active, then all vCPUs should have
-		 * been stopped already, so perform the request without
-		 * KVM_REQUEST_WAIT and be sad if any needed to be IPI'd.
-		 */
-		if (!may_block)
-			req &= ~KVM_REQUEST_WAIT;
-
-		called = kvm_make_vcpus_request_mask(kvm, req, vcpu_bitmap);
-
-		WARN_ON_ONCE(called && !may_block);
-	}
 }
 
 bool kvm_gpc_check(struct gfn_to_pfn_cache *gpc, unsigned long len)
@@ -206,16 +171,14 @@ static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
 		 * pfn.  Note, kmap() and memremap() can both sleep, so this
 		 * too must be done outside of gpc->lock!
 		 */
-		if (gpc->usage & KVM_HOST_USES_PFN) {
-			if (new_pfn == gpc->pfn)
-				new_khva = old_khva;
-			else
-				new_khva = gpc_map(new_pfn);
-
-			if (!new_khva) {
-				kvm_release_pfn_clean(new_pfn);
-				goto out_error;
-			}
+		if (new_pfn == gpc->pfn)
+			new_khva = old_khva;
+		else
+			new_khva = gpc_map(new_pfn);
+
+		if (!new_khva) {
+			kvm_release_pfn_clean(new_pfn);
+			goto out_error;
 		}
 
 		write_lock_irq(&gpc->lock);
@@ -346,18 +309,12 @@ int kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, unsigned long len)
 	return __kvm_gpc_refresh(gpc, gpc->gpa, len);
 }
 
-void kvm_gpc_init(struct gfn_to_pfn_cache *gpc, struct kvm *kvm,
-		  struct kvm_vcpu *vcpu, enum pfn_cache_usage usage)
+void kvm_gpc_init(struct gfn_to_pfn_cache *gpc, struct kvm *kvm)
 {
-	WARN_ON_ONCE(!usage || (usage & KVM_GUEST_AND_HOST_USE_PFN) != usage);
-	WARN_ON_ONCE((usage & KVM_GUEST_USES_PFN) && !vcpu);
-
 	rwlock_init(&gpc->lock);
 	mutex_init(&gpc->refresh_lock);
 
 	gpc->kvm = kvm;
-	gpc->vcpu = vcpu;
-	gpc->usage = usage;
 	gpc->pfn = KVM_PFN_ERR_FAULT;
 	gpc->uhva = KVM_HVA_ERR_BAD;
 }
-- 
2.39.2


* [PATCH v12 06/20] KVM: pfncache: stop open-coding offset_in_page()
  2024-01-15 12:56 [PATCH v12 00/20] KVM: xen: update shared_info and vcpu_info handling Paul Durrant
                   ` (4 preceding siblings ...)
  2024-01-15 12:56 ` [PATCH v12 05/20] KVM: pfncache: remove KVM_GUEST_USES_PFN usage Paul Durrant
@ 2024-01-15 12:56 ` Paul Durrant
  2024-01-15 12:56 ` [PATCH v12 07/20] KVM: pfncache: include page offset in uhva and use it consistently Paul Durrant
                   ` (14 subsequent siblings)
  20 siblings, 0 replies; 58+ messages in thread
From: Paul Durrant @ 2024-01-15 12:56 UTC (permalink / raw)
  To: Paolo Bonzini, Jonathan Corbet, Sean Christopherson,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, David Woodhouse, Paul Durrant, Shuah Khan, kvm,
	linux-doc, linux-kernel, linux-kselftest

From: Paul Durrant <pdurrant@amazon.com>

Some code in pfncache uses offset_in_page() but in other places it is open-
coded. Use offset_in_page() consistently everywhere.
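
For reference, offset_in_page() (from <linux/mm.h>) is precisely the
expression being open-coded:

  #define offset_in_page(p)	((unsigned long)(p) & ~PAGE_MASK)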

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
---
Cc: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: David Woodhouse <dwmw2@infradead.org>

v8:
 - New in this version.
---
 virt/kvm/pfncache.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index 6f4b537eb25b..0eeb034d0674 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -48,7 +48,7 @@ bool kvm_gpc_check(struct gfn_to_pfn_cache *gpc, unsigned long len)
 	if (!gpc->active)
 		return false;
 
-	if ((gpc->gpa & ~PAGE_MASK) + len > PAGE_SIZE)
+	if (offset_in_page(gpc->gpa) + len > PAGE_SIZE)
 		return false;
 
 	if (gpc->generation != slots->generation || kvm_is_error_hva(gpc->uhva))
@@ -192,7 +192,7 @@ static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
 
 	gpc->valid = true;
 	gpc->pfn = new_pfn;
-	gpc->khva = new_khva + (gpc->gpa & ~PAGE_MASK);
+	gpc->khva = new_khva + offset_in_page(gpc->gpa);
 
 	/*
 	 * Put the reference to the _new_ pfn.  The pfn is now tracked by the
@@ -213,7 +213,7 @@ static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa,
 			     unsigned long len)
 {
 	struct kvm_memslots *slots = kvm_memslots(gpc->kvm);
-	unsigned long page_offset = gpa & ~PAGE_MASK;
+	unsigned long page_offset = offset_in_page(gpa);
 	bool unmap_old = false;
 	unsigned long old_uhva;
 	kvm_pfn_t old_pfn;
-- 
2.39.2


* [PATCH v12 07/20] KVM: pfncache: include page offset in uhva and use it consistently
  2024-01-15 12:56 [PATCH v12 00/20] KVM: xen: update shared_info and vcpu_info handling Paul Durrant
                   ` (5 preceding siblings ...)
  2024-01-15 12:56 ` [PATCH v12 06/20] KVM: pfncache: stop open-coding offset_in_page() Paul Durrant
@ 2024-01-15 12:56 ` Paul Durrant
  2024-01-15 12:56 ` [PATCH v12 08/20] KVM: pfncache: allow a cache to be activated with a fixed (userspace) HVA Paul Durrant
                   ` (13 subsequent siblings)
  20 siblings, 0 replies; 58+ messages in thread
From: Paul Durrant @ 2024-01-15 12:56 UTC (permalink / raw)
  To: Paolo Bonzini, Jonathan Corbet, Sean Christopherson,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, David Woodhouse, Paul Durrant, Shuah Khan, kvm,
	linux-doc, linux-kernel, linux-kselftest

From: Paul Durrant <pdurrant@amazon.com>

Currently the pfncache page offset is sometimes determined using the gpa
and sometimes the khva, whilst the uhva is always page-aligned. After a
subsequent patch is applied the gpa will not always be valid, so adjust
the code to include the page offset in the uhva and use it consistently
as the source of truth.

Also, where a page-aligned address is required, use PAGE_ALIGN_DOWN()
for clarity.

No functional change intended.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
---
Cc: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: David Woodhouse <dwmw2@infradead.org>

v8:
 - New in this version.
---
 virt/kvm/pfncache.c | 29 +++++++++++++++++++++--------
 1 file changed, 21 insertions(+), 8 deletions(-)

diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index 0eeb034d0674..97eec8ee3449 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -48,10 +48,10 @@ bool kvm_gpc_check(struct gfn_to_pfn_cache *gpc, unsigned long len)
 	if (!gpc->active)
 		return false;
 
-	if (offset_in_page(gpc->gpa) + len > PAGE_SIZE)
+	if (gpc->generation != slots->generation || kvm_is_error_hva(gpc->uhva))
 		return false;
 
-	if (gpc->generation != slots->generation || kvm_is_error_hva(gpc->uhva))
+	if (offset_in_page(gpc->uhva) + len > PAGE_SIZE)
 		return false;
 
 	if (!gpc->valid)
@@ -119,7 +119,7 @@ static inline bool mmu_notifier_retry_cache(struct kvm *kvm, unsigned long mmu_s
 static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
 {
 	/* Note, the new page offset may be different than the old! */
-	void *old_khva = gpc->khva - offset_in_page(gpc->khva);
+	void *old_khva = (void *)PAGE_ALIGN_DOWN((uintptr_t)gpc->khva);
 	kvm_pfn_t new_pfn = KVM_PFN_ERR_FAULT;
 	void *new_khva = NULL;
 	unsigned long mmu_seq;
@@ -192,7 +192,7 @@ static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
 
 	gpc->valid = true;
 	gpc->pfn = new_pfn;
-	gpc->khva = new_khva + offset_in_page(gpc->gpa);
+	gpc->khva = new_khva + offset_in_page(gpc->uhva);
 
 	/*
 	 * Put the reference to the _new_ pfn.  The pfn is now tracked by the
@@ -217,6 +217,7 @@ static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa,
 	bool unmap_old = false;
 	unsigned long old_uhva;
 	kvm_pfn_t old_pfn;
+	bool hva_change = false;
 	void *old_khva;
 	int ret;
 
@@ -242,10 +243,10 @@ static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa,
 	}
 
 	old_pfn = gpc->pfn;
-	old_khva = gpc->khva - offset_in_page(gpc->khva);
-	old_uhva = gpc->uhva;
+	old_khva = (void *)PAGE_ALIGN_DOWN((uintptr_t)gpc->khva);
+	old_uhva = PAGE_ALIGN_DOWN(gpc->uhva);
 
-	/* If the userspace HVA is invalid, refresh that first */
+	/* Refresh the userspace HVA if necessary */
 	if (gpc->gpa != gpa || gpc->generation != slots->generation ||
 	    kvm_is_error_hva(gpc->uhva)) {
 		gfn_t gfn = gpa_to_gfn(gpa);
@@ -259,13 +260,25 @@ static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa,
 			ret = -EFAULT;
 			goto out;
 		}
+
+		/*
+		 * Even if the GPA and/or the memslot generation changed, the
+		 * HVA may still be the same.
+		 */
+		if (gpc->uhva != old_uhva)
+			hva_change = true;
+	} else {
+		gpc->uhva = old_uhva;
 	}
 
+	/* Note: the offset must be correct before calling hva_to_pfn_retry() */
+	gpc->uhva += page_offset;
+
 	/*
 	 * If the userspace HVA changed or the PFN was already invalid,
 	 * drop the lock and do the HVA to PFN lookup again.
 	 */
-	if (!gpc->valid || old_uhva != gpc->uhva) {
+	if (!gpc->valid || hva_change) {
 		ret = hva_to_pfn_retry(gpc);
 	} else {
 		/*
-- 
2.39.2


* [PATCH v12 08/20] KVM: pfncache: allow a cache to be activated with a fixed (userspace) HVA
  2024-01-15 12:56 [PATCH v12 00/20] KVM: xen: update shared_info and vcpu_info handling Paul Durrant
                   ` (6 preceding siblings ...)
  2024-01-15 12:56 ` [PATCH v12 07/20] KVM: pfncache: include page offset in uhva and use it consistently Paul Durrant
@ 2024-01-15 12:56 ` Paul Durrant
  2024-02-07  4:03   ` Sean Christopherson
  2024-01-15 12:56 ` [PATCH v12 09/20] KVM: xen: separate initialization of shared_info cache and content Paul Durrant
                   ` (12 subsequent siblings)
  20 siblings, 1 reply; 58+ messages in thread
From: Paul Durrant @ 2024-01-15 12:56 UTC (permalink / raw)
  To: Paolo Bonzini, Jonathan Corbet, Sean Christopherson,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, David Woodhouse, Paul Durrant, Shuah Khan, kvm,
	linux-doc, linux-kernel, linux-kselftest

From: Paul Durrant <pdurrant@amazon.com>

Some pfncache pages may actually be overlays on guest memory that have a
fixed HVA within the VMM. It's pointless to invalidate such cached
mappings if the overlay is moved, so allow a cache to be activated
directly with the HVA to cater for such cases. A subsequent patch will
make use of this facility.
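
As a rough sketch (using names from this and later patches in the
series; 'gpc', 'gpa' and 'uhva' are assumed to be in scope), a cache
backing such an overlay can now be activated either way:

  /* GPA-based: invalidated whenever the guest re-maps the overlay. */
  ret = kvm_gpc_activate(gpc, gpa, PAGE_SIZE);

  /* HVA-based: stable across guest-side re-mapping of the overlay. */
  ret = kvm_gpc_activate_hva(gpc, uhva, PAGE_SIZE);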

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
---
Cc: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: David Woodhouse <dwmw2@infradead.org>

v11:
 - Fixed kvm_gpc_check() to ignore memslot generation if the cache is not
   activated with a GPA. (This breakage occurred during the re-work for v8).

v9:
 - Pass both GPA and HVA into __kvm_gpc_refresh() rather than overloading
   the address parameter and using a bool flag to indicate what it is.

v8:
 - Re-worked to avoid messing with struct gfn_to_pfn_cache.
---
 include/linux/kvm_host.h | 20 +++++++++++++++++++-
 virt/kvm/pfncache.c      | 40 +++++++++++++++++++++++++++++++---------
 2 files changed, 50 insertions(+), 10 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index f2354f808d04..7994c4d16783 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1344,6 +1344,22 @@ void kvm_gpc_init(struct gfn_to_pfn_cache *gpc, struct kvm *kvm);
  */
 int kvm_gpc_activate(struct gfn_to_pfn_cache *gpc, gpa_t gpa, unsigned long len);
 
+/**
+ * kvm_gpc_activate_hva - prepare a cached kernel mapping and HPA for a given HVA.
+ *
+ * @gpc:          struct gfn_to_pfn_cache object.
+ * @hva:          userspace virtual address to map.
+ * @len:          sanity check; the range being accessed must fit a single page.
+ *
+ * @return:       0 for success.
+ *                -EINVAL for a mapping which would cross a page boundary.
+ *                -EFAULT for an untranslatable guest physical address.
+ *
+ * The semantics of this function are the same as those of kvm_gpc_activate(). It
+ * merely bypasses a layer of address translation.
+ */
+int kvm_gpc_activate_hva(struct gfn_to_pfn_cache *gpc, unsigned long hva, unsigned long len);
+
 /**
  * kvm_gpc_check - check validity of a gfn_to_pfn_cache.
  *
@@ -1398,7 +1414,9 @@ void kvm_gpc_deactivate(struct gfn_to_pfn_cache *gpc);
 static inline void kvm_gpc_mark_dirty(struct gfn_to_pfn_cache *gpc)
 {
 	lockdep_assert_held(&gpc->lock);
-	mark_page_dirty_in_slot(gpc->kvm, gpc->memslot, gpc->gpa >> PAGE_SHIFT);
+
+	if (gpc->gpa != KVM_XEN_INVALID_GPA)
+		mark_page_dirty_in_slot(gpc->kvm, gpc->memslot, gpc->gpa >> PAGE_SHIFT);
 }
 
 void kvm_sigset_activate(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index 97eec8ee3449..ae822bff812f 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -48,7 +48,10 @@ bool kvm_gpc_check(struct gfn_to_pfn_cache *gpc, unsigned long len)
 	if (!gpc->active)
 		return false;
 
-	if (gpc->generation != slots->generation || kvm_is_error_hva(gpc->uhva))
+	if (gpc->gpa != KVM_XEN_INVALID_GPA && gpc->generation != slots->generation)
+		return false;
+
+	if (kvm_is_error_hva(gpc->uhva))
 		return false;
 
 	if (offset_in_page(gpc->uhva) + len > PAGE_SIZE)
@@ -209,11 +212,13 @@ static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
 	return -EFAULT;
 }
 
-static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa,
+static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa, unsigned long uhva,
 			     unsigned long len)
 {
 	struct kvm_memslots *slots = kvm_memslots(gpc->kvm);
-	unsigned long page_offset = offset_in_page(gpa);
+	unsigned long page_offset = (gpa != KVM_XEN_INVALID_GPA) ?
+		offset_in_page(gpa) :
+		offset_in_page(uhva);
 	bool unmap_old = false;
 	unsigned long old_uhva;
 	kvm_pfn_t old_pfn;
@@ -246,9 +251,15 @@ static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa,
 	old_khva = (void *)PAGE_ALIGN_DOWN((uintptr_t)gpc->khva);
 	old_uhva = PAGE_ALIGN_DOWN(gpc->uhva);
 
-	/* Refresh the userspace HVA if necessary */
-	if (gpc->gpa != gpa || gpc->generation != slots->generation ||
-	    kvm_is_error_hva(gpc->uhva)) {
+	if (gpa == KVM_XEN_INVALID_GPA) {
+		gpc->gpa = KVM_XEN_INVALID_GPA;
+		gpc->uhva = PAGE_ALIGN_DOWN(uhva);
+
+		if (gpc->uhva != old_uhva)
+			hva_change = true;
+	} else if (gpc->gpa != gpa ||
+		   gpc->generation != slots->generation ||
+		   kvm_is_error_hva(gpc->uhva)) {
 		gfn_t gfn = gpa_to_gfn(gpa);
 
 		gpc->gpa = gpa;
@@ -319,7 +330,7 @@ static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa,
 
 int kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, unsigned long len)
 {
-	return __kvm_gpc_refresh(gpc, gpc->gpa, len);
+	return __kvm_gpc_refresh(gpc, gpc->gpa, gpc->uhva, len);
 }
 
 void kvm_gpc_init(struct gfn_to_pfn_cache *gpc, struct kvm *kvm)
@@ -332,7 +343,8 @@ void kvm_gpc_init(struct gfn_to_pfn_cache *gpc, struct kvm *kvm)
 	gpc->uhva = KVM_HVA_ERR_BAD;
 }
 
-int kvm_gpc_activate(struct gfn_to_pfn_cache *gpc, gpa_t gpa, unsigned long len)
+static int __kvm_gpc_activate(struct gfn_to_pfn_cache *gpc, gpa_t gpa, unsigned long uhva,
+			      unsigned long len)
 {
 	struct kvm *kvm = gpc->kvm;
 
@@ -353,7 +365,17 @@ int kvm_gpc_activate(struct gfn_to_pfn_cache *gpc, gpa_t gpa, unsigned long len)
 		gpc->active = true;
 		write_unlock_irq(&gpc->lock);
 	}
-	return __kvm_gpc_refresh(gpc, gpa, len);
+	return __kvm_gpc_refresh(gpc, gpa, uhva, len);
+}
+
+int kvm_gpc_activate(struct gfn_to_pfn_cache *gpc, gpa_t gpa, unsigned long len)
+{
+	return __kvm_gpc_activate(gpc, gpa, KVM_HVA_ERR_BAD, len);
+}
+
+int kvm_gpc_activate_hva(struct gfn_to_pfn_cache *gpc, unsigned long uhva, unsigned long len)
+{
+	return __kvm_gpc_activate(gpc, KVM_XEN_INVALID_GPA, uhva, len);
 }
 
 void kvm_gpc_deactivate(struct gfn_to_pfn_cache *gpc)
-- 
2.39.2


* [PATCH v12 09/20] KVM: xen: separate initialization of shared_info cache and content
  2024-01-15 12:56 [PATCH v12 00/20] KVM: xen: update shared_info and vcpu_info handling Paul Durrant
                   ` (7 preceding siblings ...)
  2024-01-15 12:56 ` [PATCH v12 08/20] KVM: pfncache: allow a cache to be activated with a fixed (userspace) HVA Paul Durrant
@ 2024-01-15 12:56 ` Paul Durrant
  2024-01-15 12:56 ` [PATCH v12 10/20] KVM: xen: re-initialize shared_info if guest (32/64-bit) mode is set Paul Durrant
                   ` (11 subsequent siblings)
  20 siblings, 0 replies; 58+ messages in thread
From: Paul Durrant @ 2024-01-15 12:56 UTC (permalink / raw)
  To: Paolo Bonzini, Jonathan Corbet, Sean Christopherson,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, David Woodhouse, Paul Durrant, Shuah Khan, kvm,
	linux-doc, linux-kernel, linux-kselftest

From: Paul Durrant <pdurrant@amazon.com>

A subsequent patch will allow shared_info to be initialized using either a
GPA or a user-space (i.e. VMM) HVA. To make that patch cleaner, separate
the initialization of the shared_info content from the activation of the
pfncache.
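
The resulting split, mirroring the hunks below, is roughly:

  /* Step 1: activate (map) the cache... */
  r = kvm_gpc_activate(&kvm->arch.xen.shinfo_cache,
                       gfn_to_gpa(data->u.shared_info.gfn), PAGE_SIZE);

  /* Step 2: ...then initialize the page content separately. */
  if (!r && kvm->arch.xen.shinfo_cache.active)
          r = kvm_xen_shared_info_init(kvm);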

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
---
Cc: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: x86@kernel.org

v11:
 - Fix accidental regression from commit 5d6d6a7d7e66a ("KVM: x86: Refine
   calculation of guest wall clock to use a single TSC read").

v10:
 - New in this version.
---
 arch/x86/kvm/xen.c | 55 +++++++++++++++++++++++++++-------------------
 1 file changed, 32 insertions(+), 23 deletions(-)

diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index f9b1e494c430..df53fea73747 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -34,41 +34,32 @@ static bool kvm_xen_hcall_evtchn_send(struct kvm_vcpu *vcpu, u64 param, u64 *r);
 
 DEFINE_STATIC_KEY_DEFERRED_FALSE(kvm_xen_enabled, HZ);
 
-static int kvm_xen_shared_info_init(struct kvm *kvm, gfn_t gfn)
+static int kvm_xen_shared_info_init(struct kvm *kvm)
 {
 	struct gfn_to_pfn_cache *gpc = &kvm->arch.xen.shinfo_cache;
 	struct pvclock_wall_clock *wc;
-	gpa_t gpa = gfn_to_gpa(gfn);
 	u32 *wc_sec_hi;
 	u32 wc_version;
 	u64 wall_nsec;
 	int ret = 0;
 	int idx = srcu_read_lock(&kvm->srcu);
 
-	if (gfn == KVM_XEN_INVALID_GFN) {
-		kvm_gpc_deactivate(gpc);
-		goto out;
-	}
+	read_lock_irq(&gpc->lock);
+	while (!kvm_gpc_check(gpc, PAGE_SIZE)) {
+		read_unlock_irq(&gpc->lock);
 
-	do {
-		ret = kvm_gpc_activate(gpc, gpa, PAGE_SIZE);
+		ret = kvm_gpc_refresh(gpc, PAGE_SIZE);
 		if (ret)
 			goto out;
 
-		/*
-		 * This code mirrors kvm_write_wall_clock() except that it writes
-		 * directly through the pfn cache and doesn't mark the page dirty.
-		 */
-		wall_nsec = kvm_get_wall_clock_epoch(kvm);
-
-		/* It could be invalid again already, so we need to check */
 		read_lock_irq(&gpc->lock);
+	}
 
-		if (gpc->valid)
-			break;
-
-		read_unlock_irq(&gpc->lock);
-	} while (1);
+	/*
+	 * This code mirrors kvm_write_wall_clock() except that it writes
+	 * directly through the pfn cache and doesn't mark the page dirty.
+	 */
+	wall_nsec = kvm_get_wall_clock_epoch(kvm);
 
 	/* Paranoia checks on the 32-bit struct layout */
 	BUILD_BUG_ON(offsetof(struct compat_shared_info, wc) != 0x900);
@@ -639,12 +630,30 @@ int kvm_xen_hvm_set_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
 		}
 		break;
 
-	case KVM_XEN_ATTR_TYPE_SHARED_INFO:
+	case KVM_XEN_ATTR_TYPE_SHARED_INFO: {
+		int idx;
+
 		mutex_lock(&kvm->arch.xen.xen_lock);
-		r = kvm_xen_shared_info_init(kvm, data->u.shared_info.gfn);
+
+		idx = srcu_read_lock(&kvm->srcu);
+
+		if (data->u.shared_info.gfn == KVM_XEN_INVALID_GFN) {
+			kvm_gpc_deactivate(&kvm->arch.xen.shinfo_cache);
+			r = 0;
+		} else {
+			r = kvm_gpc_activate(&kvm->arch.xen.shinfo_cache,
+					     gfn_to_gpa(data->u.shared_info.gfn),
+					     PAGE_SIZE);
+		}
+
+		srcu_read_unlock(&kvm->srcu, idx);
+
+		if (!r && kvm->arch.xen.shinfo_cache.active)
+			r = kvm_xen_shared_info_init(kvm);
+
 		mutex_unlock(&kvm->arch.xen.xen_lock);
 		break;
-
+	}
 	case KVM_XEN_ATTR_TYPE_UPCALL_VECTOR:
 		if (data->u.vector && data->u.vector < 0x10)
 			r = -EINVAL;
-- 
2.39.2


* [PATCH v12 10/20] KVM: xen: re-initialize shared_info if guest (32/64-bit) mode is set
  2024-01-15 12:56 [PATCH v12 00/20] KVM: xen: update shared_info and vcpu_info handling Paul Durrant
                   ` (8 preceding siblings ...)
  2024-01-15 12:56 ` [PATCH v12 09/20] KVM: xen: separate initialization of shared_info cache and content Paul Durrant
@ 2024-01-15 12:56 ` Paul Durrant
  2024-01-15 12:56 ` [PATCH v12 11/20] KVM: xen: allow shared_info to be mapped by fixed HVA Paul Durrant
                   ` (10 subsequent siblings)
  20 siblings, 0 replies; 58+ messages in thread
From: Paul Durrant @ 2024-01-15 12:56 UTC (permalink / raw)
  To: Paolo Bonzini, Jonathan Corbet, Sean Christopherson,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, David Woodhouse, Paul Durrant, Shuah Khan, kvm,
	linux-doc, linux-kernel, linux-kselftest

From: Paul Durrant <pdurrant@amazon.com>

If the shared_info PFN cache has already been initialized then the content
of the shared_info page needs to be re-initialized whenever the guest
mode is (re)set.

Setting the guest mode is either done explicitly by the VMM via the
KVM_XEN_ATTR_TYPE_LONG_MODE attribute, or implicitly when the guest writes
the MSR to set up the hypercall page.
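
The reason a mode change matters is that the wallclock field lives at a
different offset in the 32-bit and 64-bit shared_info layouts, so the
write must be redone against the new layout. A sketch of the offset
selection as performed by kvm_xen_shared_info_init():

  struct pvclock_wall_clock *wc;

  if (IS_ENABLED(CONFIG_64BIT) && kvm->arch.xen.long_mode)
          wc = &((struct shared_info *)gpc->khva)->wc;
  else
          wc = &((struct compat_shared_info *)gpc->khva)->wc;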

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
---
Cc: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: x86@kernel.org

v12:
 - Fix missing update of return value if mode is not actually changed.

v11:
 - Drop the hunk removing the call to kvm_xen_shared_info_init() when
   KVM_XEN_ATTR_TYPE_SHARED_INFO is set; it was a mistake and causes self-
   test failures.

v10:
 - New in this version.
---
 arch/x86/kvm/xen.c | 29 ++++++++++++++++++++++++++---
 1 file changed, 26 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index df53fea73747..d595d476a5b3 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -625,8 +625,16 @@ int kvm_xen_hvm_set_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
 		} else {
 			mutex_lock(&kvm->arch.xen.xen_lock);
 			kvm->arch.xen.long_mode = !!data->u.long_mode;
+
+			/*
+			 * Re-initialize shared_info to put the wallclock in the
+			 * correct place. Whilst it's not necessary to do this
+			 * unless the mode is actually changed, it does no harm
+			 * to make the call anyway.
+			 */
+			r = kvm->arch.xen.shinfo_cache.active ?
+				kvm_xen_shared_info_init(kvm) : 0;
 			mutex_unlock(&kvm->arch.xen.xen_lock);
-			r = 0;
 		}
 		break;
 
@@ -1101,9 +1109,24 @@ int kvm_xen_write_hypercall_page(struct kvm_vcpu *vcpu, u64 data)
 	u32 page_num = data & ~PAGE_MASK;
 	u64 page_addr = data & PAGE_MASK;
 	bool lm = is_long_mode(vcpu);
+	int r = 0;
+
+	mutex_lock(&kvm->arch.xen.xen_lock);
+	if (kvm->arch.xen.long_mode != lm) {
+		kvm->arch.xen.long_mode = lm;
+
+		/*
+		 * Re-initialize shared_info to put the wallclock in the
+		 * correct place.
+		 */
+		if (kvm->arch.xen.shinfo_cache.active &&
+		    kvm_xen_shared_info_init(kvm))
+			r = 1;
+	}
+	mutex_unlock(&kvm->arch.xen.xen_lock);
 
-	/* Latch long_mode for shared_info pages etc. */
-	vcpu->kvm->arch.xen.long_mode = lm;
+	if (r)
+		return r;
 
 	/*
 	 * If Xen hypercall intercept is enabled, fill the hypercall
-- 
2.39.2


* [PATCH v12 11/20] KVM: xen: allow shared_info to be mapped by fixed HVA
  2024-01-15 12:56 [PATCH v12 00/20] KVM: xen: update shared_info and vcpu_info handling Paul Durrant
                   ` (9 preceding siblings ...)
  2024-01-15 12:56 ` [PATCH v12 10/20] KVM: xen: re-initialize shared_info if guest (32/64-bit) mode is set Paul Durrant
@ 2024-01-15 12:56 ` Paul Durrant
  2024-02-07  4:10   ` Sean Christopherson
  2024-01-15 12:56 ` [PATCH v12 12/20] KVM: xen: allow vcpu_info " Paul Durrant
                   ` (9 subsequent siblings)
  20 siblings, 1 reply; 58+ messages in thread
From: Paul Durrant @ 2024-01-15 12:56 UTC (permalink / raw)
  To: Paolo Bonzini, Jonathan Corbet, Sean Christopherson,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, David Woodhouse, Paul Durrant, Shuah Khan, kvm,
	linux-doc, linux-kernel, linux-kselftest

From: Paul Durrant <pdurrant@amazon.com>

The shared_info page is not guest memory as such. It is a dedicated page
allocated by the VMM and overlaid onto guest memory in a GFN chosen by the
guest and specified in the XENMEM_add_to_physmap hypercall. The guest may
even request that shared_info be moved from one GFN to another by
re-issuing that hypercall, but the HVA is never going to change.

Because the shared_info page is an overlay the memory slots need to be
updated in response to the hypercall. However, memory slot adjustment is
not atomic and, even whilst all vCPUs are paused, there is still the
possibility that events may be delivered (which requires the shared_info
page to be updated) whilst the shared_info GPA is absent. The HVA is never
absent
though, so it makes much more sense to use that as the basis for the
kernel's mapping.

Hence add a new KVM_XEN_ATTR_TYPE_SHARED_INFO_HVA attribute type for this
purpose and a KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA flag to advertise its
availability. Don't actually advertise it yet though. That will be done in
a subsequent patch, which will also add tests for the new attribute type.

Also update the KVM API documentation with the new attribute and also fix
it up to consistently refer to 'shared_info' (with the underscore).
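
From the VMM side, usage would look something like the following sketch
('vm_fd' is the VM file descriptor and 'shinfo' the VMM's fixed mapping
of the overlay page; needs <sys/ioctl.h>, <linux/kvm.h> and <err.h>):

  struct kvm_xen_hvm_attr ha = {
          .type = KVM_XEN_ATTR_TYPE_SHARED_INFO_HVA,
          .u.shared_info.hva = (__u64)(unsigned long)shinfo,
  };

  if (ioctl(vm_fd, KVM_XEN_HVM_SET_ATTR, &ha))
          err(1, "KVM_XEN_HVM_SET_ATTR");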

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
---
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org

v8:
 - Re-base.

v2:
 - Define the new attribute and capability but don't advertise the
   capability yet.
 - Add API documentation.
---
 Documentation/virt/kvm/api.rst | 25 ++++++++++++++++------
 arch/x86/kvm/xen.c             | 39 ++++++++++++++++++++++++++--------
 include/uapi/linux/kvm.h       |  6 +++++-
 3 files changed, 54 insertions(+), 16 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 3ec0b7a455a0..3372be85b335 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -372,7 +372,7 @@ The bits in the dirty bitmap are cleared before the ioctl returns, unless
 KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 is enabled.  For more information,
 see the description of the capability.
 
-Note that the Xen shared info page, if configured, shall always be assumed
+Note that the Xen shared_info page, if configured, shall always be assumed
 to be dirty. KVM will not explicitly mark it such.
 
 
@@ -5487,8 +5487,9 @@ KVM_PV_ASYNC_CLEANUP_PERFORM
 		__u8 long_mode;
 		__u8 vector;
 		__u8 runstate_update_flag;
-		struct {
+		union {
 			__u64 gfn;
+			__u64 hva;
 		} shared_info;
 		struct {
 			__u32 send_port;
@@ -5516,10 +5517,10 @@ type values:
 
 KVM_XEN_ATTR_TYPE_LONG_MODE
   Sets the ABI mode of the VM to 32-bit or 64-bit (long mode). This
-  determines the layout of the shared info pages exposed to the VM.
+  determines the layout of the shared_info page exposed to the VM.
 
 KVM_XEN_ATTR_TYPE_SHARED_INFO
-  Sets the guest physical frame number at which the Xen "shared info"
+  Sets the guest physical frame number at which the Xen shared_info
   page resides. Note that although Xen places vcpu_info for the first
   32 vCPUs in the shared_info page, KVM does not automatically do so
   and instead requires that KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO be used
@@ -5528,7 +5529,7 @@ KVM_XEN_ATTR_TYPE_SHARED_INFO
   not be aware of the Xen CPU id which is used as the index into the
   vcpu_info[] array, so may not know the correct default location.
 
-  Note that the shared info page may be constantly written to by KVM;
+  Note that the shared_info page may be constantly written to by KVM;
   it contains the event channel bitmap used to deliver interrupts to
   a Xen guest, amongst other things. It is exempt from dirty tracking
   mechanisms — KVM will not explicitly mark the page as dirty each
@@ -5537,9 +5538,21 @@ KVM_XEN_ATTR_TYPE_SHARED_INFO
   any vCPU has been running or any event channel interrupts can be
   routed to the guest.
 
-  Setting the gfn to KVM_XEN_INVALID_GFN will disable the shared info
+  Setting the gfn to KVM_XEN_INVALID_GFN will disable the shared_info
   page.
 
+KVM_XEN_ATTR_TYPE_SHARED_INFO_HVA
+  If the KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA flag is also set in the
+  Xen capabilities, then this attribute may be used to set the
+  userspace address at which the shared_info page resides, which
+  will always be fixed in the VMM regardless of where it is mapped
+  in guest physical address space. This attribute should be used in
+  preference to KVM_XEN_ATTR_TYPE_SHARED_INFO as it avoids
+  unnecessary invalidation of an internal cache when the page is
+  re-mapped in guest physical address space.
+
+  Setting the hva to zero will disable the shared_info page.
+
 KVM_XEN_ATTR_TYPE_UPCALL_VECTOR
   Sets the exception vector used to deliver Xen event channel upcalls.
   This is the HVM-wide vector injected directly by the hypervisor
diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index d595d476a5b3..813d63a8703a 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -617,7 +617,6 @@ int kvm_xen_hvm_set_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
 {
 	int r = -ENOENT;
 
-
 	switch (data->type) {
 	case KVM_XEN_ATTR_TYPE_LONG_MODE:
 		if (!IS_ENABLED(CONFIG_64BIT) && data->u.long_mode) {
@@ -638,20 +637,32 @@ int kvm_xen_hvm_set_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
 		}
 		break;
 
-	case KVM_XEN_ATTR_TYPE_SHARED_INFO: {
+	case KVM_XEN_ATTR_TYPE_SHARED_INFO:
+	case KVM_XEN_ATTR_TYPE_SHARED_INFO_HVA: {
 		int idx;
 
 		mutex_lock(&kvm->arch.xen.xen_lock);
 
 		idx = srcu_read_lock(&kvm->srcu);
 
-		if (data->u.shared_info.gfn == KVM_XEN_INVALID_GFN) {
-			kvm_gpc_deactivate(&kvm->arch.xen.shinfo_cache);
-			r = 0;
+		if (data->type == KVM_XEN_ATTR_TYPE_SHARED_INFO) {
+			if (data->u.shared_info.gfn == KVM_XEN_INVALID_GFN) {
+				kvm_gpc_deactivate(&kvm->arch.xen.shinfo_cache);
+				r = 0;
+			} else {
+				r = kvm_gpc_activate(&kvm->arch.xen.shinfo_cache,
+						     gfn_to_gpa(data->u.shared_info.gfn),
+						     PAGE_SIZE);
+			}
 		} else {
-			r = kvm_gpc_activate(&kvm->arch.xen.shinfo_cache,
-					     gfn_to_gpa(data->u.shared_info.gfn),
-					     PAGE_SIZE);
+			if (data->u.shared_info.hva == 0) {
+				kvm_gpc_deactivate(&kvm->arch.xen.shinfo_cache);
+				r = 0;
+			} else {
+				r = kvm_gpc_activate_hva(&kvm->arch.xen.shinfo_cache,
+							 data->u.shared_info.hva,
+							 PAGE_SIZE);
+			}
 		}
 
 		srcu_read_unlock(&kvm->srcu, idx);
@@ -715,13 +726,23 @@ int kvm_xen_hvm_get_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
 		break;
 
 	case KVM_XEN_ATTR_TYPE_SHARED_INFO:
-		if (kvm->arch.xen.shinfo_cache.active)
+		if (kvm->arch.xen.shinfo_cache.active &&
+		    kvm->arch.xen.shinfo_cache.gpa != KVM_XEN_INVALID_GPA)
 			data->u.shared_info.gfn = gpa_to_gfn(kvm->arch.xen.shinfo_cache.gpa);
 		else
 			data->u.shared_info.gfn = KVM_XEN_INVALID_GFN;
 		r = 0;
 		break;
 
+	case KVM_XEN_ATTR_TYPE_SHARED_INFO_HVA:
+		if (kvm->arch.xen.shinfo_cache.active &&
+		    kvm->arch.xen.shinfo_cache.gpa == KVM_XEN_INVALID_GPA)
+			data->u.shared_info.hva = kvm->arch.xen.shinfo_cache.uhva;
+		else
+			data->u.shared_info.hva = 0;
+		r = 0;
+		break;
+
 	case KVM_XEN_ATTR_TYPE_UPCALL_VECTOR:
 		data->u.vector = kvm->arch.xen.upcall_vector;
 		r = 0;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index c3308536482b..ac5caba313d1 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1246,6 +1246,7 @@ struct kvm_x86_mce {
 #define KVM_XEN_HVM_CONFIG_EVTCHN_SEND		(1 << 5)
 #define KVM_XEN_HVM_CONFIG_RUNSTATE_UPDATE_FLAG	(1 << 6)
 #define KVM_XEN_HVM_CONFIG_PVCLOCK_TSC_UNSTABLE	(1 << 7)
+#define KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA	(1 << 8)
 
 struct kvm_xen_hvm_config {
 	__u32 flags;
@@ -1744,9 +1745,10 @@ struct kvm_xen_hvm_attr {
 		__u8 long_mode;
 		__u8 vector;
 		__u8 runstate_update_flag;
-		struct {
+		union {
 			__u64 gfn;
 #define KVM_XEN_INVALID_GFN ((__u64)-1)
+			__u64 hva;
 		} shared_info;
 		struct {
 			__u32 send_port;
@@ -1788,6 +1790,8 @@ struct kvm_xen_hvm_attr {
 #define KVM_XEN_ATTR_TYPE_XEN_VERSION		0x4
 /* Available with KVM_CAP_XEN_HVM / KVM_XEN_HVM_CONFIG_RUNSTATE_UPDATE_FLAG */
 #define KVM_XEN_ATTR_TYPE_RUNSTATE_UPDATE_FLAG	0x5
+/* Available with KVM_CAP_XEN_HVM / KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA */
+#define KVM_XEN_ATTR_TYPE_SHARED_INFO_HVA	0x6
 
 /* Per-vCPU Xen attributes */
 #define KVM_XEN_VCPU_GET_ATTR	_IOWR(KVMIO, 0xca, struct kvm_xen_vcpu_attr)
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 58+ messages in thread
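
For illustration, a minimal userspace sketch of the new attribute (not
taken from the series; 'vm_fd' and 'shinfo' are assumed to be the VM file
descriptor and the VMM's fixed mapping of the shared_info page):

  #include <err.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  static void set_shinfo_hva(int vm_fd, void *shinfo)
  {
  	struct kvm_xen_hvm_attr ha = {
  		.type = KVM_XEN_ATTR_TYPE_SHARED_INFO_HVA,
  		.u.shared_info.hva = (unsigned long)shinfo,
  	};

  	if (ioctl(vm_fd, KVM_XEN_HVM_SET_ATTR, &ha))
  		err(1, "KVM_XEN_HVM_SET_ATTR");
  }

Passing an hva of zero through the same attribute disables the
shared_info page, mirroring KVM_XEN_INVALID_GFN for the GFN-based
attribute.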

* [PATCH v12 12/20] KVM: xen: allow vcpu_info to be mapped by fixed HVA
  2024-01-15 12:56 [PATCH v12 00/20] KVM: xen: update shared_info and vcpu_info handling Paul Durrant
                   ` (10 preceding siblings ...)
  2024-01-15 12:56 ` [PATCH v12 11/20] KVM: xen: allow shared_info to be mapped by fixed HVA Paul Durrant
@ 2024-01-15 12:56 ` Paul Durrant
  2024-01-15 12:57 ` [PATCH v12 13/20] KVM: selftests / xen: map shared_info using HVA rather than GFN Paul Durrant
                   ` (8 subsequent siblings)
  20 siblings, 0 replies; 58+ messages in thread
From: Paul Durrant @ 2024-01-15 12:56 UTC (permalink / raw)
  To: Paolo Bonzini, Jonathan Corbet, Sean Christopherson,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, David Woodhouse, Paul Durrant, Shuah Khan, kvm,
	linux-doc, linux-kernel, linux-kselftest

From: Paul Durrant <pdurrant@amazon.com>

If the guest does not explicitly set the GPA of vcpu_info structure in
memory then, for guests with 32 vCPUs or fewer, the vcpu_info embedded
in the shared_info page may be used. As described in a previous commit,
the shared_info page is an overlay at a fixed HVA within the VMM, so in
this case it is also more optimal to activate the vcpu_info cache with a
fixed HVA to avoid unnecessary invalidation if the guest memory layout
is modified.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
---
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org

v8:
 - Re-base.

v5:
 - New in this version.
---
 Documentation/virt/kvm/api.rst | 26 +++++++++++++++++++++-----
 arch/x86/kvm/xen.c             | 34 ++++++++++++++++++++++++++++------
 include/uapi/linux/kvm.h       |  3 +++
 3 files changed, 52 insertions(+), 11 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 3372be85b335..bd93cafd3e4e 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -5523,11 +5523,12 @@ KVM_XEN_ATTR_TYPE_SHARED_INFO
   Sets the guest physical frame number at which the Xen shared_info
   page resides. Note that although Xen places vcpu_info for the first
   32 vCPUs in the shared_info page, KVM does not automatically do so
-  and instead requires that KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO be used
-  explicitly even when the vcpu_info for a given vCPU resides at the
-  "default" location in the shared_info page. This is because KVM may
-  not be aware of the Xen CPU id which is used as the index into the
-  vcpu_info[] array, so may know the correct default location.
+  and instead requires that KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO or
+  KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO_HVA be used explicitly even when
+  the vcpu_info for a given vCPU resides at the "default" location
+  in the shared_info page. This is because KVM may not be aware of
+  the Xen CPU id which is used as the index into the vcpu_info[]
+  array, so may not know the correct default location.
 
   Note that the shared_info page may be constantly written to by KVM;
   it contains the event channel bitmap used to deliver interrupts to
@@ -5649,6 +5650,21 @@ KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO
   on dirty logging. Setting the gpa to KVM_XEN_INVALID_GPA will disable
   the vcpu_info.
 
+KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO_HVA
+  If the KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA flag is also set in the
+  Xen capabilities, then this attribute may be used to set the
+  userspace address of the vcpu_info for a given vCPU. It should
+  only be used when the vcpu_info resides at the "default" location
+  in the shared_info page. In this case it is safe to assume the
+  userspace address will not change, because the shared_info page is
+  an overlay on guest memory and remains at a fixed host address
+  regardless of where it is mapped in guest physical address space
+  and hence unnecessary invalidation of an internal cache may be
+  avoided if the guest memory layout is modified.
+  If the vcpu_info does not reside at the "default" location then
+  it is not guaranteed to remain at the same host address and
+  hence the aforementioned cache invalidation is required.
+
 KVM_XEN_VCPU_ATTR_TYPE_VCPU_TIME_INFO
   Sets the guest physical address of an additional pvclock structure
   for a given vCPU. This is typically used for guest vsyscall support.
diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index 813d63a8703a..c99008d217f4 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -779,20 +779,33 @@ int kvm_xen_vcpu_set_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data)
 
 	switch (data->type) {
 	case KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO:
+	case KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO_HVA:
 		/* No compat necessary here. */
 		BUILD_BUG_ON(sizeof(struct vcpu_info) !=
 			     sizeof(struct compat_vcpu_info));
 		BUILD_BUG_ON(offsetof(struct vcpu_info, time) !=
 			     offsetof(struct compat_vcpu_info, time));
 
-		if (data->u.gpa == KVM_XEN_INVALID_GPA) {
-			kvm_gpc_deactivate(&vcpu->arch.xen.vcpu_info_cache);
-			r = 0;
-			break;
+		if (data->type == KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO) {
+			if (data->u.gpa == KVM_XEN_INVALID_GPA) {
+				kvm_gpc_deactivate(&vcpu->arch.xen.vcpu_info_cache);
+				r = 0;
+				break;
+			}
+
+			r = kvm_gpc_activate(&vcpu->arch.xen.vcpu_info_cache,
+					     data->u.gpa, sizeof(struct vcpu_info));
+		} else {
+			if (data->u.hva == 0) {
+				kvm_gpc_deactivate(&vcpu->arch.xen.vcpu_info_cache);
+				r = 0;
+				break;
+			}
+
+			r = kvm_gpc_activate_hva(&vcpu->arch.xen.vcpu_info_cache,
+						 data->u.hva, sizeof(struct vcpu_info));
 		}
 
-		r = kvm_gpc_activate(&vcpu->arch.xen.vcpu_info_cache,
-				     data->u.gpa, sizeof(struct vcpu_info));
 		if (!r)
 			kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
 
@@ -1021,6 +1034,15 @@ int kvm_xen_vcpu_get_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data)
 		r = 0;
 		break;
 
+	case KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO_HVA:
+		if (vcpu->arch.xen.vcpu_info_cache.active &&
+		    vcpu->arch.xen.vcpu_info_cache.gpa == KVM_XEN_INVALID_GPA)
+			data->u.hva = vcpu->arch.xen.vcpu_info_cache.uhva;
+		else
+			data->u.hva = 0;
+		r = 0;
+		break;
+
 	case KVM_XEN_VCPU_ATTR_TYPE_VCPU_TIME_INFO:
 		if (vcpu->arch.xen.vcpu_time_info_cache.active)
 			data->u.gpa = vcpu->arch.xen.vcpu_time_info_cache.gpa;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index ac5caba313d1..d2665319db6e 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1809,6 +1809,7 @@ struct kvm_xen_vcpu_attr {
 	union {
 		__u64 gpa;
 #define KVM_XEN_INVALID_GPA ((__u64)-1)
+		__u64 hva;
 		__u64 pad[8];
 		struct {
 			__u64 state;
@@ -1839,6 +1840,8 @@ struct kvm_xen_vcpu_attr {
 #define KVM_XEN_VCPU_ATTR_TYPE_VCPU_ID		0x6
 #define KVM_XEN_VCPU_ATTR_TYPE_TIMER		0x7
 #define KVM_XEN_VCPU_ATTR_TYPE_UPCALL_VECTOR	0x8
+/* Available with KVM_CAP_XEN_HVM / KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA */
+#define KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO_HVA	0x9
 
 /* Secure Encrypted Virtualization command */
 enum sev_cmd_id {
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 58+ messages in thread
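
A companion sketch for the vCPU attribute (again not from the series;
'vcpu_fd' and 'xen_vcpu_id' are assumed, and struct shared_info /
struct vcpu_info are the types from the Xen public headers, in which the
embedded vcpu_info array covers the first 32 vCPUs):

  #include <err.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>
  #include <xen/xen.h>	/* assumed location of the Xen ABI structs */

  static void set_vcpu_info_hva(int vcpu_fd, struct shared_info *shinfo,
  				unsigned int xen_vcpu_id)
  {
  	struct kvm_xen_vcpu_attr va = {
  		.type = KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO_HVA,
  		/* The "default" location for this vCPU's vcpu_info. */
  		.u.hva = (unsigned long)&shinfo->vcpu_info[xen_vcpu_id],
  	};

  	if (ioctl(vcpu_fd, KVM_XEN_VCPU_SET_ATTR, &va))
  		err(1, "KVM_XEN_VCPU_SET_ATTR");
  }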

* [PATCH v12 13/20] KVM: selftests / xen: map shared_info using HVA rather than GFN
  2024-01-15 12:56 [PATCH v12 00/20] KVM: xen: update shared_info and vcpu_info handling Paul Durrant
                   ` (11 preceding siblings ...)
  2024-01-15 12:56 ` [PATCH v12 12/20] KVM: xen: allow vcpu_info " Paul Durrant
@ 2024-01-15 12:57 ` Paul Durrant
  2024-02-07  4:14   ` Sean Christopherson
  2024-01-15 12:57 ` [PATCH v12 14/20] KVM: selftests / xen: re-map vcpu_info using HVA rather than GPA Paul Durrant
                   ` (7 subsequent siblings)
  20 siblings, 1 reply; 58+ messages in thread
From: Paul Durrant @ 2024-01-15 12:57 UTC (permalink / raw)
  To: Paolo Bonzini, Jonathan Corbet, Sean Christopherson,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, David Woodhouse, Paul Durrant, Shuah Khan, kvm,
	linux-doc, linux-kernel, linux-kselftest

From: Paul Durrant <pdurrant@amazon.com>

Using the HVA of the shared_info page is more efficient, so if the
capability (KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA) is present use that method
to do the mapping.

NOTE: Have the juggle_shinfo_state() thread map and unmap using both
      GFN and HVA, to make sure the older mechanism is not broken.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
---
Cc: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: David Woodhouse <dwmw2@infradead.org>

v3:
 - Re-work the juggle_shinfo_state() thread.

v2:
 - New in this version.
---
 .../selftests/kvm/x86_64/xen_shinfo_test.c    | 44 +++++++++++++++----
 1 file changed, 35 insertions(+), 9 deletions(-)

diff --git a/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c b/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c
index 9ec9ab60b63e..a61500ff0822 100644
--- a/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c
+++ b/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c
@@ -389,6 +389,7 @@ static int cmp_timespec(struct timespec *a, struct timespec *b)
 		return 0;
 }
 
+static struct shared_info *shinfo;
 static struct vcpu_info *vinfo;
 static struct kvm_vcpu *vcpu;
 
@@ -404,20 +405,38 @@ static void *juggle_shinfo_state(void *arg)
 {
 	struct kvm_vm *vm = (struct kvm_vm *)arg;
 
-	struct kvm_xen_hvm_attr cache_activate = {
+	struct kvm_xen_hvm_attr cache_activate_gfn = {
 		.type = KVM_XEN_ATTR_TYPE_SHARED_INFO,
 		.u.shared_info.gfn = SHINFO_REGION_GPA / PAGE_SIZE
 	};
 
-	struct kvm_xen_hvm_attr cache_deactivate = {
+	struct kvm_xen_hvm_attr cache_deactivate_gfn = {
 		.type = KVM_XEN_ATTR_TYPE_SHARED_INFO,
 		.u.shared_info.gfn = KVM_XEN_INVALID_GFN
 	};
 
+	struct kvm_xen_hvm_attr cache_activate_hva = {
+		.type = KVM_XEN_ATTR_TYPE_SHARED_INFO_HVA,
+		.u.shared_info.hva = (unsigned long)shinfo
+	};
+
+	struct kvm_xen_hvm_attr cache_deactivate_hva = {
+		.type = KVM_XEN_ATTR_TYPE_SHARED_INFO_HVA,
+		.u.shared_info.hva = 0
+	};
+
+	int xen_caps = kvm_check_cap(KVM_CAP_XEN_HVM);
+
 	for (;;) {
-		__vm_ioctl(vm, KVM_XEN_HVM_SET_ATTR, &cache_activate);
-		__vm_ioctl(vm, KVM_XEN_HVM_SET_ATTR, &cache_deactivate);
+		__vm_ioctl(vm, KVM_XEN_HVM_SET_ATTR, &cache_activate_gfn);
 		pthread_testcancel();
+		__vm_ioctl(vm, KVM_XEN_HVM_SET_ATTR, &cache_deactivate_gfn);
+
+		if (xen_caps & KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA) {
+			__vm_ioctl(vm, KVM_XEN_HVM_SET_ATTR, &cache_activate_hva);
+			pthread_testcancel();
+			__vm_ioctl(vm, KVM_XEN_HVM_SET_ATTR, &cache_deactivate_hva);
+		}
 	}
 
 	return NULL;
@@ -442,6 +461,7 @@ int main(int argc, char *argv[])
 	bool do_runstate_flag = !!(xen_caps & KVM_XEN_HVM_CONFIG_RUNSTATE_UPDATE_FLAG);
 	bool do_eventfd_tests = !!(xen_caps & KVM_XEN_HVM_CONFIG_EVTCHN_2LEVEL);
 	bool do_evtchn_tests = do_eventfd_tests && !!(xen_caps & KVM_XEN_HVM_CONFIG_EVTCHN_SEND);
+	bool has_shinfo_hva = !!(xen_caps & KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA);
 
 	clock_gettime(CLOCK_REALTIME, &min_ts);
 
@@ -452,7 +472,7 @@ int main(int argc, char *argv[])
 				    SHINFO_REGION_GPA, SHINFO_REGION_SLOT, 3, 0);
 	virt_map(vm, SHINFO_REGION_GVA, SHINFO_REGION_GPA, 3);
 
-	struct shared_info *shinfo = addr_gpa2hva(vm, SHINFO_VADDR);
+	shinfo = addr_gpa2hva(vm, SHINFO_VADDR);
 
 	int zero_fd = open("/dev/zero", O_RDONLY);
 	TEST_ASSERT(zero_fd != -1, "Failed to open /dev/zero");
@@ -488,10 +508,16 @@ int main(int argc, char *argv[])
 			    "Failed to read back RUNSTATE_UPDATE_FLAG attr");
 	}
 
-	struct kvm_xen_hvm_attr ha = {
-		.type = KVM_XEN_ATTR_TYPE_SHARED_INFO,
-		.u.shared_info.gfn = SHINFO_REGION_GPA / PAGE_SIZE,
-	};
+	struct kvm_xen_hvm_attr ha = {};
+
+	if (has_shinfo_hva) {
+		ha.type = KVM_XEN_ATTR_TYPE_SHARED_INFO_HVA;
+		ha.u.shared_info.hva = (unsigned long)shinfo;
+	} else {
+		ha.type = KVM_XEN_ATTR_TYPE_SHARED_INFO;
+		ha.u.shared_info.gfn = SHINFO_ADDR / PAGE_SIZE;
+	}
+
 	vm_ioctl(vm, KVM_XEN_HVM_SET_ATTR, &ha);
 
 	/*
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v12 14/20] KVM: selftests / xen: re-map vcpu_info using HVA rather than GPA
  2024-01-15 12:56 [PATCH v12 00/20] KVM: xen: update shared_info and vcpu_info handling Paul Durrant
                   ` (12 preceding siblings ...)
  2024-01-15 12:57 ` [PATCH v12 13/20] KVM: selftests / xen: map shared_info using HVA rather than GFN Paul Durrant
@ 2024-01-15 12:57 ` Paul Durrant
  2024-01-15 12:57 ` [PATCH v12 15/20] KVM: xen: advertize the KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA capability Paul Durrant
                   ` (6 subsequent siblings)
  20 siblings, 0 replies; 58+ messages in thread
From: Paul Durrant @ 2024-01-15 12:57 UTC (permalink / raw)
  To: Paolo Bonzini, Jonathan Corbet, Sean Christopherson,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, David Woodhouse, Paul Durrant, Shuah Khan, kvm,
	linux-doc, linux-kernel, linux-kselftest

From: Paul Durrant <pdurrant@amazon.com>

If the relevant capability (KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA) is present
then re-map vcpu_info using the HVA part way through the tests to make sure
that there is no functional change.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
---
Cc: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: David Woodhouse <dwmw2@infradead.org>

v5:
 - New in this version.
---
 .../selftests/kvm/x86_64/xen_shinfo_test.c        | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c b/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c
index a61500ff0822..d2ea0435f4f7 100644
--- a/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c
+++ b/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c
@@ -62,6 +62,7 @@ enum {
 	TEST_POLL_TIMEOUT,
 	TEST_POLL_MASKED,
 	TEST_POLL_WAKE,
+	SET_VCPU_INFO,
 	TEST_TIMER_PAST,
 	TEST_LOCKING_SEND_RACE,
 	TEST_LOCKING_POLL_RACE,
@@ -321,6 +322,10 @@ static void guest_code(void)
 
 	GUEST_SYNC(TEST_POLL_WAKE);
 
+	/* Set the vcpu_info to point at exactly the place it already is to
+	 * make sure the attribute is functional. */
+	GUEST_SYNC(SET_VCPU_INFO);
+
 	/* A timer wake an *unmasked* port which should wake us with an
 	 * actual interrupt, while we're polling on a different port. */
 	ports[0]++;
@@ -888,6 +893,16 @@ int main(int argc, char *argv[])
 				alarm(1);
 				break;
 
+			case SET_VCPU_INFO:
+				if (has_shinfo_hva) {
+					struct kvm_xen_vcpu_attr vih = {
+						.type = KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO_HVA,
+						.u.hva = (unsigned long)vinfo
+					};
+					vcpu_ioctl(vcpu, KVM_XEN_VCPU_SET_ATTR, &vih);
+				}
+				break;
+
 			case TEST_TIMER_PAST:
 				TEST_ASSERT(!evtchn_irq_expected,
 					    "Expected event channel IRQ but it didn't happen");
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v12 15/20] KVM: xen: advertize the KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA capability
  2024-01-15 12:56 [PATCH v12 00/20] KVM: xen: update shared_info and vcpu_info handling Paul Durrant
                   ` (13 preceding siblings ...)
  2024-01-15 12:57 ` [PATCH v12 14/20] KVM: selftests / xen: re-map vcpu_info using HVA rather than GPA Paul Durrant
@ 2024-01-15 12:57 ` Paul Durrant
  2024-01-15 12:57 ` [PATCH v12 16/20] KVM: xen: split up kvm_xen_set_evtchn_fast() Paul Durrant
                   ` (5 subsequent siblings)
  20 siblings, 0 replies; 58+ messages in thread
From: Paul Durrant @ 2024-01-15 12:57 UTC (permalink / raw)
  To: Paolo Bonzini, Jonathan Corbet, Sean Christopherson,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, David Woodhouse, Paul Durrant, Shuah Khan, kvm,
	linux-doc, linux-kernel, linux-kselftest

From: Paul Durrant <pdurrant@amazon.com>

Now that all relevant kernel changes and selftests are in place, enable the
new capability.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
---
Cc: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: x86@kernel.org

v2:
 - New in this version.
---
 arch/x86/kvm/x86.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 19e6cc1dadfe..4864251dca41 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4678,7 +4678,8 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 		    KVM_XEN_HVM_CONFIG_SHARED_INFO |
 		    KVM_XEN_HVM_CONFIG_EVTCHN_2LEVEL |
 		    KVM_XEN_HVM_CONFIG_EVTCHN_SEND |
-		    KVM_XEN_HVM_CONFIG_PVCLOCK_TSC_UNSTABLE;
+		    KVM_XEN_HVM_CONFIG_PVCLOCK_TSC_UNSTABLE |
+		    KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA;
 		if (sched_info_on())
 			r |= KVM_XEN_HVM_CONFIG_RUNSTATE |
 			     KVM_XEN_HVM_CONFIG_RUNSTATE_UPDATE_FLAG;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 58+ messages in thread
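
A VMM should gate its use of the new *_HVA attributes on this flag; a
sketch, assuming 'kvm_fd' is an open handle to /dev/kvm:

  int xen_caps = ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_XEN_HVM);

  if (xen_caps > 0 && (xen_caps & KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA)) {
  	/* KVM_XEN_ATTR_TYPE_SHARED_INFO_HVA and
  	 * KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO_HVA may be used. */
  }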

* [PATCH v12 16/20] KVM: xen: split up kvm_xen_set_evtchn_fast()
  2024-01-15 12:56 [PATCH v12 00/20] KVM: xen: update shared_info and vcpu_info handling Paul Durrant
                   ` (14 preceding siblings ...)
  2024-01-15 12:57 ` [PATCH v12 15/20] KVM: xen: advertize the KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA capability Paul Durrant
@ 2024-01-15 12:57 ` Paul Durrant
  2024-01-15 12:57 ` [PATCH v12 17/20] KVM: xen: don't block on pfncache locks in kvm_xen_set_evtchn_fast() Paul Durrant
                   ` (4 subsequent siblings)
  20 siblings, 0 replies; 58+ messages in thread
From: Paul Durrant @ 2024-01-15 12:57 UTC (permalink / raw)
  To: Paolo Bonzini, Jonathan Corbet, Sean Christopherson,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, David Woodhouse, Paul Durrant, Shuah Khan, kvm,
	linux-doc, linux-kernel, linux-kselftest

From: Paul Durrant <pdurrant@amazon.com>

The implementation of kvm_xen_set_evtchn_fast() is a rather lengthy piece
of code that performs two operations: updating of the shared_info
evtchn_pending mask, and updating of the vcpu_info evtchn_pending_sel
mask. Introduce a separate function to perform each of those operations and
re-work kvm_xen_set_evtchn_fast() to use them.

No functional change intended.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
---
Cc: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: x86@kernel.org

v11:
 - Fixed /64 vs /32 switcheroo and changed type of port_word_bit back to
   int.

v10:
 - Updated in this version. Dropped David's R-b since the updates are
   non-trivial.

v8:
 - New in this version.
---
 arch/x86/kvm/xen.c | 175 ++++++++++++++++++++++++++-------------------
 1 file changed, 100 insertions(+), 75 deletions(-)

diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index c99008d217f4..5ce02699f44c 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -1665,6 +1665,101 @@ static void kvm_xen_check_poller(struct kvm_vcpu *vcpu, int port)
 	}
 }
 
+static int set_shinfo_evtchn_pending(struct kvm_vcpu *vcpu, u32 port)
+{
+	struct kvm *kvm = vcpu->kvm;
+	struct gfn_to_pfn_cache *gpc = &kvm->arch.xen.shinfo_cache;
+	unsigned long *pending_bits, *mask_bits;
+	unsigned long flags;
+	int rc = -EWOULDBLOCK;
+
+	read_lock_irqsave(&gpc->lock, flags);
+	if (!kvm_gpc_check(gpc, PAGE_SIZE))
+		goto out;
+
+	if (IS_ENABLED(CONFIG_64BIT) && kvm->arch.xen.long_mode) {
+		struct shared_info *shinfo = gpc->khva;
+
+		pending_bits = (unsigned long *)&shinfo->evtchn_pending;
+		mask_bits = (unsigned long *)&shinfo->evtchn_mask;
+	} else {
+		struct compat_shared_info *shinfo = gpc->khva;
+
+		pending_bits = (unsigned long *)&shinfo->evtchn_pending;
+		mask_bits = (unsigned long *)&shinfo->evtchn_mask;
+	}
+
+	if (test_and_set_bit(port, pending_bits)) {
+		rc = 0; /* It was already raised */
+	} else if (test_bit(port, mask_bits)) {
+		rc = -ENOTCONN; /* It is masked */
+		kvm_xen_check_poller(vcpu, port);
+	} else {
+		rc = 1; /* It is newly raised */
+	}
+
+ out:
+	read_unlock_irqrestore(&gpc->lock, flags);
+	return rc;
+}
+
+static bool set_vcpu_info_evtchn_pending(struct kvm_vcpu *vcpu, u32 port)
+{
+	struct kvm *kvm = vcpu->kvm;
+	struct gfn_to_pfn_cache *gpc = &vcpu->arch.xen.vcpu_info_cache;
+	unsigned long flags;
+	bool kick_vcpu = false;
+
+	read_lock_irqsave(&gpc->lock, flags);
+
+	/*
+	 * Try to deliver the event directly to the vcpu_info. If successful and
+	 * the guest is using upcall_vector delivery, send the MSI.
+	 * If the pfncache is invalid, set the shadow. In this case, or if the
+	 * guest is using another form of event delivery, the vCPU must be
+	 * kicked to complete the delivery.
+	 */
+	if (IS_ENABLED(CONFIG_64BIT) && kvm->arch.xen.long_mode) {
+		struct vcpu_info *vcpu_info = gpc->khva;
+		int port_word_bit = port / 64;
+
+		if (!kvm_gpc_check(gpc, sizeof(*vcpu_info))) {
+			if (!test_and_set_bit(port_word_bit, &vcpu->arch.xen.evtchn_pending_sel))
+				kick_vcpu = true;
+			goto out;
+		}
+
+		if (!test_and_set_bit(port_word_bit, &vcpu_info->evtchn_pending_sel)) {
+			WRITE_ONCE(vcpu_info->evtchn_upcall_pending, 1);
+			kick_vcpu = true;
+		}
+	} else {
+		struct compat_vcpu_info *vcpu_info = gpc->khva;
+		int port_word_bit = port / 32;
+
+		if (!kvm_gpc_check(gpc, sizeof(*vcpu_info))) {
+			if (!test_and_set_bit(port_word_bit, &vcpu->arch.xen.evtchn_pending_sel))
+				kick_vcpu = true;
+			goto out;
+		}
+
+		if (!test_and_set_bit(port_word_bit,
+				      (unsigned long *)&vcpu_info->evtchn_pending_sel)) {
+			WRITE_ONCE(vcpu_info->evtchn_upcall_pending, 1);
+			kick_vcpu = true;
+		}
+	}
+
+	if (kick_vcpu && vcpu->arch.xen.upcall_vector) {
+		kvm_xen_inject_vcpu_vector(vcpu);
+		kick_vcpu = false;
+	}
+
+ out:
+	read_unlock_irqrestore(&gpc->lock, flags);
+	return kick_vcpu;
+}
+
 /*
  * The return value from this function is propagated to kvm_set_irq() API,
  * so it returns:
@@ -1673,15 +1768,12 @@ static void kvm_xen_check_poller(struct kvm_vcpu *vcpu, int port)
  *  > 0   Number of CPUs interrupt was delivered to
  *
  * It is also called directly from kvm_arch_set_irq_inatomic(), where the
- * only check on its return value is a comparison with -EWOULDBLOCK'.
+ * only check on its return value is a comparison with -EWOULDBLOCK
+ * (which may be returned by set_shinfo_evtchn_pending()).
  */
 int kvm_xen_set_evtchn_fast(struct kvm_xen_evtchn *xe, struct kvm *kvm)
 {
-	struct gfn_to_pfn_cache *gpc = &kvm->arch.xen.shinfo_cache;
 	struct kvm_vcpu *vcpu;
-	unsigned long *pending_bits, *mask_bits;
-	unsigned long flags;
-	int port_word_bit;
 	bool kick_vcpu = false;
 	int vcpu_idx, idx, rc;
 
@@ -1701,79 +1793,12 @@ int kvm_xen_set_evtchn_fast(struct kvm_xen_evtchn *xe, struct kvm *kvm)
 	if (xe->port >= max_evtchn_port(kvm))
 		return -EINVAL;
 
-	rc = -EWOULDBLOCK;
-
 	idx = srcu_read_lock(&kvm->srcu);
 
-	read_lock_irqsave(&gpc->lock, flags);
-	if (!kvm_gpc_check(gpc, PAGE_SIZE))
-		goto out_rcu;
-
-	if (IS_ENABLED(CONFIG_64BIT) && kvm->arch.xen.long_mode) {
-		struct shared_info *shinfo = gpc->khva;
-		pending_bits = (unsigned long *)&shinfo->evtchn_pending;
-		mask_bits = (unsigned long *)&shinfo->evtchn_mask;
-		port_word_bit = xe->port / 64;
-	} else {
-		struct compat_shared_info *shinfo = gpc->khva;
-		pending_bits = (unsigned long *)&shinfo->evtchn_pending;
-		mask_bits = (unsigned long *)&shinfo->evtchn_mask;
-		port_word_bit = xe->port / 32;
-	}
+	rc = set_shinfo_evtchn_pending(vcpu, xe->port);
+	if (rc == 1) /* Delivered to the bitmap in shared_info */
+		kick_vcpu = set_vcpu_info_evtchn_pending(vcpu, xe->port);
 
-	/*
-	 * If this port wasn't already set, and if it isn't masked, then
-	 * we try to set the corresponding bit in the in-kernel shadow of
-	 * evtchn_pending_sel for the target vCPU. And if *that* wasn't
-	 * already set, then we kick the vCPU in question to write to the
-	 * *real* evtchn_pending_sel in its own guest vcpu_info struct.
-	 */
-	if (test_and_set_bit(xe->port, pending_bits)) {
-		rc = 0; /* It was already raised */
-	} else if (test_bit(xe->port, mask_bits)) {
-		rc = -ENOTCONN; /* Masked */
-		kvm_xen_check_poller(vcpu, xe->port);
-	} else {
-		rc = 1; /* Delivered to the bitmap in shared_info. */
-		/* Now switch to the vCPU's vcpu_info to set the index and pending_sel */
-		read_unlock_irqrestore(&gpc->lock, flags);
-		gpc = &vcpu->arch.xen.vcpu_info_cache;
-
-		read_lock_irqsave(&gpc->lock, flags);
-		if (!kvm_gpc_check(gpc, sizeof(struct vcpu_info))) {
-			/*
-			 * Could not access the vcpu_info. Set the bit in-kernel
-			 * and prod the vCPU to deliver it for itself.
-			 */
-			if (!test_and_set_bit(port_word_bit, &vcpu->arch.xen.evtchn_pending_sel))
-				kick_vcpu = true;
-			goto out_rcu;
-		}
-
-		if (IS_ENABLED(CONFIG_64BIT) && kvm->arch.xen.long_mode) {
-			struct vcpu_info *vcpu_info = gpc->khva;
-			if (!test_and_set_bit(port_word_bit, &vcpu_info->evtchn_pending_sel)) {
-				WRITE_ONCE(vcpu_info->evtchn_upcall_pending, 1);
-				kick_vcpu = true;
-			}
-		} else {
-			struct compat_vcpu_info *vcpu_info = gpc->khva;
-			if (!test_and_set_bit(port_word_bit,
-					      (unsigned long *)&vcpu_info->evtchn_pending_sel)) {
-				WRITE_ONCE(vcpu_info->evtchn_upcall_pending, 1);
-				kick_vcpu = true;
-			}
-		}
-
-		/* For the per-vCPU lapic vector, deliver it as MSI. */
-		if (kick_vcpu && vcpu->arch.xen.upcall_vector) {
-			kvm_xen_inject_vcpu_vector(vcpu);
-			kick_vcpu = false;
-		}
-	}
-
- out_rcu:
-	read_unlock_irqrestore(&gpc->lock, flags);
 	srcu_read_unlock(&kvm->srcu, idx);
 
 	if (kick_vcpu) {
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 58+ messages in thread
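
For reference, the index arithmetic the two new helpers rely on, in
isolation (a sketch, not part of the patch):

  /*
   * Event channel port 'p' corresponds to bit p in the evtchn_pending
   * and evtchn_mask bitmaps in shared_info, and to selector bit
   * (p / BITS) in vcpu_info->evtchn_pending_sel, where BITS is 64 for
   * 64-bit guests and 32 for 32-bit (compat) guests.
   */
  static inline int port_word_bit(unsigned int port, bool long_mode)
  {
  	return port / (long_mode ? 64 : 32);
  }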

* [PATCH v12 17/20] KVM: xen: don't block on pfncache locks in kvm_xen_set_evtchn_fast()
  2024-01-15 12:56 [PATCH v12 00/20] KVM: xen: update shared_info and vcpu_info handling Paul Durrant
                   ` (15 preceding siblings ...)
  2024-01-15 12:57 ` [PATCH v12 16/20] KVM: xen: split up kvm_xen_set_evtchn_fast() Paul Durrant
@ 2024-01-15 12:57 ` Paul Durrant
  2024-02-07  4:17   ` Sean Christopherson
  2024-01-15 12:57 ` [PATCH v12 18/20] KVM: pfncache: check the need for invalidation under read lock first Paul Durrant
                   ` (3 subsequent siblings)
  20 siblings, 1 reply; 58+ messages in thread
From: Paul Durrant @ 2024-01-15 12:57 UTC (permalink / raw)
  To: Paolo Bonzini, Jonathan Corbet, Sean Christopherson,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, David Woodhouse, Paul Durrant, Shuah Khan, kvm,
	linux-doc, linux-kernel, linux-kselftest

From: Paul Durrant <pdurrant@amazon.com>

As described in [1], compiling with CONFIG_PROVE_RAW_LOCK_NESTING shows that
kvm_xen_set_evtchn_fast() is blocking on pfncache locks in IRQ context.
Blocking only actually occurs with PREEMPT_RT, where the locks are turned
into mutexes. There is no 'raw' version of rwlock_t that can be used
to avoid that, so use read_trylock() and treat failure to lock the same as
an invalid cache.

[1] https://lore.kernel.org/lkml/99771ef3a4966a01fefd3adbb2ba9c3a75f97cf2.camel@infradead.org/T/#mbd06e5a04534ce9c0ee94bd8f1e8d942b2d45bd6

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
---
Cc: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: x86@kernel.org

v11:
 - Amended the commit comment.

v10:
 - New in this version.
---
 arch/x86/kvm/xen.c | 30 ++++++++++++++++++++----------
 1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index 5ce02699f44c..9168a6ec88fd 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -1673,10 +1673,13 @@ static int set_shinfo_evtchn_pending(struct kvm_vcpu *vcpu, u32 port)
 	unsigned long flags;
 	int rc = -EWOULDBLOCK;
 
-	read_lock_irqsave(&gpc->lock, flags);
-	if (!kvm_gpc_check(gpc, PAGE_SIZE))
+	local_irq_save(flags);
+	if (!read_trylock(&gpc->lock))
 		goto out;
 
+	if (!kvm_gpc_check(gpc, PAGE_SIZE))
+		goto out_unlock;
+
 	if (IS_ENABLED(CONFIG_64BIT) && kvm->arch.xen.long_mode) {
 		struct shared_info *shinfo = gpc->khva;
 
@@ -1698,8 +1701,10 @@ static int set_shinfo_evtchn_pending(struct kvm_vcpu *vcpu, u32 port)
 		rc = 1; /* It is newly raised */
 	}
 
+ out_unlock:
+	read_unlock(&gpc->lock);
  out:
-	read_unlock_irqrestore(&gpc->lock, flags);
+	local_irq_restore(flags);
 	return rc;
 }
 
@@ -1709,21 +1714,23 @@ static bool set_vcpu_info_evtchn_pending(struct kvm_vcpu *vcpu, u32 port)
 	struct gfn_to_pfn_cache *gpc = &vcpu->arch.xen.vcpu_info_cache;
 	unsigned long flags;
 	bool kick_vcpu = false;
+	bool locked;
 
-	read_lock_irqsave(&gpc->lock, flags);
+	local_irq_save(flags);
+	locked = read_trylock(&gpc->lock);
 
 	/*
 	 * Try to deliver the event directly to the vcpu_info. If successful and
 	 * the guest is using upcall_vector delivery, send the MSI.
-	 * If the pfncache is invalid, set the shadow. In this case, or if the
-	 * guest is using another form of event delivery, the vCPU must be
-	 * kicked to complete the delivery.
+	 * If the pfncache lock is contended or the cache is invalid, set the
+	 * shadow. In this case, or if the guest is using another form of event
+	 * delivery, the vCPU must be kicked to complete the delivery.
 	 */
 	if (IS_ENABLED(CONFIG_64BIT) && kvm->arch.xen.long_mode) {
 		struct vcpu_info *vcpu_info = gpc->khva;
 		int port_word_bit = port / 64;
 
-		if (!kvm_gpc_check(gpc, sizeof(*vcpu_info))) {
+		if (!locked || !kvm_gpc_check(gpc, sizeof(*vcpu_info))) {
 			if (!test_and_set_bit(port_word_bit, &vcpu->arch.xen.evtchn_pending_sel))
 				kick_vcpu = true;
 			goto out;
@@ -1737,7 +1744,7 @@ static bool set_vcpu_info_evtchn_pending(struct kvm_vcpu *vcpu, u32 port)
 		struct compat_vcpu_info *vcpu_info = gpc->khva;
 		int port_word_bit = port / 32;
 
-		if (!kvm_gpc_check(gpc, sizeof(*vcpu_info))) {
+		if (!locked || !kvm_gpc_check(gpc, sizeof(*vcpu_info))) {
 			if (!test_and_set_bit(port_word_bit, &vcpu->arch.xen.evtchn_pending_sel))
 				kick_vcpu = true;
 			goto out;
@@ -1756,7 +1763,10 @@ static bool set_vcpu_info_evtchn_pending(struct kvm_vcpu *vcpu, u32 port)
 	}
 
  out:
-	read_unlock_irqrestore(&gpc->lock, flags);
+	if (locked)
+		read_unlock(&gpc->lock);
+
+	local_irq_restore(flags);
 	return kick_vcpu;
 }
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 58+ messages in thread
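
In isolation, the non-blocking locking pattern being adopted (a sketch
with a hypothetical helper, not part of the patch):

  /*
   * A non-blocking variant of read_lock_irqsave(): returns false (with
   * interrupts restored) if the lock is contended, allowing the caller
   * to treat contention exactly like an invalid cache.
   */
  static bool gpc_try_read_lock_irqsave(struct gfn_to_pfn_cache *gpc,
  					unsigned long *flags)
  {
  	local_irq_save(*flags);
  	if (!read_trylock(&gpc->lock)) {
  		local_irq_restore(*flags);
  		return false;
  	}
  	return true;
  }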

* [PATCH v12 18/20] KVM: pfncache: check the need for invalidation under read lock first
  2024-01-15 12:56 [PATCH v12 00/20] KVM: xen: update shared_info and vcpu_info handling Paul Durrant
                   ` (16 preceding siblings ...)
  2024-01-15 12:57 ` [PATCH v12 17/20] KVM: xen: don't block on pfncache locks in kvm_xen_set_evtchn_fast() Paul Durrant
@ 2024-01-15 12:57 ` Paul Durrant
  2024-02-07  4:22   ` Sean Christopherson
  2024-01-15 12:57 ` [PATCH v12 19/20] KVM: xen: allow vcpu_info content to be 'safely' copied Paul Durrant
                   ` (2 subsequent siblings)
  20 siblings, 1 reply; 58+ messages in thread
From: Paul Durrant @ 2024-01-15 12:57 UTC (permalink / raw)
  To: Paolo Bonzini, Jonathan Corbet, Sean Christopherson,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, David Woodhouse, Paul Durrant, Shuah Khan, kvm,
	linux-doc, linux-kernel, linux-kselftest

From: Paul Durrant <pdurrant@amazon.com>

Taking a write lock on a pfncache will be disruptive if the cache is
heavily used (normal use only requires a read lock). Hence, in the MMU
notifier callback, take read locks on caches to check for a match; only
take a write lock to actually perform an invalidation (after another
check).

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
---
Cc: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: David Woodhouse <dwmw2@infradead.org>

v10:
 - New in this version.
---
 virt/kvm/pfncache.c | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index ae822bff812f..70394d7c9a38 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -29,14 +29,30 @@ void gfn_to_pfn_cache_invalidate_start(struct kvm *kvm, unsigned long start,
 
 	spin_lock(&kvm->gpc_lock);
 	list_for_each_entry(gpc, &kvm->gpc_list, list) {
-		write_lock_irq(&gpc->lock);
+		read_lock_irq(&gpc->lock);
 
 		/* Only a single page so no need to care about length */
 		if (gpc->valid && !is_error_noslot_pfn(gpc->pfn) &&
 		    gpc->uhva >= start && gpc->uhva < end) {
-			gpc->valid = false;
+			read_unlock_irq(&gpc->lock);
+
+			/*
+			 * There is a small window here where the cache could
+			 * be modified, and invalidation would no longer be
+			 * necessary. Hence check again whether invalidation
+			 * is still necessary once the write lock has been
+			 * acquired.
+			 */
+
+			write_lock_irq(&gpc->lock);
+			if (gpc->valid && !is_error_noslot_pfn(gpc->pfn) &&
+			    gpc->uhva >= start && gpc->uhva < end)
+				gpc->valid = false;
+			write_unlock_irq(&gpc->lock);
+			continue;
 		}
-		write_unlock_irq(&gpc->lock);
+
+		read_unlock_irq(&gpc->lock);
 	}
 	spin_unlock(&kvm->gpc_lock);
 }
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 58+ messages in thread
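
The shape of the optimisation, reduced to its essentials (a sketch with
hypothetical match()/invalidate() helpers):

  read_lock_irq(&gpc->lock);
  hit = match(gpc, start, end);
  read_unlock_irq(&gpc->lock);

  if (hit) {
  	write_lock_irq(&gpc->lock);
  	/* The state may have changed while no lock was held: re-check. */
  	if (match(gpc, start, end))
  		invalidate(gpc);
  	write_unlock_irq(&gpc->lock);
  }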

* [PATCH v12 19/20] KVM: xen: allow vcpu_info content to be 'safely' copied
  2024-01-15 12:56 [PATCH v12 00/20] KVM: xen: update shared_info and vcpu_info handling Paul Durrant
                   ` (17 preceding siblings ...)
  2024-01-15 12:57 ` [PATCH v12 18/20] KVM: pfncache: check the need for invalidation under read lock first Paul Durrant
@ 2024-01-15 12:57 ` Paul Durrant
  2024-01-15 12:57 ` [PATCH v12 20/20] KVM: pfncache: rework __kvm_gpc_refresh() to fix locking issues Paul Durrant
  2024-01-25 15:03 ` [PATCH v12 00/20] KVM: xen: update shared_info and vcpu_info handling Paul Durrant
  20 siblings, 0 replies; 58+ messages in thread
From: Paul Durrant @ 2024-01-15 12:57 UTC (permalink / raw)
  To: Paolo Bonzini, Jonathan Corbet, Sean Christopherson,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, David Woodhouse, Paul Durrant, Shuah Khan, kvm,
	linux-doc, linux-kernel, linux-kselftest

From: Paul Durrant <pdurrant@amazon.com>

If the guest sets an explicit vcpu_info GPA then, for any of the first 32
vCPUs, the content of the default vcpu_info in the shared_info page must be
copied into the new location. Because this copy may race with event
delivery (which updates the 'evtchn_pending_sel' field in vcpu_info) there
needs to be a way to defer that until the copy is complete.
Happily there is already a shadow of 'evtchn_pending_sel' in kvm_vcpu_xen
that is used in atomic context if the vcpu_info PFN cache has been
invalidated so that the update of vcpu_info can be deferred until the
cache can be refreshed (on the vCPU thread's way back into guest context).

Also use this shadow if the vcpu_info cache has been *deactivated*, so that
the VMM can safely copy the vcpu_info content and then re-activate the
cache with the new GPA. To do this, stop considering an inactive vcpu_info
cache as a hard error in kvm_xen_set_evtchn_fast().

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
---
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org

v8:
 - Update commit comment.

v6:
 - New in this version.
---
 arch/x86/kvm/xen.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index 9168a6ec88fd..972098ec4f6a 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -1797,9 +1797,6 @@ int kvm_xen_set_evtchn_fast(struct kvm_xen_evtchn *xe, struct kvm *kvm)
 		WRITE_ONCE(xe->vcpu_idx, vcpu->vcpu_idx);
 	}
 
-	if (!vcpu->arch.xen.vcpu_info_cache.active)
-		return -EINVAL;
-
 	if (xe->port >= max_evtchn_port(kvm))
 		return -EINVAL;
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 58+ messages in thread
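
The VMM-side sequence this enables might look like the following sketch
(hypothetical names: 'default_vi' is the vcpu_info embedded in the
shared_info page, 'new_vi'/'new_gpa' the new location):

  struct kvm_xen_vcpu_attr va = {
  	.type = KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO,
  	.u.gpa = KVM_XEN_INVALID_GPA,
  };

  ioctl(vcpu_fd, KVM_XEN_VCPU_SET_ATTR, &va);	/* deactivate */

  /*
   * Events raised now land in the in-kernel shadow of
   * evtchn_pending_sel, so nothing is lost during the copy.
   */
  memcpy(new_vi, default_vi, sizeof(struct vcpu_info));

  va.u.gpa = new_gpa;
  ioctl(vcpu_fd, KVM_XEN_VCPU_SET_ATTR, &va);	/* re-activate */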

* [PATCH v12 20/20] KVM: pfncache: rework __kvm_gpc_refresh() to fix locking issues
  2024-01-15 12:56 [PATCH v12 00/20] KVM: xen: update shared_info and vcpu_info handling Paul Durrant
                   ` (18 preceding siblings ...)
  2024-01-15 12:57 ` [PATCH v12 19/20] KVM: xen: allow vcpu_info content to be 'safely' copied Paul Durrant
@ 2024-01-15 12:57 ` Paul Durrant
  2024-01-25 15:03 ` [PATCH v12 00/20] KVM: xen: update shared_info and vcpu_info handling Paul Durrant
  20 siblings, 0 replies; 58+ messages in thread
From: Paul Durrant @ 2024-01-15 12:57 UTC (permalink / raw)
  To: Paolo Bonzini, Jonathan Corbet, Sean Christopherson,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, David Woodhouse, Paul Durrant, Shuah Khan, kvm,
	linux-doc, linux-kernel, linux-kselftest

From: David Woodhouse <dwmw@amazon.co.uk>

This function can race with kvm_gpc_deactivate(), which does not take
the ->refresh_lock. This means kvm_gpc_deactivate() can wipe the ->pfn
and ->khva fields, and unmap the latter, while hva_to_pfn_retry() has
temporarily dropped its write lock on gpc->lock.

Then if hva_to_pfn_retry() determines that the PFN hasn't changed and
that the original pfn and khva can be reused, they get assigned back to
gpc->pfn and gpc->khva even though the khva was already unmapped by
kvm_gpc_deactivate(). This leaves the cache in an apparently valid state
but with ->khva pointing to an address which has been unmapped. Which in
turn leads to oopses in e.g. __kvm_xen_has_interrupt() and
set_shinfo_evtchn_pending() when they dereference said khva.

It may be possible to fix this just by making kvm_gpc_deactivate() take
the ->refresh_lock, but that still leaves ->refresh_lock being basically
redundant with the write lock on ->lock, which frankly makes my skin
itch, with the way that hva_to_pfn_retry() operates on fields in the gpc
without holding a write lock on ->lock.

Instead, fix it by cleaning up the semantics of hva_to_pfn_retry(). It
now no longer does locking gymnastics because it no longer operates on
the gpc object at all. It's called with a uhva and simply returns the
corresponding pfn (pinned), and a mapped khva for it.

Its caller __kvm_gpc_refresh() now sets gpc->uhva and clears gpc->valid
before dropping ->lock, calling hva_to_pfn_retry() and retaking ->lock
for write.

If hva_to_pfn_retry() fails, *or* if the ->uhva or ->active fields in
the gpc changed while the lock was dropped, the new mapping is discarded
and the gpc is not modified. On success with an unchanged gpc, the new
mapping is installed and the current ->pfn and ->khva are taken into the
local old_pfn and old_khva variables to be unmapped once the locks are
all released.

This simplification means that ->refresh_lock is no longer needed for
correctness, but it does still provide a minor optimisation because it
will prevent two concurrent __kvm_gpc_refresh() calls from mapping a
given PFN, only for one of them to lose the race and discard its
mapping.

The optimisation in hva_to_pfn_retry() where it attempts to use the old
mapping if the pfn doesn't change is dropped, since it makes the pinning
more complex. It's a pointless optimisation anyway, since the odds of
the pfn ending up the same when the uhva has changed (i.e. the odds of
the two userspace addresses both pointing to the same underlying
physical page) are negligible.

The 'hva_change' local variable in __kvm_gpc_refresh() is also removed,
since it's simpler just to clear gpc->valid if the uhva changed.
Likewise the unmap_old variable is dropped because it's just as easy to
check the old_pfn variable for KVM_PFN_ERR_FAULT.

I remain slightly confused because although this is clearly a race in
the gfn_to_pfn_cache code, I don't quite know how the Xen support code
actually managed to trigger it. We've seen oopses from dereferencing a
valid-looking ->khva in both __kvm_xen_has_interrupt() (the vcpu_info)
and in set_shinfo_evtchn_pending() (the shared_info). But surely the
race shouldn't happen for the vcpu_info gpc because all calls to both
refresh and deactivate hold the vcpu mutex, and it shouldn't happen
for the shared_info gpc because all calls to both will hold the
kvm->arch.xen.xen_lock mutex.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
---
Cc: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>

v12:
 - New in this version.
---
 virt/kvm/pfncache.c | 184 +++++++++++++++++++++-----------------------
 1 file changed, 88 insertions(+), 96 deletions(-)

diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index 70394d7c9a38..4863f9f3d369 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -135,110 +135,67 @@ static inline bool mmu_notifier_retry_cache(struct kvm *kvm, unsigned long mmu_s
 	return kvm->mmu_invalidate_seq != mmu_seq;
 }
 
-static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
+/*
+ * Given a user virtual address, obtain a pinned host PFN and kernel mapping
+ * for it. The caller will release the PFN after installing it into the GPC
+ * so that the MMU notifier invalidation mechanism is active.
+ */
+static int hva_to_pfn_retry(struct kvm *kvm, unsigned long uhva,
+			    kvm_pfn_t *pfn, void **khva)
 {
 	/* Note, the new page offset may be different than the old! */
-	void *old_khva = (void *)PAGE_ALIGN_DOWN((uintptr_t)gpc->khva);
 	kvm_pfn_t new_pfn = KVM_PFN_ERR_FAULT;
 	void *new_khva = NULL;
 	unsigned long mmu_seq;
 
-	lockdep_assert_held(&gpc->refresh_lock);
-
-	lockdep_assert_held_write(&gpc->lock);
-
-	/*
-	 * Invalidate the cache prior to dropping gpc->lock, the gpa=>uhva
-	 * assets have already been updated and so a concurrent check() from a
-	 * different task may not fail the gpa/uhva/generation checks.
-	 */
-	gpc->valid = false;
-
-	do {
-		mmu_seq = gpc->kvm->mmu_invalidate_seq;
+	for (;;) {
+		mmu_seq = kvm->mmu_invalidate_seq;
 		smp_rmb();
 
-		write_unlock_irq(&gpc->lock);
-
-		/*
-		 * If the previous iteration "failed" due to an mmu_notifier
-		 * event, release the pfn and unmap the kernel virtual address
-		 * from the previous attempt.  Unmapping might sleep, so this
-		 * needs to be done after dropping the lock.  Opportunistically
-		 * check for resched while the lock isn't held.
-		 */
-		if (new_pfn != KVM_PFN_ERR_FAULT) {
-			/*
-			 * Keep the mapping if the previous iteration reused
-			 * the existing mapping and didn't create a new one.
-			 */
-			if (new_khva != old_khva)
-				gpc_unmap(new_pfn, new_khva);
-
-			kvm_release_pfn_clean(new_pfn);
-
-			cond_resched();
-		}
-
 		/* We always request a writeable mapping */
-		new_pfn = hva_to_pfn(gpc->uhva, false, false, NULL, true, NULL);
+		new_pfn = hva_to_pfn(uhva, false, false, NULL, true, NULL);
 		if (is_error_noslot_pfn(new_pfn))
-			goto out_error;
+			return -EFAULT;
 
 		/*
-		 * Obtain a new kernel mapping if KVM itself will access the
-		 * pfn.  Note, kmap() and memremap() can both sleep, so this
-		 * too must be done outside of gpc->lock!
+		 * Always obtain a new kernel mapping. Trying to reuse an
+		 * existing one is more complex than it's worth.
 		 */
-		if (new_pfn == gpc->pfn)
-			new_khva = old_khva;
-		else
-			new_khva = gpc_map(new_pfn);
-
+		new_khva = gpc_map(new_pfn);
 		if (!new_khva) {
 			kvm_release_pfn_clean(new_pfn);
-			goto out_error;
+			return -EFAULT;
 		}
 
-		write_lock_irq(&gpc->lock);
+		if (!mmu_notifier_retry_cache(kvm, mmu_seq))
+			break;
 
 		/*
-		 * Other tasks must wait for _this_ refresh to complete before
-		 * attempting to refresh.
+		 * If this iteration "failed" due to an mmu_notifier event,
+		 * release the pfn and unmap the kernel virtual address, and
+		 * loop around again.
 		 */
-		WARN_ON_ONCE(gpc->valid);
-	} while (mmu_notifier_retry_cache(gpc->kvm, mmu_seq));
-
-	gpc->valid = true;
-	gpc->pfn = new_pfn;
-	gpc->khva = new_khva + offset_in_page(gpc->uhva);
+		if (new_pfn != KVM_PFN_ERR_FAULT) {
+			gpc_unmap(new_pfn, new_khva);
+			kvm_release_pfn_clean(new_pfn);
+		}
+	}
 
-	/*
-	 * Put the reference to the _new_ pfn.  The pfn is now tracked by the
-	 * cache and can be safely migrated, swapped, etc... as the cache will
-	 * invalidate any mappings in response to relevant mmu_notifier events.
-	 */
-	kvm_release_pfn_clean(new_pfn);
+	*pfn = new_pfn;
+	*khva = new_khva;
 
 	return 0;
-
-out_error:
-	write_lock_irq(&gpc->lock);
-
-	return -EFAULT;
 }
 
-static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa, unsigned long uhva,
-			     unsigned long len)
+static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa,
+			     unsigned long uhva, unsigned long len)
 {
 	struct kvm_memslots *slots = kvm_memslots(gpc->kvm);
 	unsigned long page_offset = (gpa != KVM_XEN_INVALID_GPA) ?
 		offset_in_page(gpa) :
 		offset_in_page(uhva);
-	bool unmap_old = false;
 	unsigned long old_uhva;
-	kvm_pfn_t old_pfn;
-	bool hva_change = false;
+	kvm_pfn_t old_pfn = KVM_PFN_ERR_FAULT;
 	void *old_khva;
 	int ret;
 
@@ -251,8 +208,9 @@ static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa, unsigned l
 
 	/*
 	 * If another task is refreshing the cache, wait for it to complete.
-	 * There is no guarantee that concurrent refreshes will see the same
-	 * gpa, memslots generation, etc..., so they must be fully serialized.
+	 * This is purely an optimisation, to avoid concurrent mappings from
+	 * hva_to_pfn_retry(), all but one of which will be discarded after
+	 * losing a race to install them in the GPC.
 	 */
 	mutex_lock(&gpc->refresh_lock);
 
@@ -272,7 +230,7 @@ static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa, unsigned l
 		gpc->uhva = PAGE_ALIGN_DOWN(uhva);
 
 		if (gpc->uhva != old_uhva)
-			hva_change = true;
+			gpc->valid = false;
 	} else if (gpc->gpa != gpa ||
 		   gpc->generation != slots->generation ||
 		   kvm_is_error_hva(gpc->uhva)) {
@@ -285,7 +243,11 @@ static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa, unsigned l
 
 		if (kvm_is_error_hva(gpc->uhva)) {
 			ret = -EFAULT;
-			goto out;
+
+			gpc->valid = false;
+			gpc->pfn = KVM_PFN_ERR_FAULT;
+			gpc->khva = NULL;
+			goto out_unlock;
 		}
 
 		/*
@@ -293,7 +255,7 @@ static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa, unsigned l
 		 * HVA may still be the same.
 		 */
 		if (gpc->uhva != old_uhva)
-			hva_change = true;
+			gpc->valid = false;
 	} else {
 		gpc->uhva = old_uhva;
 	}
@@ -305,9 +267,7 @@ static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa, unsigned l
 	 * If the userspace HVA changed or the PFN was already invalid,
 	 * drop the lock and do the HVA to PFN lookup again.
 	 */
-	if (!gpc->valid || hva_change) {
-		ret = hva_to_pfn_retry(gpc);
-	} else {
+	if (gpc->valid) {
 		/*
 		 * If the HVA→PFN mapping was already valid, don't unmap it.
 		 * But do update gpc->khva because the offset within the page
@@ -315,30 +275,62 @@ static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa, unsigned l
 		 */
 		gpc->khva = old_khva + page_offset;
 		ret = 0;
-		goto out_unlock;
-	}
 
- out:
-	/*
-	 * Invalidate the cache and purge the pfn/khva if the refresh failed.
-	 * Some/all of the uhva, gpa, and memslot generation info may still be
-	 * valid, leave it as is.
-	 */
-	if (ret) {
+		/* old_pfn must not be unmapped because it was reused. */
+		old_pfn = KVM_PFN_ERR_FAULT;
+	} else {
+		kvm_pfn_t new_pfn = KVM_PFN_ERR_FAULT;
+		unsigned long new_uhva = gpc->uhva;
+		void *new_khva = NULL;
+
+		/*
+		 * Invalidate the cache prior to dropping gpc->lock; the
+		 * gpa=>uhva assets have already been updated and so a
+		 * concurrent check() from a different task may not fail
+		 * the gpa/uhva/generation checks as it should.
+		 */
 		gpc->valid = false;
-		gpc->pfn = KVM_PFN_ERR_FAULT;
-		gpc->khva = NULL;
-	}
 
-	/* Detect a pfn change before dropping the lock! */
-	unmap_old = (old_pfn != gpc->pfn);
+		write_unlock_irq(&gpc->lock);
+
+		ret = hva_to_pfn_retry(gpc->kvm, new_uhva, &new_pfn, &new_khva);
+
+		write_lock_irq(&gpc->lock);
+
+		WARN_ON_ONCE(gpc->valid);
+
+		if (ret || !gpc->active || gpc->uhva != new_uhva) {
+			/*
+			 * On failure or if another change occurred while the
+			 * lock was dropped, just purge the new mapping.
+			 */
+			old_pfn = new_pfn;
+			old_khva = new_khva;
+		} else {
+			old_pfn = gpc->pfn;
+			old_khva = gpc->khva;
+
+			gpc->pfn = new_pfn;
+			gpc->khva = new_khva + offset_in_page(gpc->uhva);
+			gpc->valid = true;
+		}
+
+		/*
+		 * Put the reference to the _new_ pfn. On success, the
+		 * pfn is now tracked by the cache and can safely be
+		 * migrated, swapped, etc. as the cache will invalidate
+		 * any mappings in response to relevant mmu_notifier
+		 * events.
+		 */
+		kvm_release_pfn_clean(new_pfn);
+	}
 
 out_unlock:
 	write_unlock_irq(&gpc->lock);
 
 	mutex_unlock(&gpc->refresh_lock);
 
-	if (unmap_old)
+	if (old_pfn != KVM_PFN_ERR_FAULT)
 		gpc_unmap(old_pfn, old_khva);
 
 	return ret;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: [PATCH v12 00/20] KVM: xen: update shared_info and vcpu_info handling
  2024-01-15 12:56 [PATCH v12 00/20] KVM: xen: update shared_info and vcpu_info handling Paul Durrant
                   ` (19 preceding siblings ...)
  2024-01-15 12:57 ` [PATCH v12 20/20] KVM: pfncache: rework __kvm_gpc_refresh() to fix locking issues Paul Durrant
@ 2024-01-25 15:03 ` Paul Durrant
  2024-01-25 20:07   ` David Woodhouse
  2024-01-26  1:19   ` Sean Christopherson
  20 siblings, 2 replies; 58+ messages in thread
From: Paul Durrant @ 2024-01-25 15:03 UTC (permalink / raw)
  To: Paolo Bonzini, Jonathan Corbet, Sean Christopherson,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, David Woodhouse, Shuah Khan, kvm, linux-doc,
	linux-kernel, linux-kselftest

On 15/01/2024 12:56, Paul Durrant wrote:
> From: Paul Durrant <pdurrant@amazon.com>
> 
> This series has one small fix to what was in v11 [1]:
> 
> * KVM: xen: re-initialize shared_info if guest (32/64-bit) mode is set
> 
> The v11 patch failed to set the return code of the ioctl if the mode
> was not actually changed, leading to a spurious failure.
> 
> This version of the series also contains a new bug-fix to the pfncache
> code from David Woodhouse.
> 
> [1] https://lore.kernel.org/kvm/20231219161109.1318-1-paul@xen.org/
> 

Ping?


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v12 00/20] KVM: xen: update shared_info and vcpu_info handling
  2024-01-25 15:03 ` [PATCH v12 00/20] KVM: xen: update shared_info and vcpu_info handling Paul Durrant
@ 2024-01-25 20:07   ` David Woodhouse
  2024-01-26  1:19   ` Sean Christopherson
  1 sibling, 0 replies; 58+ messages in thread
From: David Woodhouse @ 2024-01-25 20:07 UTC (permalink / raw)
  To: paul, Paolo Bonzini, Jonathan Corbet, Sean Christopherson,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Shuah Khan, kvm, linux-doc, linux-kernel,
	linux-kselftest

[-- Attachment #1: Type: text/plain, Size: 1081 bytes --]

On Thu, 2024-01-25 at 15:03 +0000, Paul Durrant wrote:
> On 15/01/2024 12:56, Paul Durrant wrote:
> > From: Paul Durrant <pdurrant@amazon.com>
> > 
> > This series has one small fix to what was in v11 [1]:
> > 
> > * KVM: xen: re-initialize shared_info if guest (32/64-bit) mode is set
> > 
> > The v11 patch failed to set the return code of the ioctl if the mode
> > was not actually changed, leading to a spurious failure.
> > 
> > This version of the series also contains a new bug-fix to the pfncache
> > code from David Woodhouse.
> > 
> > [1] https://lore.kernel.org/kvm/20231219161109.1318-1-paul@xen.org/
> > 
> 
> Ping?
> 

I think it's only the final patch which is controversial, isn't it? We
can drop that for now and I can submit it under separate cover. 

It'll be basically the same patch, just won't claim to be a bug fix:
https://lore.kernel.org/kvm/6dc0e9d1f5db41a053b734b29403ad48c288dea3.camel@infradead.org/

Although I'll eat my hat if that "should never happen" bug actually
does keep happening after the locking is less baroque.

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5965 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v12 00/20] KVM: xen: update shared_info and vcpu_info handling
  2024-01-25 15:03 ` [PATCH v12 00/20] KVM: xen: update shared_info and vcpu_info handling Paul Durrant
  2024-01-25 20:07   ` David Woodhouse
@ 2024-01-26  1:19   ` Sean Christopherson
  2024-02-02 17:37     ` Paul Durrant
  1 sibling, 1 reply; 58+ messages in thread
From: Sean Christopherson @ 2024-01-26  1:19 UTC (permalink / raw)
  To: paul
  Cc: Paolo Bonzini, Jonathan Corbet, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	David Woodhouse, Shuah Khan, kvm, linux-doc, linux-kernel,
	linux-kselftest

On Thu, Jan 25, 2024, Paul Durrant wrote:
> On 15/01/2024 12:56, Paul Durrant wrote:
> > From: Paul Durrant <pdurrant@amazon.com>
> > 
> > This series has one small fix to what was in v11 [1]:
> > 
> > * KVM: xen: re-initialize shared_info if guest (32/64-bit) mode is set
> > 
> > The v11 patch failed to set the return code of the ioctl if the mode
> > was not actually changed, leading to a spurious failure.
> > 
> > This version of the series also contains a new bug-fix to the pfncache
> > code from David Woodhouse.
> > 
> > [1] https://lore.kernel.org/kvm/20231219161109.1318-1-paul@xen.org/
> > 
> 
> Ping?

Sorry, I have done basically zero upstream reviews over the last few weeks, for
a variety of reasons.  Unless yet another thing pops up, I expect to dive into
upstream reviews tomorrow and spend a good long while there.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v12 00/20] KVM: xen: update shared_info and vcpu_info handling
  2024-01-26  1:19   ` Sean Christopherson
@ 2024-02-02 17:37     ` Paul Durrant
  2024-02-02 22:03       ` Sean Christopherson
  0 siblings, 1 reply; 58+ messages in thread
From: Paul Durrant @ 2024-02-02 17:37 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Jonathan Corbet, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	David Woodhouse, Shuah Khan, kvm, linux-doc, linux-kernel,
	linux-kselftest

On 26/01/2024 01:19, Sean Christopherson wrote:
> On Thu, Jan 25, 2024, Paul Durrant wrote:
>> On 15/01/2024 12:56, Paul Durrant wrote:
>>> From: Paul Durrant <pdurrant@amazon.com>
>>>
>>> This series has one small fix to what was in v11 [1]:
>>>
>>> * KVM: xen: re-initialize shared_info if guest (32/64-bit) mode is set
>>>
>>> The v11 patch failed to set the return code of the ioctl if the mode
>>> was not actually changed, leading to a spurious failure.
>>>
>>> This version of the series also contains a new bug-fix to the pfncache
>>> code from David Woodhouse.
>>>
>>> [1] https://lore.kernel.org/kvm/20231219161109.1318-1-paul@xen.org/
>>>
>>
>> Ping?
> 
> Sorry, I have done basically zero upstream reviews over the last few weeks, for
> a variety of reasons.  Unless yet another thing pops up, I expect to dive into
> upstream reviews tomorrow and spend a good long while there.

Hi Sean,

   Have you had any time to take a look at this?

   Paul

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v12 00/20] KVM: xen: update shared_info and vcpu_info handling
  2024-02-02 17:37     ` Paul Durrant
@ 2024-02-02 22:03       ` Sean Christopherson
  0 siblings, 0 replies; 58+ messages in thread
From: Sean Christopherson @ 2024-02-02 22:03 UTC (permalink / raw)
  To: paul
  Cc: Paolo Bonzini, Jonathan Corbet, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	David Woodhouse, Shuah Khan, kvm, linux-doc, linux-kernel,
	linux-kselftest

On Fri, Feb 02, 2024, Paul Durrant wrote:
> On 26/01/2024 01:19, Sean Christopherson wrote:
> > On Thu, Jan 25, 2024, Paul Durrant wrote:
> > > On 15/01/2024 12:56, Paul Durrant wrote:
> > > > From: Paul Durrant <pdurrant@amazon.com>
> > > > 
> > > > This series has one small fix to what was in v11 [1]:
> > > > 
> > > > * KVM: xen: re-initialize shared_info if guest (32/64-bit) mode is set
> > > > 
> > > > The v11 patch failed to set the return code of the ioctl if the mode
> > > > was not actually changed, leading to a spurious failure.
> > > > 
> > > > This version of the series also contains a new bug-fix to the pfncache
> > > > code from David Woodhouse.
> > > > 
> > > > [1] https://lore.kernel.org/kvm/20231219161109.1318-1-paul@xen.org/
> > > > 
> > > 
> > > Ping?
> > 
> > Sorry, I have done basically zero upstream reviews over the last few weeks, for
> > a variety of reasons.  Unless yet another thing pops up, I expect to dive into
> > upstream reviews tomorrow and spend a good long while there.
> 
> Hi Sean,
> 
>   Have you had any time to take a look at this?

No, I was hoping to get to it today, but that isn't happening.  It's next in my
queue after David Stevens' "KVM: allow mapping non-refcounted pages" series, so
hopefully Monday or Tuesday will be the day.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v12 03/20] KVM: xen: mark guest pages dirty with the pfncache lock held
  2024-01-15 12:56 ` [PATCH v12 03/20] KVM: xen: mark guest pages dirty with the pfncache lock held Paul Durrant
@ 2024-02-07  3:17   ` Sean Christopherson
  2024-02-07  3:26     ` David Woodhouse
  2024-02-07  8:48     ` Paul Durrant
  0 siblings, 2 replies; 58+ messages in thread
From: Sean Christopherson @ 2024-02-07  3:17 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Paolo Bonzini, Jonathan Corbet, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	David Woodhouse, Shuah Khan, kvm, linux-doc, linux-kernel,
	linux-kselftest

KVM: x86/xen: for the scope please.  A few commits have "KVM: xen:", but "x86/xen"
is the overwhelming favorite.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v12 04/20] KVM: pfncache: add a mark-dirty helper
  2024-01-15 12:56 ` [PATCH v12 04/20] KVM: pfncache: add a mark-dirty helper Paul Durrant
@ 2024-02-07  3:20   ` Sean Christopherson
  2024-02-07  8:47     ` Paul Durrant
  2024-02-09 15:58   ` Sean Christopherson
  1 sibling, 1 reply; 58+ messages in thread
From: Sean Christopherson @ 2024-02-07  3:20 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Paolo Bonzini, Jonathan Corbet, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	David Woodhouse, Shuah Khan, kvm, linux-doc, linux-kernel,
	linux-kselftest

On Mon, Jan 15, 2024, Paul Durrant wrote:
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 7e7fd25b09b3..f3bb9e0a81fe 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -1399,6 +1399,17 @@ int kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, unsigned long len);
>   */
>  void kvm_gpc_deactivate(struct gfn_to_pfn_cache *gpc);
>  
> +/**
> + * kvm_gpc_mark_dirty - mark a cached page as dirty.
> + *
> + * @gpc:	   struct gfn_to_pfn_cache object.
> + */
> +static inline void kvm_gpc_mark_dirty(struct gfn_to_pfn_cache *gpc)

Any objection to kvm_gpc_mark_dirty_in_slot()?  I want to make it clear this only
marks the gfn dirty in the memslot, i.e. that it doesn't mark the underlying page
as dirty.
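
E.g. (untested; just the rename applied to the helper quoted above):

	static inline void kvm_gpc_mark_dirty_in_slot(struct gfn_to_pfn_cache *gpc)
	{
		lockdep_assert_held(&gpc->lock);
		mark_page_dirty_in_slot(gpc->kvm, gpc->memslot, gpc->gpa >> PAGE_SHIFT);
	}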

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v12 03/20] KVM: xen: mark guest pages dirty with the pfncache lock held
  2024-02-07  3:17   ` Sean Christopherson
@ 2024-02-07  3:26     ` David Woodhouse
  2024-02-07 15:15       ` Sean Christopherson
  2024-02-07  8:48     ` Paul Durrant
  1 sibling, 1 reply; 58+ messages in thread
From: David Woodhouse @ 2024-02-07  3:26 UTC (permalink / raw)
  To: Sean Christopherson, Paul Durrant
  Cc: Paolo Bonzini, Jonathan Corbet, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Shuah Khan,
	kvm, linux-doc, linux-kernel, linux-kselftest

On Tue, 2024-02-06 at 19:17 -0800, Sean Christopherson wrote:
> KVM: x86/xen: for the scope please.  A few commits have "KVM: xen:", but "x86/xen"
> is the overwhelming favorite.

Paul's been using "KVM: xen:" in this patch series since first posting
it in September of last year. If there aren't more substantial changes
you need, would you perhaps be able to make that minor fixup as you
apply the series?

I'm not currently in the country to buy him a beer and talk him down
off the ceiling when he wakes up and reads your message.


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v12 08/20] KVM: pfncache: allow a cache to be activated with a fixed (userspace) HVA
  2024-01-15 12:56 ` [PATCH v12 08/20] KVM: pfncache: allow a cache to be activated with a fixed (userspace) HVA Paul Durrant
@ 2024-02-07  4:03   ` Sean Christopherson
  2024-02-07  4:13     ` David Woodhouse
  2024-02-14 15:21     ` Paul Durrant
  0 siblings, 2 replies; 58+ messages in thread
From: Sean Christopherson @ 2024-02-07  4:03 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Paolo Bonzini, Jonathan Corbet, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	David Woodhouse, Shuah Khan, kvm, linux-doc, linux-kernel,
	linux-kselftest, Christian Borntraeger, Janosch Frank,
	Claudio Imbrenda, David Hildenbrand

+s390 folks (question on kvm_is_error_gpa() for ya)

On Mon, Jan 15, 2024, Paul Durrant wrote:
> @@ -1398,7 +1414,9 @@ void kvm_gpc_deactivate(struct gfn_to_pfn_cache *gpc);
>  static inline void kvm_gpc_mark_dirty(struct gfn_to_pfn_cache *gpc)
>  {
>  	lockdep_assert_held(&gpc->lock);
> -	mark_page_dirty_in_slot(gpc->kvm, gpc->memslot, gpc->gpa >> PAGE_SHIFT);
> +
> +	if (gpc->gpa != KVM_XEN_INVALID_GPA)

KVM_XEN_INVALID_GPA absolutely doesn't belong in common code.  Not to mention
that it will break when Paolo (rightly) moves it to an x86 header.

https://lore.kernel.org/all/20240131233056.10845-3-pbonzini@redhat.com

> +		mark_page_dirty_in_slot(gpc->kvm, gpc->memslot, gpc->gpa >> PAGE_SHIFT);
>  }
>  
>  void kvm_sigset_activate(struct kvm_vcpu *vcpu);
> diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
> index 97eec8ee3449..ae822bff812f 100644
> --- a/virt/kvm/pfncache.c
> +++ b/virt/kvm/pfncache.c
> @@ -48,7 +48,10 @@ bool kvm_gpc_check(struct gfn_to_pfn_cache *gpc, unsigned long len)
>  	if (!gpc->active)
>  		return false;
>  
> -	if (gpc->generation != slots->generation || kvm_is_error_hva(gpc->uhva))
> +	if (gpc->gpa != KVM_XEN_INVALID_GPA && gpc->generation != slots->generation)

This needs a comment.  I know what it's doing, but it wasn't obvious at first
glance, and it definitely won't be intuitive for readers that aren't intimately
familiar with memslots.

> +		return false;
> +
> +	if (kvm_is_error_hva(gpc->uhva))
>  		return false;
>  
>  	if (offset_in_page(gpc->uhva) + len > PAGE_SIZE)
> @@ -209,11 +212,13 @@ static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
>  	return -EFAULT;
>  }
>  
> -static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa,
> +static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa, unsigned long uhva,
>  			     unsigned long len)
>  {
>  	struct kvm_memslots *slots = kvm_memslots(gpc->kvm);
> -	unsigned long page_offset = offset_in_page(gpa);
> +	unsigned long page_offset = (gpa != KVM_XEN_INVALID_GPA) ?
> +		offset_in_page(gpa) :
> +		offset_in_page(uhva);

This formatting is funky.  I also think it would be worth adding a helper to pair
with kvm_is_error_hva().

But!  kvm_is_error_gpa() already exists, and it very, very sneakily does a memslot
lookup and checks for a valid HVA.

s390 people, any objection to renaming kvm_is_error_gpa() to something like
kvm_gpa_has_memslot() or kvm_gpa_is_in_memslot()?  s390 is the only code that
uses the existing helper.

That would both free up the name to pair with kvm_is_error_hva(), and would
make it obvious what the helper does; I was quite surprised that "error" means
"is not covered by a valid memslot".

Back to this code, then we can have a slightly cleaner:

	unsigned long page_offset = kvm_is_error_gpa(gpa) ? offset_in_page(uhva) :
							     offset_in_page(gpa);


And I think it's worth asserting that exactly _one_ of GPA or HVA is valid, e.g.
to ensure KVM doesn't end up with botched offsets, and to make it a bit more
clear what's going on.


	if (WARN_ON_ONCE(kvm_is_error_gpa(gpa) == kvm_is_error_hva(uhva)))
		return -EINVAL;
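
And for the simple-comparison flavor of the freed-up name (as opposed to the
memslot-lookup one), a sketch could be as trivial as this, assuming a common
INVALID_GPA sentinel:

	/* Pairs with kvm_is_error_hva(); no memslot lookup involved. */
	static inline bool kvm_is_error_gpa(gpa_t gpa)
	{
		return gpa == INVALID_GPA;
	}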

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v12 11/20] KVM: xen: allow shared_info to be mapped by fixed HVA
  2024-01-15 12:56 ` [PATCH v12 11/20] KVM: xen: allow shared_info to be mapped by fixed HVA Paul Durrant
@ 2024-02-07  4:10   ` Sean Christopherson
  2024-02-07  8:53     ` Paul Durrant
  2024-02-08  8:52     ` Paul Durrant
  0 siblings, 2 replies; 58+ messages in thread
From: Sean Christopherson @ 2024-02-07  4:10 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Paolo Bonzini, Jonathan Corbet, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	David Woodhouse, Shuah Khan, kvm, linux-doc, linux-kernel,
	linux-kselftest

On Mon, Jan 15, 2024, Paul Durrant wrote:
> @@ -638,20 +637,32 @@ int kvm_xen_hvm_set_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
>  		}
>  		break;
>  
> -	case KVM_XEN_ATTR_TYPE_SHARED_INFO: {
> +	case KVM_XEN_ATTR_TYPE_SHARED_INFO:
> +	case KVM_XEN_ATTR_TYPE_SHARED_INFO_HVA: {
>  		int idx;
>  
>  		mutex_lock(&kvm->arch.xen.xen_lock);
>  
>  		idx = srcu_read_lock(&kvm->srcu);
>  
> -		if (data->u.shared_info.gfn == KVM_XEN_INVALID_GFN) {
> -			kvm_gpc_deactivate(&kvm->arch.xen.shinfo_cache);
> -			r = 0;
> +		if (data->type == KVM_XEN_ATTR_TYPE_SHARED_INFO) {
> +			if (data->u.shared_info.gfn == KVM_XEN_INVALID_GFN) {
> +				kvm_gpc_deactivate(&kvm->arch.xen.shinfo_cache);
> +				r = 0;
> +			} else {
> +				r = kvm_gpc_activate(&kvm->arch.xen.shinfo_cache,
> +						     gfn_to_gpa(data->u.shared_info.gfn),
> +						     PAGE_SIZE);
> +			}
>  		} else {
> -			r = kvm_gpc_activate(&kvm->arch.xen.shinfo_cache,
> -					     gfn_to_gpa(data->u.shared_info.gfn),
> -					     PAGE_SIZE);
> +			if (data->u.shared_info.hva == 0) {

I know I said I don't care about the KVM Xen ABI, but I still think using '0' as
"invalid" is ridiculous.

More importantly, this code needs to check that HVA is a userspace pointer.
Because __kvm_set_memory_region() performs the address checks, KVM assumes any
hva that it gets out of a memslot, i.e. from a gfn, is a safe userspace address.

kvm_is_error_hva() will catch most addresses, but I'm pretty sure there's still
a small window where userspace could use this to write kernel memory.

> +				kvm_gpc_deactivate(&kvm->arch.xen.shinfo_cache);
> +				r = 0;
> +			} else {
> +				r = kvm_gpc_activate_hva(&kvm->arch.xen.shinfo_cache,
> +							 data->u.shared_info.hva,
> +							 PAGE_SIZE);
> +			}

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v12 08/20] KVM: pfncache: allow a cache to be activated with a fixed (userspace) HVA
  2024-02-07  4:03   ` Sean Christopherson
@ 2024-02-07  4:13     ` David Woodhouse
  2024-02-14 16:01       ` Sean Christopherson
  2024-02-14 15:21     ` Paul Durrant
  1 sibling, 1 reply; 58+ messages in thread
From: David Woodhouse @ 2024-02-07  4:13 UTC (permalink / raw)
  To: Sean Christopherson, Paul Durrant
  Cc: Paolo Bonzini, Jonathan Corbet, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Shuah Khan,
	kvm, linux-doc, linux-kernel, linux-kselftest,
	Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
	David Hildenbrand

On Tue, 2024-02-06 at 20:03 -0800, Sean Christopherson wrote:
> +s390 folks (question on kvm_is_error_gpa() for ya)
> 
> On Mon, Jan 15, 2024, Paul Durrant wrote:
> > @@ -1398,7 +1414,9 @@ void kvm_gpc_deactivate(struct gfn_to_pfn_cache *gpc);
> >   static inline void kvm_gpc_mark_dirty(struct gfn_to_pfn_cache *gpc)
> >   {
> >         lockdep_assert_held(&gpc->lock);
> > -       mark_page_dirty_in_slot(gpc->kvm, gpc->memslot, gpc->gpa >> PAGE_SHIFT);
> > +
> > +       if (gpc->gpa != KVM_XEN_INVALID_GPA)
> 
> KVM_XEN_INVALID_GPA absolutely doesn't belong in common code.  Not to mention
> that it will break when Paolo (rightly) moves it to an x86 header.
> 
> https://lore.kernel.org/all/20240131233056.10845-3-pbonzini@redhat.com

We can use plain INVALID_GPA for that, I think. ISTR the reason we have
a separate KVM_XEN_INVALID_GPA is because that's a userspace API.

...

> But!  kvm_is_error_gpa() already exists, and it very, very sneakily
> does a memslot lookup and checks for a valid HVA.

Hm, that doesn't sound as fast as a simple comparison. We also can't do
it from kvm_gpc_check(), can we?



^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v12 13/20] KVM: selftests / xen: map shared_info using HVA rather than GFN
  2024-01-15 12:57 ` [PATCH v12 13/20] KVM: selftests / xen: map shared_info using HVA rather than GFN Paul Durrant
@ 2024-02-07  4:14   ` Sean Christopherson
  2024-02-07  8:54     ` Paul Durrant
  0 siblings, 1 reply; 58+ messages in thread
From: Sean Christopherson @ 2024-02-07  4:14 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Paolo Bonzini, Jonathan Corbet, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	David Woodhouse, Shuah Khan, kvm, linux-doc, linux-kernel,
	linux-kselftest

Please stop making up random scopes.  Yes, I know "KVM: selftests:" is too coarse,
but everyone doing their own thing is worse.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v12 17/20] KVM: xen: don't block on pfncache locks in kvm_xen_set_evtchn_fast()
  2024-01-15 12:57 ` [PATCH v12 17/20] KVM: xen: don't block on pfncache locks in kvm_xen_set_evtchn_fast() Paul Durrant
@ 2024-02-07  4:17   ` Sean Christopherson
  2024-02-07  4:21     ` David Woodhouse
  0 siblings, 1 reply; 58+ messages in thread
From: Sean Christopherson @ 2024-02-07  4:17 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Paolo Bonzini, Jonathan Corbet, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	David Woodhouse, Shuah Khan, kvm, linux-doc, linux-kernel,
	linux-kselftest

On Mon, Jan 15, 2024, Paul Durrant wrote:
> From: Paul Durrant <pdurrant@amazon.com>
> 
> As described in [1], compiling with CONFIG_PROVE_RAW_LOCK_NESTING shows that
> kvm_xen_set_evtchn_fast() is blocking on pfncache locks in IRQ context. It
> is only actually blocking with PREEMPT_RT, because the locks will be turned
> into mutexes. There is no 'raw' version of rwlock_t that can be used to
> avoid that, so use read_trylock() and treat failure to lock the same as an
> invalid cache.

Are rwlocks fundamentally incapable of supporting a raw version?  Because that's
the only argument I see for adding a hack like this.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v12 17/20] KVM: xen: don't block on pfncache locks in kvm_xen_set_evtchn_fast()
  2024-02-07  4:17   ` Sean Christopherson
@ 2024-02-07  4:21     ` David Woodhouse
  0 siblings, 0 replies; 58+ messages in thread
From: David Woodhouse @ 2024-02-07  4:21 UTC (permalink / raw)
  To: Sean Christopherson, Paul Durrant
  Cc: Paolo Bonzini, Jonathan Corbet, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Shuah Khan,
	kvm, linux-doc, linux-kernel, linux-kselftest

On Tue, 2024-02-06 at 20:17 -0800, Sean Christopherson wrote:
> On Mon, Jan 15, 2024, Paul Durrant wrote:
> > From: Paul Durrant <pdurrant@amazon.com>
> > 
> > As described in [1], compiling with CONFIG_PROVE_RAW_LOCK_NESTING shows that
> > kvm_xen_set_evtchn_fast() is blocking on pfncache locks in IRQ context. It
> > is only actually blocking with PREEMPT_RT, because the locks will be turned
> > into mutexes. There is no 'raw' version of rwlock_t that can be used to
> > avoid that, so use read_trylock() and treat failure to lock the same as an
> > invalid cache.
> 
> Are rwlocks fundamentally incapable of supporting a raw version?  Because that's
> the only argument I see for adding a hack like this.

I don't know about "fundamentally incapable", but there's no point in
adding them just for this, because the write lock is very rarely going
to be held, so it's not an issue to fall back to the slow path. And
when the write lock *was* held, something has often changed, so we may
well be going back to the slow path anyway.
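
For reference, the fast-path pattern under discussion is roughly this
(illustrative sketch only, not the literal patch):

	/*
	 * Cannot block on gpc->lock here, so treat contention exactly
	 * like an invalid cache and let the caller punt to the slow
	 * path, which can sleep and refresh the cache.
	 */
	if (!read_trylock(&gpc->lock))
		return -EWOULDBLOCK;

	if (!kvm_gpc_check(gpc, PAGE_SIZE)) {
		read_unlock(&gpc->lock);
		return -EWOULDBLOCK;
	}

	/* ... deliver the event via the cached mapping ... */

	read_unlock(&gpc->lock);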


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v12 18/20] KVM: pfncache: check the need for invalidation under read lock first
  2024-01-15 12:57 ` [PATCH v12 18/20] KVM: pfncache: check the need for invalidation under read lock first Paul Durrant
@ 2024-02-07  4:22   ` Sean Christopherson
  2024-02-07  4:27     ` David Woodhouse
  0 siblings, 1 reply; 58+ messages in thread
From: Sean Christopherson @ 2024-02-07  4:22 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Paolo Bonzini, Jonathan Corbet, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	David Woodhouse, Shuah Khan, kvm, linux-doc, linux-kernel,
	linux-kselftest

On Mon, Jan 15, 2024, Paul Durrant wrote:
> From: Paul Durrant <pdurrant@amazon.com>
> 
> Taking a write lock on a pfncache will be disruptive if the cache is

*Unnecessarily* taking a write lock.  Please save readers a bit of brain power
and explain that this is beneficial when there are _unrelated_ invalidations.

> heavily used (which only requires a read lock). Hence, in the MMU notifier
> callback, take read locks on caches to check for a match; only taking a
> write lock to actually perform an invalidation (after another check).

This doesn't have any dependency on this series, does it?  I.e. this should be
posted separately, and preferably with some performance data.  Not having data
isn't a sticking point, but it would be nice to verify that this isn't a
pointless optimization.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v12 18/20] KVM: pfncache: check the need for invalidation under read lock first
  2024-02-07  4:22   ` Sean Christopherson
@ 2024-02-07  4:27     ` David Woodhouse
  2024-02-07  4:47       ` Sean Christopherson
  0 siblings, 1 reply; 58+ messages in thread
From: David Woodhouse @ 2024-02-07  4:27 UTC (permalink / raw)
  To: Sean Christopherson, Paul Durrant
  Cc: Paolo Bonzini, Jonathan Corbet, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Shuah Khan,
	kvm, linux-doc, linux-kernel, linux-kselftest

On Tue, 2024-02-06 at 20:22 -0800, Sean Christopherson wrote:
> On Mon, Jan 15, 2024, Paul Durrant wrote:
> > From: Paul Durrant <pdurrant@amazon.com>
> > 
> > Taking a write lock on a pfncache will be disruptive if the cache is
> 
> *Unnecessarily* taking a write lock.

No. Taking a write lock will be disruptive.

Unnecessarily taking a write lock will be unnecessarily disruptive.

Taking a write lock on a Thursday will be disruptive on a Thursday.

But the key is that if the cache is heavily used, the user gets
disrupted.


>   Please save readers a bit of brain power
> and explain that this is beneficial when there are _unrelated_ invalidations.

I don't understand what you're saying there. Paul's sentence did have
an implicit "...so do that less, then", but that didn't take much brain
power to infer.

> > heavily used (which only requires a read lock). Hence, in the MMU notifier
> > callback, take read locks on caches to check for a match; only taking a
> > write lock to actually perform an invalidation (after another check).
> 
> This doesn't have any dependency on this series, does it?  I.e. this should be
> posted separately, and preferably with some performance data.  Not having data
> isn't a sticking point, but it would be nice to verify that this isn't a
> pointless optimization.

No fundamental dependency, no. But it was triggered by the previous
patch, which makes kvm_xen_set_evtchn_fast() use read_trylock() and
makes it take the slow path when there's contention. It lives here just
fine as part of the series.



^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v12 18/20] KVM: pfncache: check the need for invalidation under read lock first
  2024-02-07  4:27     ` David Woodhouse
@ 2024-02-07  4:47       ` Sean Christopherson
  2024-02-07  4:59         ` David Woodhouse
  0 siblings, 1 reply; 58+ messages in thread
From: Sean Christopherson @ 2024-02-07  4:47 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Paul Durrant, Paolo Bonzini, Jonathan Corbet, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Shuah Khan, kvm, linux-doc, linux-kernel, linux-kselftest

On Tue, Feb 06, 2024, David Woodhouse wrote:
> On Tue, 2024-02-06 at 20:22 -0800, Sean Christopherson wrote:
> > On Mon, Jan 15, 2024, Paul Durrant wrote:
> > > From: Paul Durrant <pdurrant@amazon.com>
> > > 
> > > Taking a write lock on a pfncache will be disruptive if the cache is
> > 
> > *Unnecessarily* taking a write lock.
> 
> No. Taking a write lock will be disruptive.
> 
> Unnecessarily taking a write lock will be unnecessarily disruptive.
> 
> Taking a write lock on a Thursday will be disruptive on a Thursday.
> 
> But the key is that if the cache is heavily used, the user gets
> disrupted.

If the invalidation is relevant, then this code is taking gpc->lock for write no
matter what.  The purpose of the changelog is to explain _why_ a patch adds value.

> >   Please save readers a bit of brain power
> > and explain that this is beneificial when there are _unrelated_ invalidation.
> 
> I don't understand what you're saying there. Paul's sentence did have
> an implicit "...so do that less then", but that didn't take much brain
> power to infer.

I'm saying this:

  When processing mmu_notifier invalidations for gpc caches, pre-check for
  overlap with the invalidation event while holding gpc->lock for read, and
  only take gpc->lock for write if the cache needs to be invalidated.  Doing
  a pre-check without taking gpc->lock for write avoids unnecessarily
  contending the lock for unrelated invalidations, which is very beneficial
  for caches that are heavily used (but rarely subjected to mmu_notifier
  invalidations).

is much friendlier to readers than this:

  Taking a write lock on a pfncache will be disruptive if the cache is
  heavily used (which only requires a read lock). Hence, in the MMU notifier
  callback, take read locks on caches to check for a match; only taking a
  write lock to actually perform an invalidation (after another check).

Is it too much hand-holding, and bordering on stating the obvious?  Maybe.  But
(a) a lot of people that read mailing lists and KVM code are *not* kernel experts,
and (b) a changelog is written _once_, and read hundreds if not thousands of times.

If we can save each reader even a few seconds, then taking an extra minute or two
to write a more verbose changelog is a net win.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v12 18/20] KVM: pfncache: check the need for invalidation under read lock first
  2024-02-07  4:47       ` Sean Christopherson
@ 2024-02-07  4:59         ` David Woodhouse
  2024-02-07 15:10           ` Sean Christopherson
  0 siblings, 1 reply; 58+ messages in thread
From: David Woodhouse @ 2024-02-07  4:59 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paul Durrant, Paolo Bonzini, Jonathan Corbet, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Shuah Khan, kvm, linux-doc, linux-kernel, linux-kselftest

On Tue, 2024-02-06 at 20:47 -0800, Sean Christopherson wrote:
> 
> I'm saying this:
> 
>   When processing mmu_notifier invalidations for gpc caches, pre-check for
>   overlap with the invalidation event while holding gpc->lock for read, and
>   only take gpc->lock for write if the cache needs to be invalidated.  Doing
>   a pre-check without taking gpc->lock for write avoids unnecessarily
>   contending the lock for unrelated invalidations, which is very beneficial
>   for caches that are heavily used (but rarely subjected to mmu_notifier
>   invalidations).
> 
> is much friendlier to readers than this:
> 
>   Taking a write lock on a pfncache will be disruptive if the cache is
>   heavily used (which only requires a read lock). Hence, in the MMU notifier
>   callback, take read locks on caches to check for a match; only taking a
>   write lock to actually perform an invalidation (after another check).

That's a somewhat subjective observation. I actually find the latter to
be far more succinct and obvious.

Actually... maybe I find yours harder because it isn't actually stating
the situation as I understand it. You said "unrelated invalidations" in
your first email, and "overlap with the invalidation event" in this
one... neither of which makes sense to me because there is no *other*
invalidation here.

We're only talking about the MMU notifier gratuitously taking the write
lock on a GPC that it *isn't* going to invalidate (the common case),
and how that disrupts users which are trying to take the read lock on
that GPC.
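
Concretely, the notifier-side change has roughly this shape (sketch,
using the gpc fields as they exist elsewhere in the series):

	read_lock_irq(&gpc->lock);
	if (gpc->valid && gpc->uhva >= start && gpc->uhva < end) {
		read_unlock_irq(&gpc->lock);

		/*
		 * Re-check under the write lock; the cache may have been
		 * refreshed or invalidated while the lock was dropped.
		 */
		write_lock_irq(&gpc->lock);
		if (gpc->valid && gpc->uhva >= start && gpc->uhva < end)
			gpc->valid = false;
		write_unlock_irq(&gpc->lock);
	} else {
		read_unlock_irq(&gpc->lock);
	}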


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v12 04/20] KVM: pfncache: add a mark-dirty helper
  2024-02-07  3:20   ` Sean Christopherson
@ 2024-02-07  8:47     ` Paul Durrant
  0 siblings, 0 replies; 58+ messages in thread
From: Paul Durrant @ 2024-02-07  8:47 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Jonathan Corbet, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	David Woodhouse, Shuah Khan, kvm, linux-doc, linux-kernel,
	linux-kselftest

On 07/02/2024 03:20, Sean Christopherson wrote:
> On Mon, Jan 15, 2024, Paul Durrant wrote:
>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>> index 7e7fd25b09b3..f3bb9e0a81fe 100644
>> --- a/include/linux/kvm_host.h
>> +++ b/include/linux/kvm_host.h
>> @@ -1399,6 +1399,17 @@ int kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, unsigned long len);
>>    */
>>   void kvm_gpc_deactivate(struct gfn_to_pfn_cache *gpc);
>>   
>> +/**
>> + * kvm_gpc_mark_dirty - mark a cached page as dirty.
>> + *
>> + * @gpc:	   struct gfn_to_pfn_cache object.
>> + */
>> +static inline void kvm_gpc_mark_dirty(struct gfn_to_pfn_cache *gpc)
> 
> Any objection to kvm_gpc_mark_dirty_in_slot()?  I want to make it clear this only
> marks the gfn dirty in the memslot, i.e. that it doesn't mark the underlying page
> as dirty.

Ok, that sounds fair.

   Paul

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v12 03/20] KVM: xen: mark guest pages dirty with the pfncache lock held
  2024-02-07  3:17   ` Sean Christopherson
  2024-02-07  3:26     ` David Woodhouse
@ 2024-02-07  8:48     ` Paul Durrant
  1 sibling, 0 replies; 58+ messages in thread
From: Paul Durrant @ 2024-02-07  8:48 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Jonathan Corbet, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	David Woodhouse, Shuah Khan, kvm, linux-doc, linux-kernel,
	linux-kselftest

On 07/02/2024 03:17, Sean Christopherson wrote:
> KVM: x86/xen: for the scope please.  A few commits have "KVM: xen:", but "x86/xen"
> is the overwhelming favorite.

If I have to re-post anyway then I can do that.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v12 11/20] KVM: xen: allow shared_info to be mapped by fixed HVA
  2024-02-07  4:10   ` Sean Christopherson
@ 2024-02-07  8:53     ` Paul Durrant
  2024-02-08  8:52     ` Paul Durrant
  1 sibling, 0 replies; 58+ messages in thread
From: Paul Durrant @ 2024-02-07  8:53 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Jonathan Corbet, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	David Woodhouse, Shuah Khan, kvm, linux-doc, linux-kernel,
	linux-kselftest

On 07/02/2024 04:10, Sean Christopherson wrote:
> On Mon, Jan 15, 2024, Paul Durrant wrote:
>> @@ -638,20 +637,32 @@ int kvm_xen_hvm_set_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
>>   		}
>>   		break;
>>   
>> -	case KVM_XEN_ATTR_TYPE_SHARED_INFO: {
>> +	case KVM_XEN_ATTR_TYPE_SHARED_INFO:
>> +	case KVM_XEN_ATTR_TYPE_SHARED_INFO_HVA: {
>>   		int idx;
>>   
>>   		mutex_lock(&kvm->arch.xen.xen_lock);
>>   
>>   		idx = srcu_read_lock(&kvm->srcu);
>>   
>> -		if (data->u.shared_info.gfn == KVM_XEN_INVALID_GFN) {
>> -			kvm_gpc_deactivate(&kvm->arch.xen.shinfo_cache);
>> -			r = 0;
>> +		if (data->type == KVM_XEN_ATTR_TYPE_SHARED_INFO) {
>> +			if (data->u.shared_info.gfn == KVM_XEN_INVALID_GFN) {
>> +				kvm_gpc_deactivate(&kvm->arch.xen.shinfo_cache);
>> +				r = 0;
>> +			} else {
>> +				r = kvm_gpc_activate(&kvm->arch.xen.shinfo_cache,
>> +						     gfn_to_gpa(data->u.shared_info.gfn),
>> +						     PAGE_SIZE);
>> +			}
>>   		} else {
>> -			r = kvm_gpc_activate(&kvm->arch.xen.shinfo_cache,
>> -					     gfn_to_gpa(data->u.shared_info.gfn),
>> -					     PAGE_SIZE);
>> +			if (data->u.shared_info.hva == 0) {
> 
> I know I said I don't care about the KVM Xen ABI, but I still think using '0' as
> "invalid" is ridiculous.
> 

I don't have any massive preference; we could define a 
KVM_XEN_INVALID_HVA too.

> More importantly, this code needs to check that HVA is a userspace pointer.
> Because __kvm_set_memory_region() performs the address checks, KVM assumes any
> hva that it gets out of a memslot, i.e. from a gfn, is a safe userspace address.
> 
> kvm_is_error_hva() will catch most addresses, but I'm pretty sure there's still
> a small window where userspace could use this to write kernel memory.

Ok. Good point. I'll add some appropriate checks.

   Paul

> 
>> +				kvm_gpc_deactivate(&kvm->arch.xen.shinfo_cache);
>> +				r = 0;
>> +			} else {
>> +				r = kvm_gpc_activate_hva(&kvm->arch.xen.shinfo_cache,
>> +							 data->u.shared_info.hva,
>> +							 PAGE_SIZE);
>> +			}


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v12 13/20] KVM: selftests / xen: map shared_info using HVA rather than GFN
  2024-02-07  4:14   ` Sean Christopherson
@ 2024-02-07  8:54     ` Paul Durrant
  2024-02-07 14:58       ` Sean Christopherson
  0 siblings, 1 reply; 58+ messages in thread
From: Paul Durrant @ 2024-02-07  8:54 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Jonathan Corbet, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	David Woodhouse, Shuah Khan, kvm, linux-doc, linux-kernel,
	linux-kselftest

On 07/02/2024 04:14, Sean Christopherson wrote:
> Please stop making up random scopes.  Yes, I know "KVM: selftests:" is too coarse,
> but everyone doing their own thing is worse.

So what would you suggest?

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v12 13/20] KVM: selftests / xen: map shared_info using HVA rather than GFN
  2024-02-07  8:54     ` Paul Durrant
@ 2024-02-07 14:58       ` Sean Christopherson
  0 siblings, 0 replies; 58+ messages in thread
From: Sean Christopherson @ 2024-02-07 14:58 UTC (permalink / raw)
  To: paul
  Cc: Paolo Bonzini, Jonathan Corbet, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	David Woodhouse, Shuah Khan, kvm, linux-doc, linux-kernel,
	linux-kselftest

On Wed, Feb 07, 2024, Paul Durrant wrote:
> On 07/02/2024 04:14, Sean Christopherson wrote:
> > Please stop making up random scopes.  Yes, I know "KVM: selftests:" is too coarse,
> > but everyone doing their own thing is worse.
> 
> So what would you suggest?

Until someone comes up with a better idea that at least a majority of developers
agree on, maintain the status quo and use "KVM: selftests:" for the scope, and
call out and/or allude to the relevant test(s) in the shortlog.  E.g. you can
even squeeze in both:

  KVM: selftests: Map Xen's shared_info page by HVA, not GPA in xen_shinfo_test

I know it's kludgy and silly, but it's consistent.

Aside from consistency, the "problem" with selftests is that most changes are
either specific to one test, or affect multiple tests that have no common
denominator beyond KVM.  And for changes that are targeted at a single test, I
find it helpful if the shortlog specifies the exact test that's being changed.

To be clear, I am definitely open to ideas if people feel "KVM: selftests" isn't
working, I just want us to make a decision as a group and commit to it, as opposed
to people using whatever scope suits their fancy.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v12 18/20] KVM: pfncache: check the need for invalidation under read lock first
  2024-02-07  4:59         ` David Woodhouse
@ 2024-02-07 15:10           ` Sean Christopherson
  0 siblings, 0 replies; 58+ messages in thread
From: Sean Christopherson @ 2024-02-07 15:10 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Paul Durrant, Paolo Bonzini, Jonathan Corbet, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Shuah Khan, kvm, linux-doc, linux-kernel, linux-kselftest

On Tue, Feb 06, 2024, David Woodhouse wrote:
> On Tue, 2024-02-06 at 20:47 -0800, Sean Christopherson wrote:
> > 
> > I'm saying this:
> > 
> >   When processing mmu_notifier invalidations for gpc caches, pre-check for
> >   overlap with the invalidation event while holding gpc->lock for read, and
> >   only take gpc->lock for write if the cache needs to be invalidated.  Doing
> >   a pre-check without taking gpc->lock for write avoids unnecessarily
> >   contending the lock for unrelated invalidations, which is very beneficial
> >   for caches that are heavily used (but rarely subjected to mmu_notifier
> >   invalidations).
> > 
> > is much friendlier to readers than this:
> > 
> >   Taking a write lock on a pfncache will be disruptive if the cache is
> >   heavily used (which only requires a read lock). Hence, in the MMU notifier
> >   callback, take read locks on caches to check for a match; only taking a
> >   write lock to actually perform an invalidation (after another check).
> 
> That's a somewhat subjective observation. I actually find the latter to
> be far more succinct and obvious.
> 
> Actually... maybe I find yours harder because it isn't actually stating
> the situation as I understand it. You said "unrelated invalidation" in
> your first email, and "overlap with the invalidation event" in this
> one... neither of which makes sense to me because there is no *other*
> invalidation here.

I am referring to the "mmu_notifier invalidation event".  While a particular GPC
may not be affected by the invalidation, it's entirely possible that a different
GPC and/or some chunk of guest memory does need to be invalidated/zapped.

> We're only talking about the MMU notifier gratuitously taking the write

It's not "the MMU notifier" though, it's KVM that unnecessarily takes a lock.  I
know I'm being somewhat pedantic, but the distinction does matter.  E.g. with
guest_memfd, there will be invalidations that get routed through this code, but
that do not originate in the mmu_notifier.

And I think it's important to make it clear to readers that an mmu_notifier
really is just a notification from the primary MMU, albeit a notification that
comes with a rather strict contract.

> lock on a GPC that it *isn't* going to invalidate (the common case),
> and how that disrupts users which are trying to take the read lock on
> that GPC.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v12 03/20] KVM: xen: mark guest pages dirty with the pfncache lock held
  2024-02-07  3:26     ` David Woodhouse
@ 2024-02-07 15:15       ` Sean Christopherson
  0 siblings, 0 replies; 58+ messages in thread
From: Sean Christopherson @ 2024-02-07 15:15 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Paul Durrant, Paolo Bonzini, Jonathan Corbet, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Shuah Khan, kvm, linux-doc, linux-kernel, linux-kselftest

On Tue, Feb 06, 2024, David Woodhouse wrote:
> On Tue, 2024-02-06 at 19:17 -0800, Sean Christopherson wrote:
> > KVM: x86/xen: for the scope please.  A few commits have "KVM: xen:", but "x86/xen"
> > is the overwhelming favorite.
> 
> Paul's been using "KVM: xen:" in this patch series since first posting
> it in September of last year. If there aren't more substantial changes
> you need, would you perhaps be able to make that minor fixup as you
> apply the series?

Yes, I can fixup scopes when applying, though I think in this case there's just
enough small changes that another version would be helpful.  Tweaks to the scope
are rarely grounds for needing a new version.  I'd say "never", but then someone
would inevitably prove me wrong :-)

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v12 11/20] KVM: xen: allow shared_info to be mapped by fixed HVA
  2024-02-07  4:10   ` Sean Christopherson
  2024-02-07  8:53     ` Paul Durrant
@ 2024-02-08  8:52     ` Paul Durrant
  2024-02-08 16:48       ` Sean Christopherson
  1 sibling, 1 reply; 58+ messages in thread
From: Paul Durrant @ 2024-02-08  8:52 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Jonathan Corbet, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	David Woodhouse, Shuah Khan, kvm, linux-doc, linux-kernel,
	linux-kselftest

On 07/02/2024 04:10, Sean Christopherson wrote:
> On Mon, Jan 15, 2024, Paul Durrant wrote:
>> @@ -638,20 +637,32 @@ int kvm_xen_hvm_set_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
>>   		}
>>   		break;
>>   
>> -	case KVM_XEN_ATTR_TYPE_SHARED_INFO: {
>> +	case KVM_XEN_ATTR_TYPE_SHARED_INFO:
>> +	case KVM_XEN_ATTR_TYPE_SHARED_INFO_HVA: {
>>   		int idx;
>>   
>>   		mutex_lock(&kvm->arch.xen.xen_lock);
>>   
>>   		idx = srcu_read_lock(&kvm->srcu);
>>   
>> -		if (data->u.shared_info.gfn == KVM_XEN_INVALID_GFN) {
>> -			kvm_gpc_deactivate(&kvm->arch.xen.shinfo_cache);
>> -			r = 0;
>> +		if (data->type == KVM_XEN_ATTR_TYPE_SHARED_INFO) {
>> +			if (data->u.shared_info.gfn == KVM_XEN_INVALID_GFN) {
>> +				kvm_gpc_deactivate(&kvm->arch.xen.shinfo_cache);
>> +				r = 0;
>> +			} else {
>> +				r = kvm_gpc_activate(&kvm->arch.xen.shinfo_cache,
>> +						     gfn_to_gpa(data->u.shared_info.gfn),
>> +						     PAGE_SIZE);
>> +			}
>>   		} else {
>> -			r = kvm_gpc_activate(&kvm->arch.xen.shinfo_cache,
>> -					     gfn_to_gpa(data->u.shared_info.gfn),
>> -					     PAGE_SIZE);
>> +			if (data->u.shared_info.hva == 0) {
> 
> I know I said I don't care about the KVM Xen ABI, but I still think using '0' as
> "invalid" is ridiculous.
> 

With the benefit of some sleep, I'm wondering why 0 is a 'ridiculous' 
invalid value for a *virtual* address? Surely it's essentially a 
numerical cast of the canonically invalid NULL pointer?

   Paul

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v12 11/20] KVM: xen: allow shared_info to be mapped by fixed HVA
  2024-02-08  8:52     ` Paul Durrant
@ 2024-02-08 16:48       ` Sean Christopherson
  2024-02-08 16:51         ` Paul Durrant
  0 siblings, 1 reply; 58+ messages in thread
From: Sean Christopherson @ 2024-02-08 16:48 UTC (permalink / raw)
  To: paul
  Cc: Paolo Bonzini, Jonathan Corbet, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	David Woodhouse, Shuah Khan, kvm, linux-doc, linux-kernel,
	linux-kselftest

On Thu, Feb 08, 2024, Paul Durrant wrote:
> On 07/02/2024 04:10, Sean Christopherson wrote:
> > On Mon, Jan 15, 2024, Paul Durrant wrote:
> > > @@ -638,20 +637,32 @@ int kvm_xen_hvm_set_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
> > >   		}
> > >   		break;
> > > -	case KVM_XEN_ATTR_TYPE_SHARED_INFO: {
> > > +	case KVM_XEN_ATTR_TYPE_SHARED_INFO:
> > > +	case KVM_XEN_ATTR_TYPE_SHARED_INFO_HVA: {
> > >   		int idx;
> > >   		mutex_lock(&kvm->arch.xen.xen_lock);
> > >   		idx = srcu_read_lock(&kvm->srcu);
> > > -		if (data->u.shared_info.gfn == KVM_XEN_INVALID_GFN) {
> > > -			kvm_gpc_deactivate(&kvm->arch.xen.shinfo_cache);
> > > -			r = 0;
> > > +		if (data->type == KVM_XEN_ATTR_TYPE_SHARED_INFO) {
> > > +			if (data->u.shared_info.gfn == KVM_XEN_INVALID_GFN) {
> > > +				kvm_gpc_deactivate(&kvm->arch.xen.shinfo_cache);
> > > +				r = 0;
> > > +			} else {
> > > +				r = kvm_gpc_activate(&kvm->arch.xen.shinfo_cache,
> > > +						     gfn_to_gpa(data->u.shared_info.gfn),
> > > +						     PAGE_SIZE);
> > > +			}
> > >   		} else {
> > > -			r = kvm_gpc_activate(&kvm->arch.xen.shinfo_cache,
> > > -					     gfn_to_gpa(data->u.shared_info.gfn),
> > > -					     PAGE_SIZE);
> > > +			if (data->u.shared_info.hva == 0) {
> > 
> > I know I said I don't care about the KVM Xen ABI, but I still think using '0' as
> > "invalid" is ridiculous.
> > 
> 
> With the benefit of some sleep, I'm wondering why 0 is a 'ridiculous'
> invalid value for a *virtual* address? Surely it's essentially a numerical
> cast of the canonically invalid NULL pointer?

It's legal to mmap() virtual address '0', albeit not by default:

  config DEFAULT_MMAP_MIN_ADDR
	int "Low address space to protect from user allocation"
	depends on MMU
	default 4096
	help
	  This is the portion of low virtual memory which should be protected
	  from userspace allocation.  Keeping a user from writing to low pages
	  can help reduce the impact of kernel NULL pointer bugs.

	  For most ppc64 and x86 users with lots of address space
	  a value of 65536 is reasonable and should cause no problems.
	  On arm and other archs it should not be higher than 32768.
	  Programs which use vm86 functionality or have some need to map
	  this low address space will need CAP_SYS_RAWIO or disable this
	  protection by setting the value to 0.

	  This value can be changed after boot using the
	  /proc/sys/vm/mmap_min_addr tunable.


Obviously it's equally ridiculous that userspace would ever mmap() '0' and pass
that as the shared_info, but given that this is x86-only, there are architecturally
illegal addresses that can be used, at least until Intel adds LA64 ;-)
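
E.g. this hypothetical userspace snippet only succeeds once mmap_min_addr
has been lowered to 0 (and the caller is privileged accordingly):

	#include <sys/mman.h>

	/* Ask for a fixed anonymous mapping at virtual address 0. */
	void *p = mmap((void *)0, 4096, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);

	/* On success, p == NULL is a perfectly usable mapping. */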

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v12 11/20] KVM: xen: allow shared_info to be mapped by fixed HVA
  2024-02-08 16:48       ` Sean Christopherson
@ 2024-02-08 16:51         ` Paul Durrant
  2024-02-08 17:26           ` David Woodhouse
  0 siblings, 1 reply; 58+ messages in thread
From: Paul Durrant @ 2024-02-08 16:51 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Jonathan Corbet, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	David Woodhouse, Shuah Khan, kvm, linux-doc, linux-kernel,
	linux-kselftest

On 08/02/2024 16:48, Sean Christopherson wrote:
> On Thu, Feb 08, 2024, Paul Durrant wrote:
>> On 07/02/2024 04:10, Sean Christopherson wrote:
>>> On Mon, Jan 15, 2024, Paul Durrant wrote:
>>>> @@ -638,20 +637,32 @@ int kvm_xen_hvm_set_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
>>>>    		}
>>>>    		break;
>>>> -	case KVM_XEN_ATTR_TYPE_SHARED_INFO: {
>>>> +	case KVM_XEN_ATTR_TYPE_SHARED_INFO:
>>>> +	case KVM_XEN_ATTR_TYPE_SHARED_INFO_HVA: {
>>>>    		int idx;
>>>>    		mutex_lock(&kvm->arch.xen.xen_lock);
>>>>    		idx = srcu_read_lock(&kvm->srcu);
>>>> -		if (data->u.shared_info.gfn == KVM_XEN_INVALID_GFN) {
>>>> -			kvm_gpc_deactivate(&kvm->arch.xen.shinfo_cache);
>>>> -			r = 0;
>>>> +		if (data->type == KVM_XEN_ATTR_TYPE_SHARED_INFO) {
>>>> +			if (data->u.shared_info.gfn == KVM_XEN_INVALID_GFN) {
>>>> +				kvm_gpc_deactivate(&kvm->arch.xen.shinfo_cache);
>>>> +				r = 0;
>>>> +			} else {
>>>> +				r = kvm_gpc_activate(&kvm->arch.xen.shinfo_cache,
>>>> +						     gfn_to_gpa(data->u.shared_info.gfn),
>>>> +						     PAGE_SIZE);
>>>> +			}
>>>>    		} else {
>>>> -			r = kvm_gpc_activate(&kvm->arch.xen.shinfo_cache,
>>>> -					     gfn_to_gpa(data->u.shared_info.gfn),
>>>> -					     PAGE_SIZE);
>>>> +			if (data->u.shared_info.hva == 0) {
>>>
>>> I know I said I don't care about the KVM Xen ABI, but I still think using '0' as
>>> "invalid" is ridiculous.
>>>
>>
>> With the benefit of some sleep, I'm wondering why 0 is a 'ridiculous'
>> invalid value for a *virtual* address? Surely it's essentially a numerical
>> cast of the canonically invalid NULL pointer?
> 
> It's legal to mmap() virtual address '0', albeit not by default:
> 
>    config DEFAULT_MMAP_MIN_ADDR
> 	int "Low address space to protect from user allocation"
> 	depends on MMU
> 	default 4096
> 	help
> 	  This is the portion of low virtual memory which should be protected
> 	  from userspace allocation.  Keeping a user from writing to low pages
> 	  can help reduce the impact of kernel NULL pointer bugs.
> 
> 	  For most ppc64 and x86 users with lots of address space
> 	  a value of 65536 is reasonable and should cause no problems.
> 	  On arm and other archs it should not be higher than 32768.
> 	  Programs which use vm86 functionality or have some need to map
> 	  this low address space will need CAP_SYS_RAWIO or disable this
> 	  protection by setting the value to 0.
> 
> 	  This value can be changed after boot using the
> 	  /proc/sys/vm/mmap_min_addr tunable.
> 
> 
> Obviously it's equally ridiculous that userspace would ever mmap() '0' and pass
> that as the shared_info, but given that this is x86-only, there are architecturally
> illegal addresses that can be used, at least until Intel adds LA64 ;-)

Ok. Thanks for the reference.


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v12 11/20] KVM: xen: allow shared_info to be mapped by fixed HVA
  2024-02-08 16:51         ` Paul Durrant
@ 2024-02-08 17:26           ` David Woodhouse
  2024-02-09 16:01             ` Sean Christopherson
  0 siblings, 1 reply; 58+ messages in thread
From: David Woodhouse @ 2024-02-08 17:26 UTC (permalink / raw)
  To: paul, Sean Christopherson
  Cc: Paolo Bonzini, Jonathan Corbet, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Shuah Khan,
	kvm, linux-doc, linux-kernel, linux-kselftest

On Thu, 2024-02-08 at 16:51 +0000, Paul Durrant wrote:
> On 08/02/2024 16:48, Sean Christopherson wrote:
> > On Thu, Feb 08, 2024, Paul Durrant wrote:
> > > With the benefit of some sleep, I'm wondering why 0 is a 'ridiculous'
> > > invalid value for a *virtual* address? Surely it's essentially a numerical
> > > cast of the canonically invalid NULL pointer?
> > 
> > It's legal to mmap() virtual address '0', albeit not by default:

Well yes, to make dosemu work. But if you attempt to actually *do* that
in C code, the compiler itself doesn't cope...

$ cat foo.c
int foo(int *bar)
{
    if (bar)
        return 0;
    return *bar;
}
$ gcc -O2 -S -o- foo.c
...
foo:
.LFB0:
	.cfi_startproc
	endbr64
	testq	%rdi, %rdi
	je	.L4
	xorl	%eax, %eax
	ret
	.p2align 4,,10
	.p2align 3
.L4:
	movl	0, %eax
	ud2
	.cfi_endproc
.LFE0:
	.size	foo, .-foo

Note the ud2 instead of actually trying to dereference it.

Using anything except NULL as the "no value" value doesn't make sense
to me. It violates the principle of least surprise and would be a
really bad API.


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v12 04/20] KVM: pfncache: add a mark-dirty helper
  2024-01-15 12:56 ` [PATCH v12 04/20] KVM: pfncache: add a mark-dirty helper Paul Durrant
  2024-02-07  3:20   ` Sean Christopherson
@ 2024-02-09 15:58   ` Sean Christopherson
  2024-02-09 16:05     ` Paul Durrant
  1 sibling, 1 reply; 58+ messages in thread
From: Sean Christopherson @ 2024-02-09 15:58 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Paolo Bonzini, Jonathan Corbet, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	David Woodhouse, Shuah Khan, kvm, linux-doc, linux-kernel,
	linux-kselftest

On Mon, Jan 15, 2024, Paul Durrant wrote:
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 7e7fd25b09b3..f3bb9e0a81fe 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -1399,6 +1399,17 @@ int kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, unsigned long len);
>   */
>  void kvm_gpc_deactivate(struct gfn_to_pfn_cache *gpc);
>  
> +/**
> + * kvm_gpc_mark_dirty - mark a cached page as dirty.
> + *
> + * @gpc:	   struct gfn_to_pfn_cache object.
> + */
> +static inline void kvm_gpc_mark_dirty(struct gfn_to_pfn_cache *gpc)
> +{
> +	lockdep_assert_held(&gpc->lock);
> +	mark_page_dirty_in_slot(gpc->kvm, gpc->memslot, gpc->gpa >> PAGE_SHIFT);

Can you opportunistically have this pre-check gpc->memslot?  __kvm_gpc_refresh()
should nullify gpc->memslot when using an hva.  That way, you don't need to
explicitly check for the "invalid gfn" case here (or you could, but WARN_ON_ONCE()
if the memslot is non-NULL and the gfn is invalid?).
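
I.e. something like this (sketch, using the INVALID_GPA sentinel mentioned
earlier in the thread):

	static inline void kvm_gpc_mark_dirty_in_slot(struct gfn_to_pfn_cache *gpc)
	{
		lockdep_assert_held(&gpc->lock);

		if (gpc->memslot) {
			/* A non-NULL memslot implies a GPA-based cache. */
			WARN_ON_ONCE(gpc->gpa == INVALID_GPA);
			mark_page_dirty_in_slot(gpc->kvm, gpc->memslot,
						gpc->gpa >> PAGE_SHIFT);
		}
	}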

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v12 11/20] KVM: xen: allow shared_info to be mapped by fixed HVA
  2024-02-08 17:26           ` David Woodhouse
@ 2024-02-09 16:01             ` Sean Christopherson
  0 siblings, 0 replies; 58+ messages in thread
From: Sean Christopherson @ 2024-02-09 16:01 UTC (permalink / raw)
  To: David Woodhouse
  Cc: paul, Paolo Bonzini, Jonathan Corbet, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Shuah Khan, kvm, linux-doc, linux-kernel, linux-kselftest

On Thu, Feb 08, 2024, David Woodhouse wrote:
> Using anything except NULL as the "no value" value doesn't make sense
> to me. It violates the principle of least surprise and would be a
> really bad API.

I'm a-ok with using '0'.  My only request is to check for "!hva" as opposed to
"hva == 0", both because that's preferred kernel style, and because it better
conveys that it's really checking for !NULL as opposed to address '0'.

Side topic, I think the code will end up in a more readable state if the GFN vs.
HVA sub-commands are handled in separate case statements, especially if/when
xen_lock goes away.  E.g. something like this:

	case KVM_XEN_ATTR_TYPE_SHARED_INFO: {
		int idx;

		if (data->u.shared_info.gfn == KVM_XEN_INVALID_GFN) {
			kvm_gpc_deactivate(&kvm->arch.xen.shinfo_cache);
			r = 0;
			break;
		}

		idx = srcu_read_lock(&kvm->srcu);
		r = kvm_gpc_activate(&kvm->arch.xen.shinfo_cache,
				     gfn_to_gpa(data->u.shared_info.gfn), PAGE_SIZE);
		if (!r && kvm->arch.xen.shinfo_cache.active)
			r = kvm_xen_shared_info_init(kvm);
		srcu_read_unlock(&kvm->srcu, idx);
		break;
	}

	case KVM_XEN_ATTR_TYPE_SHARED_INFO_HVA: {
		unsigned long hva = data->u.shared_info.hva;

		if (hva != untagged_addr(hva) || !PAGE_ALIGNED(hva) ||
		    !access_ok((void __user *)hva, PAGE_SIZE)) {
			r = -EINVAL;
			break;
		}

		if (!hva) {
			kvm_gpc_deactivate(&kvm->arch.xen.shinfo_cache);
			r = 0;
			break;
		}
		r = kvm_gpc_activate_hva(&kvm->arch.xen.shinfo_cache, hva, PAGE_SIZE);
		if (!r && kvm->arch.xen.shinfo_cache.active)
			r = kvm_xen_shared_info_init(kvm);
		break;
	}

Side topic #2, the above requires that __kvm_gpc_refresh() not grab kvm_memslots()
in the "using an hva" path, but I think that'd actually be a good thing as it
would make it a bit more clear that using an hva bypasses memslots by design.
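
E.g. very roughly (sketch of the refresh-side split; kvm_is_error_gpa() here
is the hypothetical simple-comparison helper discussed earlier):

	if (kvm_is_error_gpa(gpa)) {
		/* HVA-based cache: bypass memslots entirely. */
		page_offset = offset_in_page(uhva);
		gpc->gpa = INVALID_GPA;
		gpc->memslot = NULL;
		gpc->uhva = PAGE_ALIGN_DOWN(uhva);
	} else {
		/* GPA-based cache: (re)resolve the uhva via the memslot. */
		struct kvm_memslots *slots = kvm_memslots(gpc->kvm);
		gfn_t gfn = gpa_to_gfn(gpa);

		page_offset = offset_in_page(gpa);
		gpc->gpa = gpa;
		gpc->generation = slots->generation;
		gpc->memslot = __gfn_to_memslot(slots, gfn);
		gpc->uhva = gfn_to_hva_memslot(gpc->memslot, gfn);
	}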

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v12 04/20] KVM: pfncache: add a mark-dirty helper
  2024-02-09 15:58   ` Sean Christopherson
@ 2024-02-09 16:05     ` Paul Durrant
  0 siblings, 0 replies; 58+ messages in thread
From: Paul Durrant @ 2024-02-09 16:05 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Jonathan Corbet, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	David Woodhouse, Shuah Khan, kvm, linux-doc, linux-kernel,
	linux-kselftest

On 09/02/2024 15:58, Sean Christopherson wrote:
> On Mon, Jan 15, 2024, Paul Durrant wrote:
>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>> index 7e7fd25b09b3..f3bb9e0a81fe 100644
>> --- a/include/linux/kvm_host.h
>> +++ b/include/linux/kvm_host.h
>> @@ -1399,6 +1399,17 @@ int kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, unsigned long len);
>>    */
>>   void kvm_gpc_deactivate(struct gfn_to_pfn_cache *gpc);
>>   
>> +/**
>> + * kvm_gpc_mark_dirty - mark a cached page as dirty.
>> + *
>> + * @gpc:	   struct gfn_to_pfn_cache object.
>> + */
>> +static inline void kvm_gpc_mark_dirty(struct gfn_to_pfn_cache *gpc)
>> +{
>> +	lockdep_assert_held(&gpc->lock);
>> +	mark_page_dirty_in_slot(gpc->kvm, gpc->memslot, gpc->gpa >> PAGE_SHIFT);
> 
> Can you opportunistically have this pre-check gpc->memslot?  __kvm_gpc_refresh()
> should nullify gpc->memslot when using an hva.  That way, you don't need to
> explicitly check for the "invalid gfn" case here (or you could, but WARN_ON_ONCE()
> if the memslot is non-NULL and the gfn is invalid?).

Ok, sure.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v12 08/20] KVM: pfncache: allow a cache to be activated with a fixed (userspace) HVA
  2024-02-07  4:03   ` Sean Christopherson
  2024-02-07  4:13     ` David Woodhouse
@ 2024-02-14 15:21     ` Paul Durrant
  2024-02-14 16:20       ` Sean Christopherson
  1 sibling, 1 reply; 58+ messages in thread
From: Paul Durrant @ 2024-02-14 15:21 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Jonathan Corbet, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	David Woodhouse, Shuah Khan, kvm, linux-doc, linux-kernel,
	linux-kselftest, Christian Borntraeger, Janosch Frank,
	Claudio Imbrenda, David Hildenbrand

On 07/02/2024 04:03, Sean Christopherson wrote:
> +s390 folks (question on kvm_is_error_gpa() for ya)
> 
> On Mon, Jan 15, 2024, Paul Durrant wrote:
>> @@ -1398,7 +1414,9 @@ void kvm_gpc_deactivate(struct gfn_to_pfn_cache *gpc);
>>   static inline void kvm_gpc_mark_dirty(struct gfn_to_pfn_cache *gpc)
>>   {
>>   	lockdep_assert_held(&gpc->lock);
>> -	mark_page_dirty_in_slot(gpc->kvm, gpc->memslot, gpc->gpa >> PAGE_SHIFT);
>> +
>> +	if (gpc->gpa != KVM_XEN_INVALID_GPA)
> 
> KVM_XEN_INVALID_GPA absolutely doesn't belong in common code.  Not to mention
> that it will break when Paolo (rightly) moves it to an x86 header.
> 
> https://lore.kernel.org/all/20240131233056.10845-3-pbonzini@redhat.com
> 
>> +		mark_page_dirty_in_slot(gpc->kvm, gpc->memslot, gpc->gpa >> PAGE_SHIFT);
>>   }
>>   
>>   void kvm_sigset_activate(struct kvm_vcpu *vcpu);
>> diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
>> index 97eec8ee3449..ae822bff812f 100644
>> --- a/virt/kvm/pfncache.c
>> +++ b/virt/kvm/pfncache.c
>> @@ -48,7 +48,10 @@ bool kvm_gpc_check(struct gfn_to_pfn_cache *gpc, unsigned long len)
>>   	if (!gpc->active)
>>   		return false;
>>   
>> -	if (gpc->generation != slots->generation || kvm_is_error_hva(gpc->uhva))
>> +	if (gpc->gpa != KVM_XEN_INVALID_GPA && gpc->generation != slots->generation)
> 
> This needs a comment.  I know what it's doing, but it wasn't obvious at first
> glance, and it definitely won't be intuitive for readers that aren't intimately
> familiar with memslots.
> 
>> +		return false;
>> +
>> +	if (kvm_is_error_hva(gpc->uhva))
>>   		return false;
>>   
>>   	if (offset_in_page(gpc->uhva) + len > PAGE_SIZE)
>> @@ -209,11 +212,13 @@ static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
>>   	return -EFAULT;
>>   }
>>   
>> -static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa,
>> +static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa, unsigned long uhva,
>>   			     unsigned long len)
>>   {
>>   	struct kvm_memslots *slots = kvm_memslots(gpc->kvm);
>> -	unsigned long page_offset = offset_in_page(gpa);
>> +	unsigned long page_offset = (gpa != KVM_XEN_INVALID_GPA) ?
>> +		offset_in_page(gpa) :
>> +		offset_in_page(uhva);
> 
> This formatting is funky.  I also think it would be worth adding a helper to pair
> with kvm_is_error_hva().
> 
> But!  kvm_is_error_gpa() already exists, and it very, very sneakily does a memslot
> lookup and checks for a valid HVA.
> 
> s390 people, any objection to renaming kvm_is_error_gpa() to something like
> kvm_gpa_has_memslot() or kvm_gpa_is_in_memslot()?  s390 is the only code that
> uses the existing helper.
> 
> That would both free up the name to pair with kvm_is_error_hva(), and would
> make it obvious what the helper does; I was quite surprised that "error" means
> "is not covered by a valid memslot".
> 

Seemingly no response to this; I'll define a local helper rather than 
re-working the open-coded tests to check against INVALID_GPA. This can 
then be trivially replaced if need be.
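
For concreteness, such a local helper might look like this (the name and the
INVALID_GPA sentinel are illustrative, pending the s390 discussion):

	/*
	 * Local stand-in; trivially replaced if kvm_is_error_gpa() is
	 * ever freed up for common code.
	 */
	static inline bool kvm_gpc_is_error_gpa(gpa_t gpa)
	{
		return gpa == INVALID_GPA;
	}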

> Back to this code, then we can have a slightly cleaner:
> 
> 	unsigned long page_offset = kvm_is_error_gpa(gpa) ? offset_in_page(uhva) :
> 							    offset_in_page(gpa);
> 
> 
> And I think it's worth asserting that exactly _one_ of GPA or HVA is valid, e.g.
> to ensure KVM doesn't end up with botched offsets, and to make it a bit more
> clear what's going on.
> 
> 
> 	if (WARN_ON_ONCE(kvm_is_error_gpa(gpa) == kvm_is_error_hva(uhva)))
> 		return -EINVAL;

Sure.
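
Putting the two suggestions together, the start of __kvm_gpc_refresh() might
then read (sketch only; assumes the kvm_is_error_gpa() rename discussed
elsewhere in this thread lands):

	unsigned long page_offset;

	/* Exactly one of gpa or uhva may be valid. */
	if (WARN_ON_ONCE(kvm_is_error_gpa(gpa) == kvm_is_error_hva(uhva)))
		return -EINVAL;

	page_offset = kvm_is_error_gpa(gpa) ? offset_in_page(uhva) :
					      offset_in_page(gpa);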


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v12 08/20] KVM: pfncache: allow a cache to be activated with a fixed (userspace) HVA
  2024-02-07  4:13     ` David Woodhouse
@ 2024-02-14 16:01       ` Sean Christopherson
  2024-02-14 16:09         ` Paul Durrant
  0 siblings, 1 reply; 58+ messages in thread
From: Sean Christopherson @ 2024-02-14 16:01 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Paul Durrant, Paolo Bonzini, Jonathan Corbet, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Shuah Khan, kvm, linux-doc, linux-kernel, linux-kselftest,
	Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
	David Hildenbrand

On Tue, Feb 06, 2024, David Woodhouse wrote:
> On Tue, 2024-02-06 at 20:03 -0800, Sean Christopherson wrote:
> > +s390 folks (question on kvm_is_error_gpa() for ya)
> > 
> > On Mon, Jan 15, 2024, Paul Durrant wrote:
> > > @@ -1398,7 +1414,9 @@ void kvm_gpc_deactivate(struct gfn_to_pfn_cache *gpc);
> > >   static inline void kvm_gpc_mark_dirty(struct gfn_to_pfn_cache *gpc)
> > >   {
> > >         lockdep_assert_held(&gpc->lock);
> > > -       mark_page_dirty_in_slot(gpc->kvm, gpc->memslot, gpc->gpa >> PAGE_SHIFT);
> > > +
> > > +       if (gpc->gpa != KVM_XEN_INVALID_GPA)
> > 
> > KVM_XEN_INVALID_GPA absolutely doesn't belong in common code.  Not to mention
> > that it will break when Paolo (rightly) moves it to an x86 header.
> > 
> > https://lore.kernel.org/all/20240131233056.10845-3-pbonzini@redhat.com
> 
> We can use plain INVALID_GPA for that, I think. ISTR the reason we have
> a separate KVM_XEN_INVALID_GPA is because that's a userspace API.
> 
> ...
> 
> > But!  kvm_is_error_gpa() already exists, and it very, very sneakily
> > does a memslot lookup and checks for a valid HVA.
> 
> Hm, that doesn't sound as fast as simple comparison. We also can't do
> it from kvm_gpc_check(), can we?

You snipped the part where I suggested renaming the existing kvm_is_error_gpa().

I am suggesting we do the below (and obviously rename the s390 usage, too), and
then the gpc code can use kvm_is_error_gpa().

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index bbfefd7e612f..e1df988e4d57 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -148,6 +148,11 @@ static inline bool kvm_is_error_hva(unsigned long addr)
 
 #endif
 
+static inline bool kvm_is_error_gpa(gpa_t gpa)
+{
+       return gpa == INVALID_GPA;
+}
+
 #define KVM_ERR_PTR_BAD_PAGE   (ERR_PTR(-ENOENT))
 
 static inline bool is_error_page(struct page *page)
@@ -1787,7 +1792,7 @@ static inline hpa_t pfn_to_hpa(kvm_pfn_t pfn)
        return (hpa_t)pfn << PAGE_SHIFT;
 }
 
-static inline bool kvm_is_error_gpa(struct kvm *kvm, gpa_t gpa)
+static inline bool kvm_gpa_is_in_memslot(struct kvm *kvm, gpa_t gpa)
 {
        unsigned long hva = gfn_to_hva(kvm, gpa_to_gfn(gpa));
 

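As a usage illustration (not part of the diff above), kvm_gpc_check() could
then use the new helper to skip the memslot generation check for hva-based
caches:

	/* Only gpa-based caches are tied to the memslot generation. */
	if (!kvm_is_error_gpa(gpc->gpa) && gpc->generation != slots->generation)
		return false;

	if (kvm_is_error_hva(gpc->uhva))
		return false;
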
^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: [PATCH v12 08/20] KVM: pfncache: allow a cache to be activated with a fixed (userspace) HVA
  2024-02-14 16:01       ` Sean Christopherson
@ 2024-02-14 16:09         ` Paul Durrant
  0 siblings, 0 replies; 58+ messages in thread
From: Paul Durrant @ 2024-02-14 16:09 UTC (permalink / raw)
  To: Sean Christopherson, David Woodhouse
  Cc: Paolo Bonzini, Jonathan Corbet, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Shuah Khan,
	kvm, linux-doc, linux-kernel, linux-kselftest,
	Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
	David Hildenbrand

On 14/02/2024 16:01, Sean Christopherson wrote:
> On Tue, Feb 06, 2024, David Woodhouse wrote:
>> On Tue, 2024-02-06 at 20:03 -0800, Sean Christopherson wrote:
>>> +s390 folks (question on kvm_is_error_gpa() for ya)
>>>
>>> On Mon, Jan 15, 2024, Paul Durrant wrote:
>>>> @@ -1398,7 +1414,9 @@ void kvm_gpc_deactivate(struct gfn_to_pfn_cache *gpc);
>>>>    static inline void kvm_gpc_mark_dirty(struct gfn_to_pfn_cache *gpc)
>>>>    {
>>>>          lockdep_assert_held(&gpc->lock);
>>>> -       mark_page_dirty_in_slot(gpc->kvm, gpc->memslot, gpc->gpa >> PAGE_SHIFT);
>>>> +
>>>> +       if (gpc->gpa != KVM_XEN_INVALID_GPA)
>>>
>>> KVM_XEN_INVALID_GPA absolutely doesn't belong in common code.  Not to mention
>>> that it will break when Paolo (rightly) moves it to an x86 header.
>>>
>>> https://lore.kernel.org/all/20240131233056.10845-3-pbonzini@redhat.com
>>
>> We can use plain INVALID_GPA for that, I think. ISTR the reason we have
>> a separate KVM_XEN_INVALID_GPA is because that's a userspace API.
>>
>> ...
>>
>>> But!  kvm_is_error_gpa() already exists, and it very, very sneakily
>>> does a memslot lookup and checks for a valid HVA.
>>
>> Hm, that doesn't sound as fast as simple comparison. We also can't do
>> it from kvm_gpc_check(), can we?
> 
> You snipped the part where I suggested renaming the existing kvm_is_error_gpa().
> 
> I am suggesting we do the below (and obviously rename the s390 usage, too), and
> then the gpc code can use kvm_is_error_gpa().
> 
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index bbfefd7e612f..e1df988e4d57 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -148,6 +148,11 @@ static inline bool kvm_is_error_hva(unsigned long addr)
>   
>   #endif
>   
> +static inline bool kvm_is_error_gpa(gpa_t gpa)
> +{
> +       return gpa == INVALID_GPA;
> +}
> +

Are you ok with a local kvm_gpc_is_error_gpa() or somesuch until there 
is agreement with s390?

   Paul

>   #define KVM_ERR_PTR_BAD_PAGE   (ERR_PTR(-ENOENT))
>   
>   static inline bool is_error_page(struct page *page)
> @@ -1787,7 +1792,7 @@ static inline hpa_t pfn_to_hpa(kvm_pfn_t pfn)
>          return (hpa_t)pfn << PAGE_SHIFT;
>   }
>   
> -static inline bool kvm_is_error_gpa(struct kvm *kvm, gpa_t gpa)
> +static inline bool kvm_gpa_is_in_memslot(struct kvm *kvm, gpa_t gpa)
>   {
>          unsigned long hva = gfn_to_hva(kvm, gpa_to_gfn(gpa));
>   


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v12 08/20] KVM: pfncache: allow a cache to be activated with a fixed (userspace) HVA
  2024-02-14 15:21     ` Paul Durrant
@ 2024-02-14 16:20       ` Sean Christopherson
  2024-02-14 16:33         ` Paul Durrant
  0 siblings, 1 reply; 58+ messages in thread
From: Sean Christopherson @ 2024-02-14 16:20 UTC (permalink / raw)
  To: paul
  Cc: Paolo Bonzini, Jonathan Corbet, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	David Woodhouse, Shuah Khan, kvm, linux-doc, linux-kernel,
	linux-kselftest, Christian Borntraeger, Janosch Frank,
	Claudio Imbrenda, David Hildenbrand

On Wed, Feb 14, 2024, Paul Durrant wrote:
> On 07/02/2024 04:03, Sean Christopherson wrote:
> > +s390 folks (question on kvm_is_error_gpa() for ya)
> > But!  kvm_is_error_gpa() already exists, and it very, very sneakily does a memslot
> > lookup and checks for a valid HVA.
> > 
> > s390 people, any objection to renaming kvm_is_error_gpa() to something like
> > kvm_gpa_has_memslot() or kvm_gpa_is_in_memslot()?  s390 is the only code that
> > uses the existing helper.
> > 
> > That would both free up the name to pair with kvm_is_error_hva(), and would
> > make it obvious what the helper does; I was quite surprised that "error" means
> > "is not covered by a valid memslot".
> > 
> 
> Seemingly no response to this; I'll define a local helper rather than
> re-working the open-coded tests to check against INVALID_GPA. This can then
> be trivially replaced if need be.

How about we force a decision with a patch?  This should be easy enough to slot
in, and I would be quite surprised if s390 is overly attached to kvm_is_error_gpa().

From: Sean Christopherson <seanjc@google.com>
Date: Wed, 14 Feb 2024 08:05:49 -0800
Subject: [PATCH] KVM: s390: Refactor kvm_is_error_gpa() into
 kvm_is_gpa_in_memslot()

Rename kvm_is_error_gpa() to kvm_is_gpa_in_memslot() and invert the
polarity accordingly in order to (a) free up kvm_is_error_gpa() to match
with kvm_is_error_{hva,page}(), and (b) make it more obvious that the
helper is doing a memslot lookup, i.e. not simply checking for INVALID_GPA.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/s390/kvm/diag.c     |  2 +-
 arch/s390/kvm/gaccess.c  | 14 +++++++-------
 arch/s390/kvm/kvm-s390.c |  4 ++--
 arch/s390/kvm/priv.c     |  4 ++--
 arch/s390/kvm/sigp.c     |  2 +-
 include/linux/kvm_host.h |  4 ++--
 6 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/arch/s390/kvm/diag.c b/arch/s390/kvm/diag.c
index 3c65b8258ae6..2a32438e09ce 100644
--- a/arch/s390/kvm/diag.c
+++ b/arch/s390/kvm/diag.c
@@ -102,7 +102,7 @@ static int __diag_page_ref_service(struct kvm_vcpu *vcpu)
 		    parm.token_addr & 7 || parm.zarch != 0x8000000000000000ULL)
 			return kvm_s390_inject_program_int(vcpu, PGM_SPECIFICATION);
 
-		if (kvm_is_error_gpa(vcpu->kvm, parm.token_addr))
+		if (!kvm_is_gpa_in_memslot(vcpu->kvm, parm.token_addr))
 			return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
 
 		vcpu->arch.pfault_token = parm.token_addr;
diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c
index 5bfcc50c1a68..415c99649e43 100644
--- a/arch/s390/kvm/gaccess.c
+++ b/arch/s390/kvm/gaccess.c
@@ -664,7 +664,7 @@ static unsigned long guest_translate(struct kvm_vcpu *vcpu, unsigned long gva,
 	case ASCE_TYPE_REGION1:	{
 		union region1_table_entry rfte;
 
-		if (kvm_is_error_gpa(vcpu->kvm, ptr))
+		if (!kvm_is_gpa_in_memslot(vcpu->kvm, ptr))
 			return PGM_ADDRESSING;
 		if (deref_table(vcpu->kvm, ptr, &rfte.val))
 			return -EFAULT;
@@ -682,7 +682,7 @@ static unsigned long guest_translate(struct kvm_vcpu *vcpu, unsigned long gva,
 	case ASCE_TYPE_REGION2: {
 		union region2_table_entry rste;
 
-		if (kvm_is_error_gpa(vcpu->kvm, ptr))
+		if (!kvm_is_gpa_in_memslot(vcpu->kvm, ptr))
 			return PGM_ADDRESSING;
 		if (deref_table(vcpu->kvm, ptr, &rste.val))
 			return -EFAULT;
@@ -700,7 +700,7 @@ static unsigned long guest_translate(struct kvm_vcpu *vcpu, unsigned long gva,
 	case ASCE_TYPE_REGION3: {
 		union region3_table_entry rtte;
 
-		if (kvm_is_error_gpa(vcpu->kvm, ptr))
+		if (!kvm_is_gpa_in_memslot(vcpu->kvm, ptr))
 			return PGM_ADDRESSING;
 		if (deref_table(vcpu->kvm, ptr, &rtte.val))
 			return -EFAULT;
@@ -728,7 +728,7 @@ static unsigned long guest_translate(struct kvm_vcpu *vcpu, unsigned long gva,
 	case ASCE_TYPE_SEGMENT: {
 		union segment_table_entry ste;
 
-		if (kvm_is_error_gpa(vcpu->kvm, ptr))
+		if (!kvm_is_gpa_in_memslot(vcpu->kvm, ptr))
 			return PGM_ADDRESSING;
 		if (deref_table(vcpu->kvm, ptr, &ste.val))
 			return -EFAULT;
@@ -748,7 +748,7 @@ static unsigned long guest_translate(struct kvm_vcpu *vcpu, unsigned long gva,
 		ptr = ste.fc0.pto * (PAGE_SIZE / 2) + vaddr.px * 8;
 	}
 	}
-	if (kvm_is_error_gpa(vcpu->kvm, ptr))
+	if (!kvm_is_gpa_in_memslot(vcpu->kvm, ptr))
 		return PGM_ADDRESSING;
 	if (deref_table(vcpu->kvm, ptr, &pte.val))
 		return -EFAULT;
@@ -770,7 +770,7 @@ static unsigned long guest_translate(struct kvm_vcpu *vcpu, unsigned long gva,
 		*prot = PROT_TYPE_IEP;
 		return PGM_PROTECTION;
 	}
-	if (kvm_is_error_gpa(vcpu->kvm, raddr.addr))
+	if (!kvm_is_gpa_in_memslot(vcpu->kvm, raddr.addr))
 		return PGM_ADDRESSING;
 	*gpa = raddr.addr;
 	return 0;
@@ -957,7 +957,7 @@ static int guest_range_to_gpas(struct kvm_vcpu *vcpu, unsigned long ga, u8 ar,
 				return rc;
 		} else {
 			gpa = kvm_s390_real_to_abs(vcpu, ga);
-			if (kvm_is_error_gpa(vcpu->kvm, gpa)) {
+			if (!kvm_is_gpa_in_memslot(vcpu->kvm, gpa)) {
 				rc = PGM_ADDRESSING;
 				prot = PROT_NONE;
 			}
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index ea63ac769889..3e5a1d7aa81a 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -2878,7 +2878,7 @@ static int kvm_s390_vm_mem_op_abs(struct kvm *kvm, struct kvm_s390_mem_op *mop)
 
 	srcu_idx = srcu_read_lock(&kvm->srcu);
 
-	if (kvm_is_error_gpa(kvm, mop->gaddr)) {
+	if (!kvm_is_gpa_in_memslot(kvm, mop->gaddr)) {
 		r = PGM_ADDRESSING;
 		goto out_unlock;
 	}
@@ -2940,7 +2940,7 @@ static int kvm_s390_vm_mem_op_cmpxchg(struct kvm *kvm, struct kvm_s390_mem_op *m
 
 	srcu_idx = srcu_read_lock(&kvm->srcu);
 
-	if (kvm_is_error_gpa(kvm, mop->gaddr)) {
+	if (!kvm_is_gpa_in_memslot(kvm, mop->gaddr)) {
 		r = PGM_ADDRESSING;
 		goto out_unlock;
 	}
diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c
index f875a404a0a0..1be19cc9d73c 100644
--- a/arch/s390/kvm/priv.c
+++ b/arch/s390/kvm/priv.c
@@ -149,7 +149,7 @@ static int handle_set_prefix(struct kvm_vcpu *vcpu)
 	 * first page, since address is 8k aligned and memory pieces are always
 	 * at least 1MB aligned and have at least a size of 1MB.
 	 */
-	if (kvm_is_error_gpa(vcpu->kvm, address))
+	if (!kvm_is_gpa_in_memslot(vcpu->kvm, address))
 		return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
 
 	kvm_s390_set_prefix(vcpu, address);
@@ -464,7 +464,7 @@ static int handle_test_block(struct kvm_vcpu *vcpu)
 		return kvm_s390_inject_prog_irq(vcpu, &vcpu->arch.pgm);
 	addr = kvm_s390_real_to_abs(vcpu, addr);
 
-	if (kvm_is_error_gpa(vcpu->kvm, addr))
+	if (!kvm_is_gpa_in_memslot(vcpu->kvm, addr))
 		return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
 	/*
 	 * We don't expect errors on modern systems, and do not care
diff --git a/arch/s390/kvm/sigp.c b/arch/s390/kvm/sigp.c
index d9696b530064..55c34cb35428 100644
--- a/arch/s390/kvm/sigp.c
+++ b/arch/s390/kvm/sigp.c
@@ -172,7 +172,7 @@ static int __sigp_set_prefix(struct kvm_vcpu *vcpu, struct kvm_vcpu *dst_vcpu,
 	 * first page, since address is 8k aligned and memory pieces are always
 	 * at least 1MB aligned and have at least a size of 1MB.
 	 */
-	if (kvm_is_error_gpa(vcpu->kvm, irq.u.prefix.address)) {
+	if (!kvm_is_gpa_in_memslot(vcpu->kvm, irq.u.prefix.address)) {
 		*reg &= 0xffffffff00000000UL;
 		*reg |= SIGP_STATUS_INVALID_PARAMETER;
 		return SIGP_CC_STATUS_STORED;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 7e7fd25b09b3..d175b64488ec 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1788,11 +1788,11 @@ static inline hpa_t pfn_to_hpa(kvm_pfn_t pfn)
 	return (hpa_t)pfn << PAGE_SHIFT;
 }
 
-static inline bool kvm_is_error_gpa(struct kvm *kvm, gpa_t gpa)
+static inline bool kvm_is_gpa_in_memslot(struct kvm *kvm, gpa_t gpa)
 {
 	unsigned long hva = gfn_to_hva(kvm, gpa_to_gfn(gpa));
 
-	return kvm_is_error_hva(hva);
+	return !kvm_is_error_hva(hva);
 }
 
 enum kvm_stat_kind {

base-commit: 687d8f4c3dea0758afd748968d91288220bbe7e3
-- 


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: [PATCH v12 08/20] KVM: pfncache: allow a cache to be activated with a fixed (userspace) HVA
  2024-02-14 16:20       ` Sean Christopherson
@ 2024-02-14 16:33         ` Paul Durrant
  0 siblings, 0 replies; 58+ messages in thread
From: Paul Durrant @ 2024-02-14 16:33 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Jonathan Corbet, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	David Woodhouse, Shuah Khan, kvm, linux-doc, linux-kernel,
	linux-kselftest, Christian Borntraeger, Janosch Frank,
	Claudio Imbrenda, David Hildenbrand

On 14/02/2024 16:20, Sean Christopherson wrote:
> On Wed, Feb 14, 2024, Paul Durrant wrote:
>> On 07/02/2024 04:03, Sean Christopherson wrote:
>>> +s390 folks (question on kvm_is_error_gpa() for ya)
>>> But!  kvm_is_error_gpa() already exists, and it very, very sneakily does a memslot
>>> lookup and checks for a valid HVA.
>>>
>>> s390 people, any objection to renaming kvm_is_error_gpa() to something like
>>> kvm_gpa_has_memslot() or kvm_gpa_is_in_memslot()?  s390 is the only code that
>>> uses the existing helper.
>>>
>>> That would both free up the name to pair with kvm_is_error_hva(), and would
>>> make it obvious what the helper does; I was quite surprised that "error" means
>>> "is not covered by a valid memslot".
>>>
>>
>> Seemingly no response to this; I'll define a local helper rather than
>> re-working the open-coded tests to check against INVALID_GPA. This can then
>> be trivially replaced if need be.
> 
> How about we force a decision with a patch?  This should be easy enough to slot
> in, and I would be quite surprised if s390 is overly attached to kvm_is_error_gpa().
> 

Ok. I'll slot that in.

> From: Sean Christopherson <seanjc@google.com>
> Date: Wed, 14 Feb 2024 08:05:49 -0800
> Subject: [PATCH] KVM: s390: Refactor kvm_is_error_gpa() into
>   kvm_is_gpa_in_memslot()
>
> [full patch snipped; quoted verbatim in the parent message above]


^ permalink raw reply	[flat|nested] 58+ messages in thread

end of thread, other threads:[~2024-02-14 16:33 UTC | newest]

Thread overview: 58+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-01-15 12:56 [PATCH v12 00/20] KVM: xen: update shared_info and vcpu_info handling Paul Durrant
2024-01-15 12:56 ` [PATCH v12 01/20] KVM: pfncache: Add a map helper function Paul Durrant
2024-01-15 12:56 ` [PATCH v12 02/20] KVM: pfncache: remove unnecessary exports Paul Durrant
2024-01-15 12:56 ` [PATCH v12 03/20] KVM: xen: mark guest pages dirty with the pfncache lock held Paul Durrant
2024-02-07  3:17   ` Sean Christopherson
2024-02-07  3:26     ` David Woodhouse
2024-02-07 15:15       ` Sean Christopherson
2024-02-07  8:48     ` Paul Durrant
2024-01-15 12:56 ` [PATCH v12 04/20] KVM: pfncache: add a mark-dirty helper Paul Durrant
2024-02-07  3:20   ` Sean Christopherson
2024-02-07  8:47     ` Paul Durrant
2024-02-09 15:58   ` Sean Christopherson
2024-02-09 16:05     ` Paul Durrant
2024-01-15 12:56 ` [PATCH v12 05/20] KVM: pfncache: remove KVM_GUEST_USES_PFN usage Paul Durrant
2024-01-15 12:56 ` [PATCH v12 06/20] KVM: pfncache: stop open-coding offset_in_page() Paul Durrant
2024-01-15 12:56 ` [PATCH v12 07/20] KVM: pfncache: include page offset in uhva and use it consistently Paul Durrant
2024-01-15 12:56 ` [PATCH v12 08/20] KVM: pfncache: allow a cache to be activated with a fixed (userspace) HVA Paul Durrant
2024-02-07  4:03   ` Sean Christopherson
2024-02-07  4:13     ` David Woodhouse
2024-02-14 16:01       ` Sean Christopherson
2024-02-14 16:09         ` Paul Durrant
2024-02-14 15:21     ` Paul Durrant
2024-02-14 16:20       ` Sean Christopherson
2024-02-14 16:33         ` Paul Durrant
2024-01-15 12:56 ` [PATCH v12 09/20] KVM: xen: separate initialization of shared_info cache and content Paul Durrant
2024-01-15 12:56 ` [PATCH v12 10/20] KVM: xen: re-initialize shared_info if guest (32/64-bit) mode is set Paul Durrant
2024-01-15 12:56 ` [PATCH v12 11/20] KVM: xen: allow shared_info to be mapped by fixed HVA Paul Durrant
2024-02-07  4:10   ` Sean Christopherson
2024-02-07  8:53     ` Paul Durrant
2024-02-08  8:52     ` Paul Durrant
2024-02-08 16:48       ` Sean Christopherson
2024-02-08 16:51         ` Paul Durrant
2024-02-08 17:26           ` David Woodhouse
2024-02-09 16:01             ` Sean Christopherson
2024-01-15 12:56 ` [PATCH v12 12/20] KVM: xen: allow vcpu_info " Paul Durrant
2024-01-15 12:57 ` [PATCH v12 13/20] KVM: selftests / xen: map shared_info using HVA rather than GFN Paul Durrant
2024-02-07  4:14   ` Sean Christopherson
2024-02-07  8:54     ` Paul Durrant
2024-02-07 14:58       ` Sean Christopherson
2024-01-15 12:57 ` [PATCH v12 14/20] KVM: selftests / xen: re-map vcpu_info using HVA rather than GPA Paul Durrant
2024-01-15 12:57 ` [PATCH v12 15/20] KVM: xen: advertize the KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA capability Paul Durrant
2024-01-15 12:57 ` [PATCH v12 16/20] KVM: xen: split up kvm_xen_set_evtchn_fast() Paul Durrant
2024-01-15 12:57 ` [PATCH v12 17/20] KVM: xen: don't block on pfncache locks in kvm_xen_set_evtchn_fast() Paul Durrant
2024-02-07  4:17   ` Sean Christopherson
2024-02-07  4:21     ` David Woodhouse
2024-01-15 12:57 ` [PATCH v12 18/20] KVM: pfncache: check the need for invalidation under read lock first Paul Durrant
2024-02-07  4:22   ` Sean Christopherson
2024-02-07  4:27     ` David Woodhouse
2024-02-07  4:47       ` Sean Christopherson
2024-02-07  4:59         ` David Woodhouse
2024-02-07 15:10           ` Sean Christopherson
2024-01-15 12:57 ` [PATCH v12 19/20] KVM: xen: allow vcpu_info content to be 'safely' copied Paul Durrant
2024-01-15 12:57 ` [PATCH v12 20/20] KVM: pfncache: rework __kvm_gpc_refresh() to fix locking issues Paul Durrant
2024-01-25 15:03 ` [PATCH v12 00/20] KVM: xen: update shared_info and vcpu_info handling Paul Durrant
2024-01-25 20:07   ` David Woodhouse
2024-01-26  1:19   ` Sean Christopherson
2024-02-02 17:37     ` Paul Durrant
2024-02-02 22:03       ` Sean Christopherson
