* [PATCH RFC 0/4] kvm/mm: Allow GUP to respond to non fatal signals
@ 2022-06-17  1:41 Peter Xu
  2022-06-17  1:41 ` [PATCH RFC 1/4] mm/gup: Add FOLL_INTERRUPTIBLE Peter Xu
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Peter Xu @ 2022-06-17  1:41 UTC (permalink / raw)
  To: kvm, linux-kernel
  Cc: Dr . David Alan Gilbert, Linux MM Mailing List,
	Sean Christopherson, Paolo Bonzini, Andrea Arcangeli,
	Andrew Morton, peterx

[Marked as RFC for now]

An issue was reported where libvirt is unable to stop the virtual machine
with the QMP command "stop" during a paused postcopy migration [1].

It doesn't work because the "stop the VM" operation requires the
hypervisor to kick all the vcpu threads out of the kernel using SIG_IPI in
QEMU (which is translated into a SIGUSR1).  However, during a paused
postcopy the vcpu threads are stuck in handle_userfault(), so they simply
do not respond to the kicks.  Worse, the "stop" command then hangs the QMP
channel as well.

The mm already has a facility to handle generic signals
(FAULT_FLAG_INTERRUPTIBLE), but it is only used in the page fault handlers,
not in GUP.  Unfortunately, KVM is a heavy GUP user on guest page faults,
which means that with what we have right now we cannot interrupt a long
page fault while KVM is fetching guest pages.

I think it's reasonable for GUP to listen only to fatal signals by
default, as most GUP users are not really ready to handle anything else.
KVM, however, is not such a user: it has rich infrastructure to handle even
generic signals and to deliver them properly to userspace, after which the
page fault can be retried in the next KVM_RUN.

This patchset adds FOLL_INTERRUPTIBLE to enable FAULT_FLAG_INTERRUPTIBLE
in GUP, and lets KVM be its first user.
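
As a quick illustration of the caller-side contract, here is a minimal
sketch (not part of the series; addr/page stand for the caller's usual
arguments and handle_signal_and_retry() is a made-up placeholder for the
caller's existing retry path, which for KVM means returning to userspace
and re-entering KVM_RUN):

  long npages;

  /* Opt in: ask GUP to also bail out on non-fatal signals */
  npages = get_user_pages_unlocked(addr, 1, &page,
                                   FOLL_WRITE | FOLL_INTERRUPTIBLE);
  if (npages == -EINTR) {
          /* A signal (fatal or not) is pending: back off cleanly */
          return handle_signal_and_retry();
  }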

Tests
=====

I created a postcopy environment, paused the migration by shutting down
the network to emulate a network failure (so handle_userfault() gets stuck
for a long time), then tried three things:

  (1) Sending the QMP command "stop" to the QEMU monitor,
  (2) Hitting Ctrl-C on the QEMU command line,
  (3) Attaching GDB to the destination QEMU process.

Before this patchset, all three use cases hang.  After the patchset, all
of them work just as if there were no network failure at all.

Please have a look, thanks.

[1] https://gitlab.com/qemu-project/qemu/-/issues/1052

Peter Xu (4):
  mm/gup: Add FOLL_INTERRUPTIBLE
  kvm: Merge "atomic" and "write" in __gfn_to_pfn_memslot()
  kvm: Add new pfn error KVM_PFN_ERR_INTR
  kvm/x86: Allow to respond to generic signals during slow page faults

 arch/arm64/kvm/mmu.c                   |  5 ++--
 arch/powerpc/kvm/book3s_64_mmu_hv.c    |  5 ++--
 arch/powerpc/kvm/book3s_64_mmu_radix.c |  5 ++--
 arch/x86/kvm/mmu/mmu.c                 | 19 ++++++++----
 include/linux/kvm_host.h               | 21 ++++++++++++-
 include/linux/mm.h                     |  1 +
 mm/gup.c                               | 33 ++++++++++++++++++---
 virt/kvm/kvm_main.c                    | 41 ++++++++++++++++----------
 virt/kvm/kvm_mm.h                      |  6 ++--
 virt/kvm/pfncache.c                    |  2 +-
 10 files changed, 104 insertions(+), 34 deletions(-)

-- 
2.32.0



* [PATCH RFC 1/4] mm/gup: Add FOLL_INTERRUPTIBLE
  2022-06-17  1:41 [PATCH RFC 0/4] kvm/mm: Allow GUP to respond to non fatal signals Peter Xu
@ 2022-06-17  1:41 ` Peter Xu
  2022-06-21  8:23   ` David Hildenbrand
  2022-06-17  1:41 ` [PATCH RFC 2/4] kvm: Merge "atomic" and "write" in __gfn_to_pfn_memslot() Peter Xu
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 8+ messages in thread
From: Peter Xu @ 2022-06-17  1:41 UTC (permalink / raw)
  To: kvm, linux-kernel
  Cc: Dr . David Alan Gilbert, Linux MM Mailing List,
	Sean Christopherson, Paolo Bonzini, Andrea Arcangeli,
	Andrew Morton, peterx

We have had FAULT_FLAG_INTERRUPTIBLE, but it was never applied to GUP.
One issue is that not all GUP paths are able to handle signal delivery
besides SIGKILL.

That's not ideal for the GUP users who are actually able to handle these
cases, like KVM.

KVM uses GUP extensively when faulting in guest pages, and we already have
the infrastructure to retry a page fault at a later time.  Allowing GUP to
be interrupted by generic signals can make KVM-related threads more
responsive.  For example:

  (1) SIGUSR1: which QEMU/KVM uses to deliver an inter-process IPI,
      e.g. when the admin issues a vm_stop QMP command, SIGUSR1 can be
      generated to kick the vcpus out of kernel context immediately,

  (2) SIGINT: which allows interactive hypervisor users to stop a virtual
      machine with Ctrl-C without any delays or hangs,

  (3) SIGTRAP: which lets GDB attach even during page faults that are
      stuck for a long time.

Normally the hypervisor will be able to receive these signals properly,
but not if it's stuck in a GUP for a long time for whatever reason.  That
happens easily with a stuck postcopy migration when e.g. a temporary
network failure occurs; some vcpu threads can then hang forever waiting for
the pages.  With the new FOLL_INTERRUPTIBLE, GUP users like KVM can
selectively enable the ability to trap these signals.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 include/linux/mm.h |  1 +
 mm/gup.c           | 33 +++++++++++++++++++++++++++++----
 2 files changed, 30 insertions(+), 4 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index bc8f326be0ce..ebdf8a6b86c1 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2941,6 +2941,7 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
 #define FOLL_SPLIT_PMD	0x20000	/* split huge pmd before returning */
 #define FOLL_PIN	0x40000	/* pages must be released via unpin_user_page */
 #define FOLL_FAST_ONLY	0x80000	/* gup_fast: prevent fall-back to slow gup */
+#define FOLL_INTERRUPTIBLE  0x100000 /* allow interrupts from generic signals */
 
 /*
  * FOLL_PIN and FOLL_LONGTERM may be used in various combinations with each
diff --git a/mm/gup.c b/mm/gup.c
index 551264407624..ad74b137d363 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -933,8 +933,17 @@ static int faultin_page(struct vm_area_struct *vma,
 		fault_flags |= FAULT_FLAG_WRITE;
 	if (*flags & FOLL_REMOTE)
 		fault_flags |= FAULT_FLAG_REMOTE;
-	if (locked)
+	if (locked) {
 		fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
+		/*
+		 * We should only grant FAULT_FLAG_INTERRUPTIBLE when we're
+		 * (at least) killable.  It also mostly means we're not
+		 * called with NOWAIT.  Otherwise, ignore FOLL_INTERRUPTIBLE
+		 * since it doesn't make much sense to be used alone.
+		 */
+		if (*flags & FOLL_INTERRUPTIBLE)
+			fault_flags |= FAULT_FLAG_INTERRUPTIBLE;
+	}
 	if (*flags & FOLL_NOWAIT)
 		fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_RETRY_NOWAIT;
 	if (*flags & FOLL_TRIED) {
@@ -1322,6 +1331,22 @@ int fixup_user_fault(struct mm_struct *mm,
 }
 EXPORT_SYMBOL_GPL(fixup_user_fault);
 
+/*
+ * GUP always responds to fatal signals.  When FOLL_INTERRUPTIBLE is
+ * specified, it'll also respond to generic signals.  The caller of GUP
+ * that has FOLL_INTERRUPTIBLE should take care of the GUP interruption.
+ */
+static bool gup_signal_pending(unsigned int flags)
+{
+	if (fatal_signal_pending(current))
+		return true;
+
+	if (!(flags & FOLL_INTERRUPTIBLE))
+		return false;
+
+	return signal_pending(current);
+}
+
 /*
  * Please note that this function, unlike __get_user_pages will not
  * return 0 for nr_pages > 0 without FOLL_NOWAIT
@@ -1403,11 +1428,11 @@ static __always_inline long __get_user_pages_locked(struct mm_struct *mm,
 		 * Repeat on the address that fired VM_FAULT_RETRY
 		 * with both FAULT_FLAG_ALLOW_RETRY and
 		 * FAULT_FLAG_TRIED.  Note that GUP can be interrupted
-		 * by fatal signals, so we need to check it before we
+		 * by fatal signals or even common signals, depending on
+		 * the caller's request. So we need to check it before we
 		 * start trying again otherwise it can loop forever.
 		 */
-
-		if (fatal_signal_pending(current)) {
+		if (gup_signal_pending(flags)) {
 			if (!pages_done)
 				pages_done = -EINTR;
 			break;
-- 
2.32.0



* [PATCH RFC 2/4] kvm: Merge "atomic" and "write" in __gfn_to_pfn_memslot()
  2022-06-17  1:41 [PATCH RFC 0/4] kvm/mm: Allow GUP to respond to non fatal signals Peter Xu
  2022-06-17  1:41 ` [PATCH RFC 1/4] mm/gup: Add FOLL_INTERRUPTIBLE Peter Xu
@ 2022-06-17  1:41 ` Peter Xu
  2022-06-17 21:53   ` kernel test robot
  2022-06-17  1:41 ` [PATCH RFC 3/4] kvm: Add new pfn error KVM_PFN_ERR_INTR Peter Xu
  2022-06-17  1:41 ` [PATCH RFC 4/4] kvm/x86: Allow to respond to generic signals during slow page faults Peter Xu
  3 siblings, 1 reply; 8+ messages in thread
From: Peter Xu @ 2022-06-17  1:41 UTC (permalink / raw)
  To: kvm, linux-kernel
  Cc: Dr . David Alan Gilbert, Linux MM Mailing List,
	Sean Christopherson, Paolo Bonzini, Andrea Arcangeli,
	Andrew Morton, peterx

Merge the two boolean parameters of __gfn_to_pfn_memslot() into a bitmask
flag of type kvm_gtp_flag_t.  This cleans up the parameter list, and also
prepares for a new boolean to be added to __gfn_to_pfn_memslot().

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 arch/arm64/kvm/mmu.c                   |  5 ++--
 arch/powerpc/kvm/book3s_64_mmu_hv.c    |  5 ++--
 arch/powerpc/kvm/book3s_64_mmu_radix.c |  5 ++--
 arch/x86/kvm/mmu/mmu.c                 | 10 +++----
 include/linux/kvm_host.h               |  9 ++++++-
 virt/kvm/kvm_main.c                    | 37 +++++++++++++++-----------
 virt/kvm/kvm_mm.h                      |  6 +++--
 virt/kvm/pfncache.c                    |  2 +-
 8 files changed, 49 insertions(+), 30 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index f5651a05b6a8..ce8066ded15b 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1204,8 +1204,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 */
 	smp_rmb();
 
-	pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL,
-				   write_fault, &writable, NULL);
+	pfn = __gfn_to_pfn_memslot(memslot, gfn,
+				   write_fault ? KVM_GTP_WRITE : 0,
+				   false, NULL, &writable, NULL);
 	if (pfn == KVM_PFN_ERR_HWPOISON) {
 		kvm_send_hwpoison_signal(hva, vma_shift);
 		return 0;
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 514fd45c1994..2f5fad2e1b7f 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -598,8 +598,9 @@ int kvmppc_book3s_hv_page_fault(struct kvm_vcpu *vcpu,
 		write_ok = true;
 	} else {
 		/* Call KVM generic code to do the slow-path check */
-		pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL,
-					   writing, &write_ok, NULL);
+		pfn = __gfn_to_pfn_memslot(memslot, gfn,
+					   writing ? KVM_GTP_WRITE : 0,
+					   false, NULL, &write_ok, NULL);
 		if (is_error_noslot_pfn(pfn))
 			return -EFAULT;
 		page = NULL;
diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index 42851c32ff3b..232b17c75b83 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -845,8 +845,9 @@ int kvmppc_book3s_instantiate_page(struct kvm_vcpu *vcpu,
 		unsigned long pfn;
 
 		/* Call KVM generic code to do the slow-path check */
-		pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL,
-					   writing, upgrade_p, NULL);
+		pfn = __gfn_to_pfn_memslot(memslot, gfn,
+					   writing ? KVM_GTP_WRITE : 0,
+					   NULL, upgrade_p, NULL);
 		if (is_error_noslot_pfn(pfn))
 			return -EFAULT;
 		page = NULL;
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index f4653688fa6d..e92f1ab63d6a 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3968,6 +3968,7 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work)
 
 static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 {
+	kvm_gtp_flag_t flags = fault->write ? KVM_GTP_WRITE : 0;
 	struct kvm_memory_slot *slot = fault->slot;
 	bool async;
 
@@ -3999,8 +4000,8 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 	}
 
 	async = false;
-	fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, &async,
-					  fault->write, &fault->map_writable,
+	fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, flags,
+					  &async, &fault->map_writable,
 					  &fault->hva);
 	if (!async)
 		return RET_PF_CONTINUE; /* *pfn has correct page already */
@@ -4016,9 +4017,8 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 		}
 	}
 
-	fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, NULL,
-					  fault->write, &fault->map_writable,
-					  &fault->hva);
+	fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, flags, NULL,
+					  &fault->map_writable, &fault->hva);
 	return RET_PF_CONTINUE;
 }
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index c20f2d55840c..b646b6fcaec6 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1146,8 +1146,15 @@ kvm_pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault,
 		      bool *writable);
 kvm_pfn_t gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn);
 kvm_pfn_t gfn_to_pfn_memslot_atomic(const struct kvm_memory_slot *slot, gfn_t gfn);
+
+/* gfn_to_pfn (gtp) flags */
+typedef unsigned int __bitwise kvm_gtp_flag_t;
+
+#define  KVM_GTP_WRITE          ((__force kvm_gtp_flag_t) BIT(0))
+#define  KVM_GTP_ATOMIC         ((__force kvm_gtp_flag_t) BIT(1))
+
 kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn,
-			       bool atomic, bool *async, bool write_fault,
+			       kvm_gtp_flag_t gtp_flags, bool *async,
 			       bool *writable, hva_t *hva);
 
 void kvm_release_pfn_clean(kvm_pfn_t pfn);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 64ec2222a196..952400b42ee9 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2444,9 +2444,11 @@ static bool hva_to_pfn_fast(unsigned long addr, bool write_fault,
  * The slow path to get the pfn of the specified host virtual address,
  * 1 indicates success, -errno is returned if error is detected.
  */
-static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault,
+static int hva_to_pfn_slow(unsigned long addr, bool *async,
+			   kvm_gtp_flag_t gtp_flags,
 			   bool *writable, kvm_pfn_t *pfn)
 {
+	bool write_fault = gtp_flags & KVM_GTP_WRITE;
 	unsigned int flags = FOLL_HWPOISON;
 	struct page *page;
 	int npages = 0;
@@ -2565,20 +2567,22 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
 /*
  * Pin guest page in memory and return its pfn.
  * @addr: host virtual address which maps memory to the guest
- * @atomic: whether this function can sleep
+ * @gtp_flags: kvm_gtp_flag_t flags (atomic, write, ..)
  * @async: whether this function need to wait IO complete if the
  *         host page is not in the memory
- * @write_fault: whether we should get a writable host page
  * @writable: whether it allows to map a writable host page for !@write_fault
  *
- * The function will map a writable host page for these two cases:
+ * The function will map a writable (KVM_GTP_WRITE set) host page for these
+ * two cases:
  * 1): @write_fault = true
  * 2): @write_fault = false && @writable, @writable will tell the caller
  *     whether the mapping is writable.
  */
-kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async,
-		     bool write_fault, bool *writable)
+kvm_pfn_t hva_to_pfn(unsigned long addr, kvm_gtp_flag_t gtp_flags, bool *async,
+		     bool *writable)
 {
+	bool write_fault = gtp_flags & KVM_GTP_WRITE;
+	bool atomic = gtp_flags & KVM_GTP_ATOMIC;
 	struct vm_area_struct *vma;
 	kvm_pfn_t pfn = 0;
 	int npages, r;
@@ -2592,7 +2596,7 @@ kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async,
 	if (atomic)
 		return KVM_PFN_ERR_FAULT;
 
-	npages = hva_to_pfn_slow(addr, async, write_fault, writable, &pfn);
+	npages = hva_to_pfn_slow(addr, async, gtp_flags, writable, &pfn);
 	if (npages == 1)
 		return pfn;
 
@@ -2625,10 +2629,11 @@ kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async,
 }
 
 kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn,
-			       bool atomic, bool *async, bool write_fault,
+			       kvm_gtp_flag_t gtp_flags, bool *async,
 			       bool *writable, hva_t *hva)
 {
-	unsigned long addr = __gfn_to_hva_many(slot, gfn, NULL, write_fault);
+	unsigned long addr = __gfn_to_hva_many(slot, gfn, NULL,
+					       gtp_flags & KVM_GTP_WRITE);
 
 	if (hva)
 		*hva = addr;
@@ -2651,28 +2656,30 @@ kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn,
 		writable = NULL;
 	}
 
-	return hva_to_pfn(addr, atomic, async, write_fault,
-			  writable);
+	return hva_to_pfn(addr, gtp_flags, async, writable);
 }
 EXPORT_SYMBOL_GPL(__gfn_to_pfn_memslot);
 
 kvm_pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault,
 		      bool *writable)
 {
-	return __gfn_to_pfn_memslot(gfn_to_memslot(kvm, gfn), gfn, false, NULL,
-				    write_fault, writable, NULL);
+	return __gfn_to_pfn_memslot(gfn_to_memslot(kvm, gfn), gfn,
+				    write_fault ? KVM_GTP_WRITE : 0,
+				    NULL, writable, NULL);
 }
 EXPORT_SYMBOL_GPL(gfn_to_pfn_prot);
 
 kvm_pfn_t gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn)
 {
-	return __gfn_to_pfn_memslot(slot, gfn, false, NULL, true, NULL, NULL);
+	return __gfn_to_pfn_memslot(slot, gfn, KVM_GTP_WRITE,
+				    NULL, NULL, NULL);
 }
 EXPORT_SYMBOL_GPL(gfn_to_pfn_memslot);
 
 kvm_pfn_t gfn_to_pfn_memslot_atomic(const struct kvm_memory_slot *slot, gfn_t gfn)
 {
-	return __gfn_to_pfn_memslot(slot, gfn, true, NULL, true, NULL, NULL);
+	return __gfn_to_pfn_memslot(slot, gfn, KVM_GTP_WRITE | KVM_GTP_ATOMIC,
+				    NULL, NULL, NULL);
 }
 EXPORT_SYMBOL_GPL(gfn_to_pfn_memslot_atomic);
 
diff --git a/virt/kvm/kvm_mm.h b/virt/kvm/kvm_mm.h
index 41da467d99c9..1c870911eb48 100644
--- a/virt/kvm/kvm_mm.h
+++ b/virt/kvm/kvm_mm.h
@@ -3,6 +3,8 @@
 #ifndef __KVM_MM_H__
 #define __KVM_MM_H__ 1
 
+#include <linux/kvm_host.h>
+
 /*
  * Architectures can choose whether to use an rwlock or spinlock
  * for the mmu_lock.  These macros, for use in common code
@@ -24,8 +26,8 @@
 #define KVM_MMU_READ_UNLOCK(kvm)	spin_unlock(&(kvm)->mmu_lock)
 #endif /* KVM_HAVE_MMU_RWLOCK */
 
-kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async,
-		     bool write_fault, bool *writable);
+kvm_pfn_t hva_to_pfn(unsigned long addr, kvm_gtp_flag_t gtp_flags, bool *async,
+		     bool *writable);
 
 #ifdef CONFIG_HAVE_KVM_PFNCACHE
 void gfn_to_pfn_cache_invalidate_start(struct kvm *kvm,
diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index dd84676615f1..0f9f6b5d2fbb 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -123,7 +123,7 @@ static kvm_pfn_t hva_to_pfn_retry(struct kvm *kvm, unsigned long uhva)
 		smp_rmb();
 
 		/* We always request a writeable mapping */
-		new_pfn = hva_to_pfn(uhva, false, NULL, true, NULL);
+		new_pfn = hva_to_pfn(uhva, KVM_GTP_WRITE, NULL, NULL);
 		if (is_error_noslot_pfn(new_pfn))
 			break;
 
-- 
2.32.0



* [PATCH RFC 3/4] kvm: Add new pfn error KVM_PFN_ERR_INTR
  2022-06-17  1:41 [PATCH RFC 0/4] kvm/mm: Allow GUP to respond to non fatal signals Peter Xu
  2022-06-17  1:41 ` [PATCH RFC 1/4] mm/gup: Add FOLL_INTERRUPTIBLE Peter Xu
  2022-06-17  1:41 ` [PATCH RFC 2/4] kvm: Merge "atomic" and "write" in __gfn_to_pfn_memslot() Peter Xu
@ 2022-06-17  1:41 ` Peter Xu
  2022-06-17  1:41 ` [PATCH RFC 4/4] kvm/x86: Allow to respond to generic signals during slow page faults Peter Xu
  3 siblings, 0 replies; 8+ messages in thread
From: Peter Xu @ 2022-06-17  1:41 UTC (permalink / raw)
  To: kvm, linux-kernel
  Cc: Dr . David Alan Gilbert, Linux MM Mailing List,
	Sean Christopherson, Paolo Bonzini, Andrea Arcangeli,
	Andrew Morton, peterx

Add a new PFN error type to indicate that we could not finish fetching the
PFN due to an interruption, for example, after receiving a generic signal.

This prepares KVM to be able to respond to SIGUSR1 (which for QEMU is the
SIG_IPI) even during e.g. handling a userfaultfd page fault.
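
For clarity, the intended caller pattern looks roughly like below (a
schematic fragment with illustrative variable names; it mirrors what the
next patch does for the x86 slow page fault path):

  /* flags would include KVM_GTP_INTERRUPTIBLE (added in the next patch) */
  pfn = __gfn_to_pfn_memslot(slot, gfn, flags, NULL, &writable, NULL);
  if (unlikely(is_intr_pfn(pfn))) {
          /* Interrupted by a signal: let userspace handle it, then retry */
          vcpu->run->exit_reason = KVM_EXIT_INTR;
          ++vcpu->stat.signal_exits;
          return -EINTR;
  }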

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 include/linux/kvm_host.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index b646b6fcaec6..4f84a442f67f 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -96,6 +96,7 @@
 #define KVM_PFN_ERR_FAULT	(KVM_PFN_ERR_MASK)
 #define KVM_PFN_ERR_HWPOISON	(KVM_PFN_ERR_MASK + 1)
 #define KVM_PFN_ERR_RO_FAULT	(KVM_PFN_ERR_MASK + 2)
+#define KVM_PFN_ERR_INTR	(KVM_PFN_ERR_MASK + 3)
 
 /*
  * error pfns indicate that the gfn is in slot but faild to
@@ -106,6 +107,16 @@ static inline bool is_error_pfn(kvm_pfn_t pfn)
 	return !!(pfn & KVM_PFN_ERR_MASK);
 }
 
+/*
+ * When KVM_PFN_ERR_INTR is returned, it means we were interrupted while
+ * fetching the PFN (e.g. a signal might have arrived).  We may want to
+ * retry at some later point and kick userspace to handle the signal.
+ */
+static inline bool is_intr_pfn(kvm_pfn_t pfn)
+{
+	return pfn == KVM_PFN_ERR_INTR;
+}
+
 /*
  * error_noslot pfns indicate that the gfn can not be
  * translated to pfn - it is not in slot or failed to
-- 
2.32.0



* [PATCH RFC 4/4] kvm/x86: Allow to respond to generic signals during slow page faults
  2022-06-17  1:41 [PATCH RFC 0/4] kvm/mm: Allow GUP to respond to non fatal signals Peter Xu
                   ` (2 preceding siblings ...)
  2022-06-17  1:41 ` [PATCH RFC 3/4] kvm: Add new pfn error KVM_PFN_ERR_INTR Peter Xu
@ 2022-06-17  1:41 ` Peter Xu
  3 siblings, 0 replies; 8+ messages in thread
From: Peter Xu @ 2022-06-17  1:41 UTC (permalink / raw)
  To: kvm, linux-kernel
  Cc: Dr . David Alan Gilbert, Linux MM Mailing List,
	Sean Christopherson, Paolo Bonzini, Andrea Arcangeli,
	Andrew Morton, peterx

All the facilities should be ready for this; what we need to do is add a
new KVM_GTP_INTERRUPTIBLE flag showing that we're willing to be interrupted
by common signals during the __gfn_to_pfn_memslot() request, and wire it up
with the FOLL_INTERRUPTIBLE flag we've just introduced.

Note that only the x86 slow page fault routine will set this new bit.  The
new bit is not used on non-x86 architectures, nor on other GUP paths even
for x86.  It could be used elsewhere too, but that is not covered yet.

When we see that the PFN fetching was interrupted, exit early to userspace
with a KVM_EXIT_INTR exit reason.
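
From the VMM's point of view this is just another signal-driven exit.  As a
rough sketch (not part of this series; process_pending_signals() is a
placeholder for whatever the VMM does on SIG_IPI/SIGUSR1, and vcpu_fd/run
are the usual KVM_CREATE_VCPU fd and its mmap()ed struct kvm_run), a vcpu
thread that tolerates it could look like:

  #include <errno.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  void process_pending_signals(void);   /* hypothetical VMM helper */

  static void vcpu_loop(int vcpu_fd, struct kvm_run *run)
  {
          for (;;) {
                  int ret = ioctl(vcpu_fd, KVM_RUN, 0);

                  if (ret < 0 && errno == EINTR) {
                          /* Kicked by a signal, e.g. the vm_stop SIG_IPI */
                          process_pending_signals();
                          continue;
                  }
                  if (ret < 0)
                          break;          /* real error */
                  if (run->exit_reason == KVM_EXIT_INTR) {
                          process_pending_signals();
                          continue;
                  }
                  /* ... handle the other exit reasons here ... */
          }
  }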

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 arch/x86/kvm/mmu/mmu.c   | 9 +++++++++
 include/linux/kvm_host.h | 1 +
 virt/kvm/kvm_main.c      | 4 ++++
 3 files changed, 14 insertions(+)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index e92f1ab63d6a..b39acb7cb16d 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3012,6 +3012,13 @@ static int kvm_handle_bad_page(struct kvm_vcpu *vcpu, gfn_t gfn, kvm_pfn_t pfn)
 static int handle_abnormal_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
 			       unsigned int access)
 {
+	/* NOTE: not all error pfns are fatal; handle intr before the other ones */
+	if (unlikely(is_intr_pfn(fault->pfn))) {
+		vcpu->run->exit_reason = KVM_EXIT_INTR;
+		++vcpu->stat.signal_exits;
+		return -EINTR;
+	}
+
 	/* The pfn is invalid, report the error! */
 	if (unlikely(is_error_pfn(fault->pfn)))
 		return kvm_handle_bad_page(vcpu, fault->gfn, fault->pfn);
@@ -4017,6 +4024,8 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 		}
 	}
 
+	/* Allow responding to generic signals during slow page faults */
+	flags |= KVM_GTP_INTERRUPTIBLE;
 	fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, flags, NULL,
 					  &fault->map_writable, &fault->hva);
 	return RET_PF_CONTINUE;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 4f84a442f67f..c8d98e435537 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1163,6 +1163,7 @@ typedef unsigned int __bitwise kvm_gtp_flag_t;
 
 #define  KVM_GTP_WRITE          ((__force kvm_gtp_flag_t) BIT(0))
 #define  KVM_GTP_ATOMIC         ((__force kvm_gtp_flag_t) BIT(1))
+#define  KVM_GTP_INTERRUPTIBLE  ((__force kvm_gtp_flag_t) BIT(2))
 
 kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn,
 			       kvm_gtp_flag_t gtp_flags, bool *async,
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 952400b42ee9..b3873cac5672 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2462,6 +2462,8 @@ static int hva_to_pfn_slow(unsigned long addr, bool *async,
 		flags |= FOLL_WRITE;
 	if (async)
 		flags |= FOLL_NOWAIT;
+	if (gtp_flags & KVM_GTP_INTERRUPTIBLE)
+		flags |= FOLL_INTERRUPTIBLE;
 
 	npages = get_user_pages_unlocked(addr, 1, &page, flags);
 	if (npages != 1)
@@ -2599,6 +2601,8 @@ kvm_pfn_t hva_to_pfn(unsigned long addr, kvm_gtp_flag_t gtp_flags, bool *async,
 	npages = hva_to_pfn_slow(addr, async, gtp_flags, writable, &pfn);
 	if (npages == 1)
 		return pfn;
+	if (npages == -EINTR)
+		return KVM_PFN_ERR_INTR;
 
 	mmap_read_lock(current->mm);
 	if (npages == -EHWPOISON ||
-- 
2.32.0



* Re: [PATCH RFC 2/4] kvm: Merge "atomic" and "write" in __gfn_to_pfn_memslot()
  2022-06-17  1:41 ` [PATCH RFC 2/4] kvm: Merge "atomic" and "write" in __gfn_to_pfn_memslot() Peter Xu
@ 2022-06-17 21:53   ` kernel test robot
  0 siblings, 0 replies; 8+ messages in thread
From: kernel test robot @ 2022-06-17 21:53 UTC (permalink / raw)
  To: Peter Xu; +Cc: llvm, kbuild-all

Hi Peter,

[FYI, it's a private test report for your RFC patch.]
[auto build test ERROR on powerpc/topic/ppc-kvm]
[also build test ERROR on mst-vhost/linux-next linus/master v5.19-rc2]
[cannot apply to kvm/queue next-20220617]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/intel-lab-lkp/linux/commits/Peter-Xu/kvm-mm-Allow-GUP-to-respond-to-non-fatal-signals/20220617-094403
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git topic/ppc-kvm
config: arm64-randconfig-r001-20220617 (https://download.01.org/0day-ci/archive/20220618/202206180532.C71KuyHh-lkp@intel.com/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project d764aa7fc6b9cc3fbe960019018f5f9e941eb0a6)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install arm64 cross compiling tool for clang build
        # apt-get install binutils-aarch64-linux-gnu
        # https://github.com/intel-lab-lkp/linux/commit/6230b0019f9d1e0090102d9bb15c0029edf13c58
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Peter-Xu/kvm-mm-Allow-GUP-to-respond-to-non-fatal-signals/20220617-094403
        git checkout 6230b0019f9d1e0090102d9bb15c0029edf13c58
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=arm64 SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> arch/arm64/kvm/mmu.c:1209:32: error: too many arguments to function call, expected 6, have 7
                                      false, NULL, &writable, NULL);
                                                              ^~~~
   include/linux/stddef.h:8:14: note: expanded from macro 'NULL'
   #define NULL ((void *)0)
                ^~~~~~~~~~~
   include/linux/kvm_host.h:1156:11: note: '__gfn_to_pfn_memslot' declared here
   kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn,
             ^
   1 error generated.


vim +1209 arch/arm64/kvm/mmu.c

  1086	
  1087	static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
  1088				  struct kvm_memory_slot *memslot, unsigned long hva,
  1089				  unsigned long fault_status)
  1090	{
  1091		int ret = 0;
  1092		bool write_fault, writable, force_pte = false;
  1093		bool exec_fault;
  1094		bool device = false;
  1095		bool shared;
  1096		unsigned long mmu_seq;
  1097		struct kvm *kvm = vcpu->kvm;
  1098		struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
  1099		struct vm_area_struct *vma;
  1100		short vma_shift;
  1101		gfn_t gfn;
  1102		kvm_pfn_t pfn;
  1103		bool logging_active = memslot_is_logging(memslot);
  1104		bool use_read_lock = false;
  1105		unsigned long fault_level = kvm_vcpu_trap_get_fault_level(vcpu);
  1106		unsigned long vma_pagesize, fault_granule;
  1107		enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
  1108		struct kvm_pgtable *pgt;
  1109	
  1110		fault_granule = 1UL << ARM64_HW_PGTABLE_LEVEL_SHIFT(fault_level);
  1111		write_fault = kvm_is_write_fault(vcpu);
  1112		exec_fault = kvm_vcpu_trap_is_exec_fault(vcpu);
  1113		VM_BUG_ON(write_fault && exec_fault);
  1114	
  1115		if (fault_status == FSC_PERM && !write_fault && !exec_fault) {
  1116			kvm_err("Unexpected L2 read permission error\n");
  1117			return -EFAULT;
  1118		}
  1119	
  1120		/*
  1121		 * Let's check if we will get back a huge page backed by hugetlbfs, or
  1122		 * get block mapping for device MMIO region.
  1123		 */
  1124		mmap_read_lock(current->mm);
  1125		vma = vma_lookup(current->mm, hva);
  1126		if (unlikely(!vma)) {
  1127			kvm_err("Failed to find VMA for hva 0x%lx\n", hva);
  1128			mmap_read_unlock(current->mm);
  1129			return -EFAULT;
  1130		}
  1131	
  1132		/*
  1133		 * logging_active is guaranteed to never be true for VM_PFNMAP
  1134		 * memslots.
  1135		 */
  1136		if (logging_active) {
  1137			force_pte = true;
  1138			vma_shift = PAGE_SHIFT;
  1139			use_read_lock = (fault_status == FSC_PERM && write_fault &&
  1140					 fault_granule == PAGE_SIZE);
  1141		} else {
  1142			vma_shift = get_vma_page_shift(vma, hva);
  1143		}
  1144	
  1145		shared = (vma->vm_flags & VM_SHARED);
  1146	
  1147		switch (vma_shift) {
  1148	#ifndef __PAGETABLE_PMD_FOLDED
  1149		case PUD_SHIFT:
  1150			if (fault_supports_stage2_huge_mapping(memslot, hva, PUD_SIZE))
  1151				break;
  1152			fallthrough;
  1153	#endif
  1154		case CONT_PMD_SHIFT:
  1155			vma_shift = PMD_SHIFT;
  1156			fallthrough;
  1157		case PMD_SHIFT:
  1158			if (fault_supports_stage2_huge_mapping(memslot, hva, PMD_SIZE))
  1159				break;
  1160			fallthrough;
  1161		case CONT_PTE_SHIFT:
  1162			vma_shift = PAGE_SHIFT;
  1163			force_pte = true;
  1164			fallthrough;
  1165		case PAGE_SHIFT:
  1166			break;
  1167		default:
  1168			WARN_ONCE(1, "Unknown vma_shift %d", vma_shift);
  1169		}
  1170	
  1171		vma_pagesize = 1UL << vma_shift;
  1172		if (vma_pagesize == PMD_SIZE || vma_pagesize == PUD_SIZE)
  1173			fault_ipa &= ~(vma_pagesize - 1);
  1174	
  1175		gfn = fault_ipa >> PAGE_SHIFT;
  1176		mmap_read_unlock(current->mm);
  1177	
  1178		/*
  1179		 * Permission faults just need to update the existing leaf entry,
  1180		 * and so normally don't require allocations from the memcache. The
  1181		 * only exception to this is when dirty logging is enabled at runtime
  1182		 * and a write fault needs to collapse a block entry into a table.
  1183		 */
  1184		if (fault_status != FSC_PERM || (logging_active && write_fault)) {
  1185			ret = kvm_mmu_topup_memory_cache(memcache,
  1186							 kvm_mmu_cache_min_pages(kvm));
  1187			if (ret)
  1188				return ret;
  1189		}
  1190	
  1191		mmu_seq = vcpu->kvm->mmu_notifier_seq;
  1192		/*
  1193		 * Ensure the read of mmu_notifier_seq happens before we call
  1194		 * gfn_to_pfn_prot (which calls get_user_pages), so that we don't risk
  1195		 * the page we just got a reference to gets unmapped before we have a
  1196		 * chance to grab the mmu_lock, which ensure that if the page gets
  1197		 * unmapped afterwards, the call to kvm_unmap_gfn will take it away
  1198		 * from us again properly. This smp_rmb() interacts with the smp_wmb()
  1199		 * in kvm_mmu_notifier_invalidate_<page|range_end>.
  1200		 *
  1201		 * Besides, __gfn_to_pfn_memslot() instead of gfn_to_pfn_prot() is
  1202		 * used to avoid unnecessary overhead introduced to locate the memory
  1203		 * slot because it's always fixed even @gfn is adjusted for huge pages.
  1204		 */
  1205		smp_rmb();
  1206	
  1207		pfn = __gfn_to_pfn_memslot(memslot, gfn,
  1208					   write_fault ? KVM_GTP_WRITE : 0,
> 1209					   false, NULL, &writable, NULL);
  1210		if (pfn == KVM_PFN_ERR_HWPOISON) {
  1211			kvm_send_hwpoison_signal(hva, vma_shift);
  1212			return 0;
  1213		}
  1214		if (is_error_noslot_pfn(pfn))
  1215			return -EFAULT;
  1216	
  1217		if (kvm_is_device_pfn(pfn)) {
  1218			/*
  1219			 * If the page was identified as device early by looking at
  1220			 * the VMA flags, vma_pagesize is already representing the
  1221			 * largest quantity we can map.  If instead it was mapped
  1222			 * via gfn_to_pfn_prot(), vma_pagesize is set to PAGE_SIZE
  1223			 * and must not be upgraded.
  1224			 *
  1225			 * In both cases, we don't let transparent_hugepage_adjust()
  1226			 * change things at the last minute.
  1227			 */
  1228			device = true;
  1229		} else if (logging_active && !write_fault) {
  1230			/*
  1231			 * Only actually map the page as writable if this was a write
  1232			 * fault.
  1233			 */
  1234			writable = false;
  1235		}
  1236	
  1237		if (exec_fault && device)
  1238			return -ENOEXEC;
  1239	
  1240		/*
  1241		 * To reduce MMU contentions and enhance concurrency during dirty
  1242		 * logging dirty logging, only acquire read lock for permission
  1243		 * relaxation.
  1244		 */
  1245		if (use_read_lock)
  1246			read_lock(&kvm->mmu_lock);
  1247		else
  1248			write_lock(&kvm->mmu_lock);
  1249		pgt = vcpu->arch.hw_mmu->pgt;
  1250		if (mmu_notifier_retry(kvm, mmu_seq))
  1251			goto out_unlock;
  1252	
  1253		/*
  1254		 * If we are not forced to use page mapping, check if we are
  1255		 * backed by a THP and thus use block mapping if possible.
  1256		 */
  1257		if (vma_pagesize == PAGE_SIZE && !(force_pte || device)) {
  1258			if (fault_status == FSC_PERM && fault_granule > PAGE_SIZE)
  1259				vma_pagesize = fault_granule;
  1260			else
  1261				vma_pagesize = transparent_hugepage_adjust(kvm, memslot,
  1262									   hva, &pfn,
  1263									   &fault_ipa);
  1264		}
  1265	
  1266		if (fault_status != FSC_PERM && !device && kvm_has_mte(kvm)) {
  1267			/* Check the VMM hasn't introduced a new VM_SHARED VMA */
  1268			if (!shared)
  1269				ret = sanitise_mte_tags(kvm, pfn, vma_pagesize);
  1270			else
  1271				ret = -EFAULT;
  1272			if (ret)
  1273				goto out_unlock;
  1274		}
  1275	
  1276		if (writable)
  1277			prot |= KVM_PGTABLE_PROT_W;
  1278	
  1279		if (exec_fault)
  1280			prot |= KVM_PGTABLE_PROT_X;
  1281	
  1282		if (device)
  1283			prot |= KVM_PGTABLE_PROT_DEVICE;
  1284		else if (cpus_have_const_cap(ARM64_HAS_CACHE_DIC))
  1285			prot |= KVM_PGTABLE_PROT_X;
  1286	
  1287		/*
  1288		 * Under the premise of getting a FSC_PERM fault, we just need to relax
  1289		 * permissions only if vma_pagesize equals fault_granule. Otherwise,
  1290		 * kvm_pgtable_stage2_map() should be called to change block size.
  1291		 */
  1292		if (fault_status == FSC_PERM && vma_pagesize == fault_granule) {
  1293			ret = kvm_pgtable_stage2_relax_perms(pgt, fault_ipa, prot);
  1294		} else {
  1295			WARN_ONCE(use_read_lock, "Attempted stage-2 map outside of write lock\n");
  1296	
  1297			ret = kvm_pgtable_stage2_map(pgt, fault_ipa, vma_pagesize,
  1298						     __pfn_to_phys(pfn), prot,
  1299						     memcache);
  1300		}
  1301	
  1302		/* Mark the page dirty only if the fault is handled successfully */
  1303		if (writable && !ret) {
  1304			kvm_set_pfn_dirty(pfn);
  1305			mark_page_dirty_in_slot(kvm, memslot, gfn);
  1306		}
  1307	
  1308	out_unlock:
  1309		if (use_read_lock)
  1310			read_unlock(&kvm->mmu_lock);
  1311		else
  1312			write_unlock(&kvm->mmu_lock);
  1313		kvm_set_pfn_accessed(pfn);
  1314		kvm_release_pfn_clean(pfn);
  1315		return ret != -EAGAIN ? ret : 0;
  1316	}
  1317	

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp


* Re: [PATCH RFC 1/4] mm/gup: Add FOLL_INTERRUPTIBLE
  2022-06-17  1:41 ` [PATCH RFC 1/4] mm/gup: Add FOLL_INTERRUPTIBLE Peter Xu
@ 2022-06-21  8:23   ` David Hildenbrand
  2022-06-21 17:09     ` Peter Xu
  0 siblings, 1 reply; 8+ messages in thread
From: David Hildenbrand @ 2022-06-21  8:23 UTC (permalink / raw)
  To: Peter Xu, kvm, linux-kernel
  Cc: Dr . David Alan Gilbert, Linux MM Mailing List,
	Sean Christopherson, Paolo Bonzini, Andrea Arcangeli,
	Andrew Morton

On 17.06.22 03:41, Peter Xu wrote:
> We have had FAULT_FLAG_INTERRUPTIBLE, but it was never applied to GUP.
> One issue is that not all GUP paths are able to handle signal delivery
> besides SIGKILL.
> 
> That's not ideal for the GUP users who are actually able to handle these
> cases, like KVM.
> 
> KVM uses GUP extensively when faulting in guest pages, and we already have
> the infrastructure to retry a page fault at a later time.  Allowing GUP to
> be interrupted by generic signals can make KVM-related threads more
> responsive.  For example:
> 
>   (1) SIGUSR1: which QEMU/KVM uses to deliver an inter-process IPI,
>       e.g. when the admin issues a vm_stop QMP command, SIGUSR1 can be
>       generated to kick the vcpus out of kernel context immediately,
> 
>   (2) SIGINT: which allows interactive hypervisor users to stop a virtual
>       machine with Ctrl-C without any delays or hangs,
> 
>   (3) SIGTRAP: which lets GDB attach even during page faults that are
>       stuck for a long time.
> 
> Normally the hypervisor will be able to receive these signals properly,
> but not if it's stuck in a GUP for a long time for whatever reason.  That
> happens easily with a stuck postcopy migration when e.g. a temporary
> network failure occurs; some vcpu threads can then hang forever waiting for
> the pages.  With the new FOLL_INTERRUPTIBLE, GUP users like KVM can
> selectively enable the ability to trap these signals.

This makes sense to me. I assume relevant callers will detect "GUP
failed" but also "well, there is a signal to handle" and cleanly back
off, correct?

-- 
Thanks,

David / dhildenb



* Re: [PATCH RFC 1/4] mm/gup: Add FOLL_INTERRUPTIBLE
  2022-06-21  8:23   ` David Hildenbrand
@ 2022-06-21 17:09     ` Peter Xu
  0 siblings, 0 replies; 8+ messages in thread
From: Peter Xu @ 2022-06-21 17:09 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: kvm, linux-kernel, Dr . David Alan Gilbert,
	Linux MM Mailing List, Sean Christopherson, Paolo Bonzini,
	Andrea Arcangeli, Andrew Morton

On Tue, Jun 21, 2022 at 10:23:32AM +0200, David Hildenbrand wrote:
> On 17.06.22 03:41, Peter Xu wrote:
> > We have had FAULT_FLAG_INTERRUPTIBLE, but it was never applied to GUP.
> > One issue is that not all GUP paths are able to handle signal delivery
> > besides SIGKILL.
> > 
> > That's not ideal for the GUP users who are actually able to handle these
> > cases, like KVM.
> > 
> > KVM uses GUP extensively when faulting in guest pages, and we already have
> > the infrastructure to retry a page fault at a later time.  Allowing GUP to
> > be interrupted by generic signals can make KVM-related threads more
> > responsive.  For example:
> > 
> >   (1) SIGUSR1: which QEMU/KVM uses to deliver an inter-process IPI,
> >       e.g. when the admin issues a vm_stop QMP command, SIGUSR1 can be
> >       generated to kick the vcpus out of kernel context immediately,
> > 
> >   (2) SIGINT: which allows interactive hypervisor users to stop a virtual
> >       machine with Ctrl-C without any delays or hangs,
> > 
> >   (3) SIGTRAP: which lets GDB attach even during page faults that are
> >       stuck for a long time.
> > 
> > Normally the hypervisor will be able to receive these signals properly,
> > but not if it's stuck in a GUP for a long time for whatever reason.  That
> > happens easily with a stuck postcopy migration when e.g. a temporary
> > network failure occurs; some vcpu threads can then hang forever waiting
> > for the pages.  With the new FOLL_INTERRUPTIBLE, GUP users like KVM can
> > selectively enable the ability to trap these signals.
> 
> This makes sense to me. I assume relevant callers will detect "GUP
> failed" but also "well, there is a signal to handle" and cleanly back
> off, correct?

Correct, via an -EINTR.

One thing to mention is that the GUP user behavior will be the same as
before if the caller doesn't explicitly pass FOLL_INTERRUPTIBLE with the
GUP call.  So after the whole series is applied, only KVM (and only some
paths of KVM, not all of its GUP calls; I only touched up the x86 slow page
fault path) handles this, but that should be far enough to cover 99.99% of
the use cases that I wanted to take care of.

E.g., some KVM request to GUP a guest APIC page may still not be able to
respond to a SIGUSR1, but that's very rare, and we can always add more
users of FOLL_INTERRUPTIBLE when the code is ready to benefit from the
faster response.

Thanks,

-- 
Peter Xu



