* [PATCH 0/4] kvm/mm: Allow GUP to respond to non fatal signals @ 2022-06-22 21:36 Peter Xu 2022-06-22 21:36 ` [PATCH 1/4] mm/gup: Add FOLL_INTERRUPTIBLE Peter Xu ` (3 more replies) 0 siblings, 4 replies; 39+ messages in thread From: Peter Xu @ 2022-06-22 21:36 UTC (permalink / raw) To: kvm, linux-kernel Cc: peterx, Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List, Sean Christopherson rfc->v1: - Fix non-x86 build reported by syzbot - Remove RFC tag One issue was reported that libvirt cannot stop the virtual machine using the QMP command "stop" during a paused postcopy migration [1]. It doesn't work because the "stop the VM" operation requires the hypervisor to kick all the vcpu threads out using SIG_IPI in QEMU (which translates to a SIGUSR1). However, during a paused postcopy the vcpu threads are hung dead in handle_userfault(), so they simply don't respond to the kicks. Worse, the "stop" command then hangs the QMP channel too. The mm already has a facility to respond to generic signals (FAULT_FLAG_INTERRUPTIBLE), however it's only used in the page fault handlers, not in GUP. Unluckily, KVM is a heavy GUP user on guest page faults, which means that with what we have right now we cannot interrupt a long page fault while KVM is fetching guest pages. I think it's reasonable for GUP to only listen to fatal signals, as most of the GUP users are not really ready to handle such a case. But KVM is not such a user: it has rich infrastructure to handle even generic signals and to properly deliver them to userspace, after which the page fault can be retried in the next KVM_RUN. This patchset adds FOLL_INTERRUPTIBLE to enable FAULT_FLAG_INTERRUPTIBLE, and lets KVM be the first one to use it. One thing to mention is that this does not allow all KVM paths to respond to non-fatal signals, only the x86 slow page fault path.
In the future, when more code is ready to handle signal interruptions, we can explore the possibility of having more GUP callers use FOLL_INTERRUPTIBLE. Tests ===== I created a postcopy environment, paused the migration by shutting down the network to emulate a network failure (so handle_userfault() would be stuck for a long time), then tried three things: (1) Sending QMP command "stop" to QEMU monitor, (2) Hitting Ctrl-C from QEMU cmdline, (3) GDB attach to the dest QEMU process. Before this patchset, all three use cases hang. After the patchset, all of them work just as if there were no network failure at all. Please have a look, thanks. [1] https://gitlab.com/qemu-project/qemu/-/issues/1052 Peter Xu (4): mm/gup: Add FOLL_INTERRUPTIBLE kvm: Merge "atomic" and "write" in __gfn_to_pfn_memslot() kvm: Add new pfn error KVM_PFN_ERR_INTR kvm/x86: Allow to respond to generic signals during slow page faults arch/arm64/kvm/mmu.c | 5 ++-- arch/powerpc/kvm/book3s_64_mmu_hv.c | 5 ++-- arch/powerpc/kvm/book3s_64_mmu_radix.c | 5 ++-- arch/x86/kvm/mmu/mmu.c | 19 ++++++++---- include/linux/kvm_host.h | 21 ++++++++++++- include/linux/mm.h | 1 + mm/gup.c | 33 ++++++++++++++++++--- virt/kvm/kvm_main.c | 41 ++++++++++++++++---------- virt/kvm/kvm_mm.h | 6 ++-- virt/kvm/pfncache.c | 2 +- 10 files changed, 104 insertions(+), 34 deletions(-) -- 2.32.0 ^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH 1/4] mm/gup: Add FOLL_INTERRUPTIBLE 2022-06-22 21:36 [PATCH 0/4] kvm/mm: Allow GUP to respond to non fatal signals Peter Xu @ 2022-06-22 21:36 ` Peter Xu 2022-06-25 0:35 ` Jason Gunthorpe ` (2 more replies) 2022-06-22 21:36 ` [PATCH 2/4] kvm: Merge "atomic" and "write" in __gfn_to_pfn_memslot() Peter Xu ` (2 subsequent siblings) 3 siblings, 3 replies; 39+ messages in thread From: Peter Xu @ 2022-06-22 21:36 UTC (permalink / raw) To: kvm, linux-kernel Cc: peterx, Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List, Sean Christopherson We have had FAULT_FLAG_INTERRUPTIBLE but it was never applied to GUPs. One issue with it is that not all GUP paths are able to handle signal delivery besides SIGKILL. That's not ideal for the GUP users who are actually able to handle these cases, like KVM. KVM uses GUP extensively on faulting guest pages, during which we've got existing infrastructure to retry a page fault at a later time. Allowing the GUP to be interrupted by generic signals can make KVM-related threads more responsive. For example: (1) SIGUSR1: which QEMU/KVM uses to deliver an inter-process IPI, e.g. when the admin issues a vm_stop QMP command, SIGUSR1 can be generated to kick the vcpus out of kernel context immediately, (2) SIGINT: which allows interactive hypervisor users to stop a virtual machine with Ctrl-C without any delays/hangs, (3) SIGTRAP: which grants GDB capability even during page faults that are stuck for a long time. Normally the hypervisor will be able to receive these signals properly, but not if we're stuck in a GUP for a long time for whatever reason. It happens easily with a stuck postcopy migration when e.g. a temporary network failure hits, and then some vcpu threads can hang dead waiting for the pages. With the new FOLL_INTERRUPTIBLE, we can allow GUP users like KVM to selectively enable the ability to trap these signals. 
Signed-off-by: Peter Xu <peterx@redhat.com> --- include/linux/mm.h | 1 + mm/gup.c | 33 +++++++++++++++++++++++++++++---- 2 files changed, 30 insertions(+), 4 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index bc8f326be0ce..ebdf8a6b86c1 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2941,6 +2941,7 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address, #define FOLL_SPLIT_PMD 0x20000 /* split huge pmd before returning */ #define FOLL_PIN 0x40000 /* pages must be released via unpin_user_page */ #define FOLL_FAST_ONLY 0x80000 /* gup_fast: prevent fall-back to slow gup */ +#define FOLL_INTERRUPTIBLE 0x100000 /* allow interrupts from generic signals */ /* * FOLL_PIN and FOLL_LONGTERM may be used in various combinations with each diff --git a/mm/gup.c b/mm/gup.c index 551264407624..ad74b137d363 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -933,8 +933,17 @@ static int faultin_page(struct vm_area_struct *vma, fault_flags |= FAULT_FLAG_WRITE; if (*flags & FOLL_REMOTE) fault_flags |= FAULT_FLAG_REMOTE; - if (locked) + if (locked) { fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE; + /* + * We should only grant FAULT_FLAG_INTERRUPTIBLE when we're + * (at least) killable. It also mostly means we're not + * with NOWAIT. Otherwise ignore FOLL_INTERRUPTIBLE since + * it won't make a lot of sense to be used alone. + */ + if (*flags & FOLL_INTERRUPTIBLE) + fault_flags |= FAULT_FLAG_INTERRUPTIBLE; + } if (*flags & FOLL_NOWAIT) fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_RETRY_NOWAIT; if (*flags & FOLL_TRIED) { @@ -1322,6 +1331,22 @@ int fixup_user_fault(struct mm_struct *mm, } EXPORT_SYMBOL_GPL(fixup_user_fault); +/* + * GUP always responds to fatal signals. When FOLL_INTERRUPTIBLE is + * specified, it'll also respond to generic signals. The caller of GUP + * that has FOLL_INTERRUPTIBLE should take care of the GUP interruption. 
+ */ +static bool gup_signal_pending(unsigned int flags) +{ + if (fatal_signal_pending(current)) + return true; + + if (!(flags & FOLL_INTERRUPTIBLE)) + return false; + + return signal_pending(current); +} + /* * Please note that this function, unlike __get_user_pages will not * return 0 for nr_pages > 0 without FOLL_NOWAIT @@ -1403,11 +1428,11 @@ static __always_inline long __get_user_pages_locked(struct mm_struct *mm, * Repeat on the address that fired VM_FAULT_RETRY * with both FAULT_FLAG_ALLOW_RETRY and * FAULT_FLAG_TRIED. Note that GUP can be interrupted - * by fatal signals, so we need to check it before we + * by fatal signals or even common signals, depending on + * the caller's request. So we need to check it before we * start trying again otherwise it can loop forever. */ - - if (fatal_signal_pending(current)) { + if (gup_signal_pending(flags)) { if (!pages_done) pages_done = -EINTR; break; -- 2.32.0 ^ permalink raw reply related [flat|nested] 39+ messages in thread
* Re: [PATCH 1/4] mm/gup: Add FOLL_INTERRUPTIBLE 2022-06-22 21:36 ` [PATCH 1/4] mm/gup: Add FOLL_INTERRUPTIBLE Peter Xu @ 2022-06-25 0:35 ` Jason Gunthorpe 2022-06-25 1:23 ` Peter Xu 2022-06-28 2:07 ` John Hubbard 2022-07-04 22:48 ` Matthew Wilcox 2 siblings, 1 reply; 39+ messages in thread From: Jason Gunthorpe @ 2022-06-25 0:35 UTC (permalink / raw) To: Peter Xu Cc: kvm, linux-kernel, Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List, Sean Christopherson On Wed, Jun 22, 2022 at 05:36:53PM -0400, Peter Xu wrote: > We have had FAULT_FLAG_INTERRUPTIBLE but it was never applied to GUPs. One > issue with it is that not all GUP paths are able to handle signal delivers > besides SIGKILL. > > That's not ideal for the GUP users who are actually able to handle these > cases, like KVM. > > KVM uses GUP extensively on faulting guest pages, during which we've got > existing infrastructures to retry a page fault at a later time. Allowing > the GUP to be interrupted by generic signals can make KVM related threads > to be more responsive. For examples: > > (1) SIGUSR1: which QEMU/KVM uses to deliver an inter-process IPI, > e.g. when the admin issues a vm_stop QMP command, SIGUSR1 can be > generated to kick the vcpus out of kernel context immediately, > > (2) SIGINT: which can be used with interactive hypervisor users to stop a > virtual machine with Ctrl-C without any delays/hangs, > > (3) SIGTRAP: which grants GDB capability even during page faults that are > stuck for a long time. > > Normally hypervisor will be able to receive these signals properly, but not > if we're stuck in a GUP for a long time for whatever reason. It happens > easily with a stucked postcopy migration when e.g. a network temp failure > happens, then some vcpu threads can hang death waiting for the pages. With > the new FOLL_INTERRUPTIBLE, we can allow GUP users like KVM to selectively > enable the ability to trap these signals. 
Can you talk a bit about what is required to use this new interface correctly? Lots of GUP callers are in simple system call contexts (like ioctl), can/should they set this flag and if so what else do they need to do? Thanks, Jason ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 1/4] mm/gup: Add FOLL_INTERRUPTIBLE 2022-06-25 0:35 ` Jason Gunthorpe @ 2022-06-25 1:23 ` Peter Xu 2022-06-25 23:59 ` Jason Gunthorpe 0 siblings, 1 reply; 39+ messages in thread From: Peter Xu @ 2022-06-25 1:23 UTC (permalink / raw) To: Jason Gunthorpe Cc: kvm, linux-kernel, Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List, Sean Christopherson Hi, Jason, On Fri, Jun 24, 2022 at 09:35:54PM -0300, Jason Gunthorpe wrote: > Can you talk a bit about what is required to use this new interface > correctly? > > Lots of GUP callers are in simple system call contexts (like ioctl), > can/should they set this flag and if so what else do they need to do? Thanks for taking a look. IMHO the major thing required is that the caller can handle the cases where GUP (1) returns fewer pages than expected, and (2) returns -EINTR. For the -EINTR case, IIUC ideally in an ioctl context we should deliver this -EINTR back to the user app (while doing cleanups gracefully), rather than returning anything else (e.g. converting it to -EFAULT or something else). But note that FAULT_FLAG_INTERRUPTIBLE is only used in a userfaultfd context (aka, userfaultfd_get_blocking_state()). For example, if we hang at lock_page() (setting aside whether hanging at lock_page() makes sense at all.. it really sounds like a bug) and we receive a non-fatal signal, we won't be woken up for it, since lock_page() always uses TASK_UNINTERRUPTIBLE. I think it's a separate problem whether we should extend the usage of FAULT_FLAG_INTERRUPTIBLE to things like lock_page() (and probably not..), and currently this does solve a major issue regarding postcopy hanging on pages in the hypervisor use case. Hopefully that still justifies this plumbing work to bring the interruptible capability to the GUP layer. 
To go back to the original question with a shorter answer: if the ioctl context GUPs a page that will never be under a uffd context, then it's probably not gonna help at all.. at least not before we use FAULT_FLAG_INTERRUPTIBLE outside uffd page fault handling. Thanks, -- Peter Xu ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 1/4] mm/gup: Add FOLL_INTERRUPTIBLE 2022-06-25 1:23 ` Peter Xu @ 2022-06-25 23:59 ` Jason Gunthorpe 2022-06-27 15:29 ` Peter Xu 0 siblings, 1 reply; 39+ messages in thread From: Jason Gunthorpe @ 2022-06-25 23:59 UTC (permalink / raw) To: Peter Xu Cc: kvm, linux-kernel, Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List, Sean Christopherson On Fri, Jun 24, 2022 at 09:23:04PM -0400, Peter Xu wrote: > If to go back to the original question with a shorter answer: if the ioctl > context that GUP upon a page that will never be with a uffd context, then > it's probably not gonna help at all.. at least not before we use > FAULT_FLAG_INTERRUPTIBLE outside uffd page fault handling. I think I would be more interested in this if it could abort a swap in, for instance. Doesn't this happen if it flows the interruptible flag into the VMA's fault handler? Jason ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 1/4] mm/gup: Add FOLL_INTERRUPTIBLE 2022-06-25 23:59 ` Jason Gunthorpe @ 2022-06-27 15:29 ` Peter Xu 0 siblings, 0 replies; 39+ messages in thread From: Peter Xu @ 2022-06-27 15:29 UTC (permalink / raw) To: Jason Gunthorpe Cc: kvm, linux-kernel, Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List, Sean Christopherson On Sat, Jun 25, 2022 at 08:59:04PM -0300, Jason Gunthorpe wrote: > On Fri, Jun 24, 2022 at 09:23:04PM -0400, Peter Xu wrote: > > If to go back to the original question with a shorter answer: if the ioctl > > context that GUP upon a page that will never be with a uffd context, then > > it's probably not gonna help at all.. at least not before we use > > FAULT_FLAG_INTERRUPTIBLE outside uffd page fault handling. > > I think I would be more interested in this if it could abort a swap > in, for instance. Doesn't this happen if it flows the interruptible > flag into the VMA's fault handler? The idea makes sense, but it doesn't work like that right now, afaict. We need to teach lock_page_or_retry() to be able to consume the flag as discussed. I don't see a major blocker for it if lock_page_or_retry() is the only one we'd like to touch. Say, the only caller currently is do_swap_page() (there's also remove_device_exclusive_entry() but just deeper in the stack). Looks doable so far but I'll need to think about it.. I can keep you updated if I get something, but it'll be separate from this patchset. Thanks, -- Peter Xu ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 1/4] mm/gup: Add FOLL_INTERRUPTIBLE 2022-06-22 21:36 ` [PATCH 1/4] mm/gup: Add FOLL_INTERRUPTIBLE Peter Xu 2022-06-25 0:35 ` Jason Gunthorpe @ 2022-06-28 2:07 ` John Hubbard 2022-06-28 19:31 ` Peter Xu 2022-07-04 22:48 ` Matthew Wilcox 2 siblings, 1 reply; 39+ messages in thread From: John Hubbard @ 2022-06-28 2:07 UTC (permalink / raw) To: Peter Xu, kvm, linux-kernel Cc: Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List, Sean Christopherson On 6/22/22 14:36, Peter Xu wrote: > We have had FAULT_FLAG_INTERRUPTIBLE but it was never applied to GUPs. One > issue with it is that not all GUP paths are able to handle signal delivers > besides SIGKILL. > > That's not ideal for the GUP users who are actually able to handle these > cases, like KVM. > > KVM uses GUP extensively on faulting guest pages, during which we've got > existing infrastructures to retry a page fault at a later time. Allowing > the GUP to be interrupted by generic signals can make KVM related threads > to be more responsive. For examples: > > (1) SIGUSR1: which QEMU/KVM uses to deliver an inter-process IPI, > e.g. when the admin issues a vm_stop QMP command, SIGUSR1 can be > generated to kick the vcpus out of kernel context immediately, > > (2) SIGINT: which can be used with interactive hypervisor users to stop a > virtual machine with Ctrl-C without any delays/hangs, > > (3) SIGTRAP: which grants GDB capability even during page faults that are > stuck for a long time. > > Normally hypervisor will be able to receive these signals properly, but not > if we're stuck in a GUP for a long time for whatever reason. It happens > easily with a stucked postcopy migration when e.g. a network temp failure > happens, then some vcpu threads can hang death waiting for the pages. With > the new FOLL_INTERRUPTIBLE, we can allow GUP users like KVM to selectively > enable the ability to trap these signals. 
> > Signed-off-by: Peter Xu <peterx@redhat.com> > --- > include/linux/mm.h | 1 + > mm/gup.c | 33 +++++++++++++++++++++++++++++---- > 2 files changed, 30 insertions(+), 4 deletions(-) > > diff --git a/include/linux/mm.h b/include/linux/mm.h > index bc8f326be0ce..ebdf8a6b86c1 100644 > --- a/include/linux/mm.h > +++ b/include/linux/mm.h > @@ -2941,6 +2941,7 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address, > #define FOLL_SPLIT_PMD 0x20000 /* split huge pmd before returning */ > #define FOLL_PIN 0x40000 /* pages must be released via unpin_user_page */ > #define FOLL_FAST_ONLY 0x80000 /* gup_fast: prevent fall-back to slow gup */ > +#define FOLL_INTERRUPTIBLE 0x100000 /* allow interrupts from generic signals */ Perhaps, s/generic/non-fatal/ ? > > /* > * FOLL_PIN and FOLL_LONGTERM may be used in various combinations with each > diff --git a/mm/gup.c b/mm/gup.c > index 551264407624..ad74b137d363 100644 > --- a/mm/gup.c > +++ b/mm/gup.c > @@ -933,8 +933,17 @@ static int faultin_page(struct vm_area_struct *vma, > fault_flags |= FAULT_FLAG_WRITE; > if (*flags & FOLL_REMOTE) > fault_flags |= FAULT_FLAG_REMOTE; > - if (locked) > + if (locked) { > fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE; > + /* > + * We should only grant FAULT_FLAG_INTERRUPTIBLE when we're > + * (at least) killable. It also mostly means we're not > + * with NOWAIT. Otherwise ignore FOLL_INTERRUPTIBLE since > + * it won't make a lot of sense to be used alone. > + */ This comment seems a little confusing due to its location. We've just checked "locked", but the comment is talking about other constraints. Not sure what to suggest. Maybe move it somewhere else? 
> + if (*flags & FOLL_INTERRUPTIBLE) > + fault_flags |= FAULT_FLAG_INTERRUPTIBLE; > + } > if (*flags & FOLL_NOWAIT) > fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_RETRY_NOWAIT; > if (*flags & FOLL_TRIED) { > @@ -1322,6 +1331,22 @@ int fixup_user_fault(struct mm_struct *mm, > } > EXPORT_SYMBOL_GPL(fixup_user_fault); > > +/* > + * GUP always responds to fatal signals. When FOLL_INTERRUPTIBLE is > + * specified, it'll also respond to generic signals. The caller of GUP > + * that has FOLL_INTERRUPTIBLE should take care of the GUP interruption. > + */ > +static bool gup_signal_pending(unsigned int flags) > +{ > + if (fatal_signal_pending(current)) > + return true; > + > + if (!(flags & FOLL_INTERRUPTIBLE)) > + return false; > + > + return signal_pending(current); > +} > + OK. > /* > * Please note that this function, unlike __get_user_pages will not > * return 0 for nr_pages > 0 without FOLL_NOWAIT > @@ -1403,11 +1428,11 @@ static __always_inline long __get_user_pages_locked(struct mm_struct *mm, > * Repeat on the address that fired VM_FAULT_RETRY > * with both FAULT_FLAG_ALLOW_RETRY and > * FAULT_FLAG_TRIED. Note that GUP can be interrupted > - * by fatal signals, so we need to check it before we > + * by fatal signals of even common signals, depending on > + * the caller's request. So we need to check it before we > * start trying again otherwise it can loop forever. > */ > - > - if (fatal_signal_pending(current)) { > + if (gup_signal_pending(flags)) { This is new and bold. :) Signals that an application was prepared to handle can now cause gup to quit early. I wonder if that will break any use cases out there (SIGPIPE...) ? Generally, gup callers handle failures pretty well, so it's probably not too bad. But I wanted to mention the idea that handled interrupts might be a little surprising here. thanks, -- John Hubbard NVIDIA > if (!pages_done) > pages_done = -EINTR; > break; ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 1/4] mm/gup: Add FOLL_INTERRUPTIBLE 2022-06-28 2:07 ` John Hubbard @ 2022-06-28 19:31 ` Peter Xu 2022-06-28 21:40 ` John Hubbard 0 siblings, 1 reply; 39+ messages in thread From: Peter Xu @ 2022-06-28 19:31 UTC (permalink / raw) To: John Hubbard Cc: kvm, linux-kernel, Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List, Sean Christopherson Hi, John, Thanks for your comments! On Mon, Jun 27, 2022 at 07:07:28PM -0700, John Hubbard wrote: [...] > > @@ -2941,6 +2941,7 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address, > > #define FOLL_SPLIT_PMD 0x20000 /* split huge pmd before returning */ > > #define FOLL_PIN 0x40000 /* pages must be released via unpin_user_page */ > > #define FOLL_FAST_ONLY 0x80000 /* gup_fast: prevent fall-back to slow gup */ > > +#define FOLL_INTERRUPTIBLE 0x100000 /* allow interrupts from generic signals */ > > Perhaps, s/generic/non-fatal/ ? Sure. > > diff --git a/mm/gup.c b/mm/gup.c > > index 551264407624..ad74b137d363 100644 > > --- a/mm/gup.c > > +++ b/mm/gup.c > > @@ -933,8 +933,17 @@ static int faultin_page(struct vm_area_struct *vma, > > fault_flags |= FAULT_FLAG_WRITE; > > if (*flags & FOLL_REMOTE) > > fault_flags |= FAULT_FLAG_REMOTE; > > - if (locked) > > + if (locked) { > > fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE; > > + /* > > + * We should only grant FAULT_FLAG_INTERRUPTIBLE when we're > > + * (at least) killable. It also mostly means we're not > > + * with NOWAIT. Otherwise ignore FOLL_INTERRUPTIBLE since > > + * it won't make a lot of sense to be used alone. > > + */ > > This comment seems a little confusing due to its location. We've just > checked "locked", but the comment is talking about other constraints. > > Not sure what to suggest. Maybe move it somewhere else? I put it here to be after FAULT_FLAG_KILLABLE we just set. Only if we have "locked" will we set FAULT_FLAG_KILLABLE. 
That's also the key by which we grant the "killable" attribute to this GUP. So I thought it'd be good to put it here, because I want to have FOLL_INTERRUPTIBLE dependent on "locked" being set. > > > + if (*flags & FOLL_INTERRUPTIBLE) > > + fault_flags |= FAULT_FLAG_INTERRUPTIBLE; > > + } > > if (*flags & FOLL_NOWAIT) > > fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_RETRY_NOWAIT; > > if (*flags & FOLL_TRIED) { > > @@ -1322,6 +1331,22 @@ int fixup_user_fault(struct mm_struct *mm, > > } > > EXPORT_SYMBOL_GPL(fixup_user_fault); > > +/* > > + * GUP always responds to fatal signals. When FOLL_INTERRUPTIBLE is > > + * specified, it'll also respond to generic signals. The caller of GUP > > + * that has FOLL_INTERRUPTIBLE should take care of the GUP interruption. > > + */ > > +static bool gup_signal_pending(unsigned int flags) > > +{ > > + if (fatal_signal_pending(current)) > > + return true; > > + > > + if (!(flags & FOLL_INTERRUPTIBLE)) > > + return false; > > + > > + return signal_pending(current); > > +} > > + > > OK. > > > /* > > * Please note that this function, unlike __get_user_pages will not > > * return 0 for nr_pages > 0 without FOLL_NOWAIT > > @@ -1403,11 +1428,11 @@ static __always_inline long __get_user_pages_locked(struct mm_struct *mm, > > * Repeat on the address that fired VM_FAULT_RETRY > > * with both FAULT_FLAG_ALLOW_RETRY and > > * FAULT_FLAG_TRIED. Note that GUP can be interrupted > > - * by fatal signals, so we need to check it before we > > + * by fatal signals of even common signals, depending on > > + * the caller's request. So we need to check it before we > > * start trying again otherwise it can loop forever. > > */ > > - > > - if (fatal_signal_pending(current)) { > > + if (gup_signal_pending(flags)) { > > This is new and bold. :) Signals that an application was prepared to > handle can now cause gup to quit early. I wonder if that will break any > use cases out there (SIGPIPE...) ? 
Note: I introduced the new FOLL_INTERRUPTIBLE flag, so only if the caller explicitly passes in that flag could there be a functional change. IOW, no functional change is intended in this single patch, not before I start to let the KVM code pass in that flag. > > Generally, gup callers handle failures pretty well, so it's probably > not too bad. But I wanted to mention the idea that handled interrupts > might be a little surprising here. Yes, as I mentioned it'll be an opt-in flag anyway, so by default we don't need to worry at all, IMHO, because it should really work exactly like before; otherwise I have a bug somewhere else.. :) Thanks, -- Peter Xu ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 1/4] mm/gup: Add FOLL_INTERRUPTIBLE 2022-06-28 19:31 ` Peter Xu @ 2022-06-28 21:40 ` John Hubbard 2022-06-28 22:33 ` Peter Xu 0 siblings, 1 reply; 39+ messages in thread From: John Hubbard @ 2022-06-28 21:40 UTC (permalink / raw) To: Peter Xu Cc: kvm, linux-kernel, Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List, Sean Christopherson On 6/28/22 12:31, Peter Xu wrote: >>> diff --git a/mm/gup.c b/mm/gup.c >>> index 551264407624..ad74b137d363 100644 >>> --- a/mm/gup.c >>> +++ b/mm/gup.c >>> @@ -933,8 +933,17 @@ static int faultin_page(struct vm_area_struct *vma, >>> fault_flags |= FAULT_FLAG_WRITE; >>> if (*flags & FOLL_REMOTE) >>> fault_flags |= FAULT_FLAG_REMOTE; >>> - if (locked) >>> + if (locked) { >>> fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE; >>> + /* >>> + * We should only grant FAULT_FLAG_INTERRUPTIBLE when we're >>> + * (at least) killable. It also mostly means we're not >>> + * with NOWAIT. Otherwise ignore FOLL_INTERRUPTIBLE since >>> + * it won't make a lot of sense to be used alone. >>> + */ >> >> This comment seems a little confusing due to its location. We've just >> checked "locked", but the comment is talking about other constraints. >> >> Not sure what to suggest. Maybe move it somewhere else? > > I put it here to be after FAULT_FLAG_KILLABLE we just set. > > Only if we have "locked" will we set FAULT_FLAG_KILLABLE. That's also the > key we grant "killable" attribute to this GUP. So I thought it'll be good > to put here because I want to have FOLL_INTERRUPTIBLE dependent on "locked" > being set. > The key point is the connection between "locked" and killable. If the comment explained why "locked" means "killable", that would help clear this up. The NOWAIT sentence is also confusing to me, and adding "mostly NOWAIT" does not clear it up either... :) >> Generally, gup callers handle failures pretty well, so it's probably >> not too bad. 
But I wanted to mention the idea that handled interrupts >> might be a little surprising here. > > Yes as I mentioned anyway it'll be an opt-in flag, so by default we don't > need to worry at all, IMHO, because it should really work exactly like > before, otherwise I had a bug somewhere else.. :) > Yes, that's true. OK then. thanks, -- John Hubbard NVIDIA ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 1/4] mm/gup: Add FOLL_INTERRUPTIBLE 2022-06-28 21:40 ` John Hubbard @ 2022-06-28 22:33 ` Peter Xu 2022-06-29 0:31 ` John Hubbard 0 siblings, 1 reply; 39+ messages in thread From: Peter Xu @ 2022-06-28 22:33 UTC (permalink / raw) To: John Hubbard Cc: kvm, linux-kernel, Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List, Sean Christopherson On Tue, Jun 28, 2022 at 02:40:53PM -0700, John Hubbard wrote: > On 6/28/22 12:31, Peter Xu wrote: > > > > diff --git a/mm/gup.c b/mm/gup.c > > > > index 551264407624..ad74b137d363 100644 > > > > --- a/mm/gup.c > > > > +++ b/mm/gup.c > > > > @@ -933,8 +933,17 @@ static int faultin_page(struct vm_area_struct *vma, > > > > fault_flags |= FAULT_FLAG_WRITE; > > > > if (*flags & FOLL_REMOTE) > > > > fault_flags |= FAULT_FLAG_REMOTE; > > > > - if (locked) > > > > + if (locked) { > > > > fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE; > > > > + /* > > > > + * We should only grant FAULT_FLAG_INTERRUPTIBLE when we're > > > > + * (at least) killable. It also mostly means we're not > > > > + * with NOWAIT. Otherwise ignore FOLL_INTERRUPTIBLE since > > > > + * it won't make a lot of sense to be used alone. > > > > + */ > > > > > > This comment seems a little confusing due to its location. We've just > > > checked "locked", but the comment is talking about other constraints. > > > > > > Not sure what to suggest. Maybe move it somewhere else? > > > > I put it here to be after FAULT_FLAG_KILLABLE we just set. > > > > Only if we have "locked" will we set FAULT_FLAG_KILLABLE. That's also the > > key we grant "killable" attribute to this GUP. So I thought it'll be good > > to put here because I want to have FOLL_INTERRUPTIBLE dependent on "locked" > > being set. > > > > The key point is the connection between "locked" and killable. If the comment > explained why "locked" means "killable", that would help clear this up. 
The > NOWAIT sentence is also confusing to me, and adding "mostly NOWAIT" does not > clear it up either... :) Sorry to have a comment that makes it feel confusing. I tried to explicitly put the comment after setting FAULT_FLAG_KILLABLE but obviously I didn't do my job well.. Maybe that NOWAIT thing adds more complexity than necessary. Would the below one be more acceptable? /* * We'll only be able to respond to signals when "locked != * NULL". When with it, we'll always respond to SIGKILL * (as implied by FAULT_FLAG_KILLABLE above), and we'll * respond to non-fatal signals only if the GUP user has * specified FOLL_INTERRUPTIBLE. */ Thanks, -- Peter Xu ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 1/4] mm/gup: Add FOLL_INTERRUPTIBLE 2022-06-28 22:33 ` Peter Xu @ 2022-06-29 0:31 ` John Hubbard 2022-06-29 15:47 ` Peter Xu 0 siblings, 1 reply; 39+ messages in thread From: John Hubbard @ 2022-06-29 0:31 UTC (permalink / raw) To: Peter Xu Cc: kvm, linux-kernel, Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List, Sean Christopherson On 6/28/22 15:33, Peter Xu wrote: >> The key point is the connection between "locked" and killable. If the comment >> explained why "locked" means "killable", that would help clear this up. The >> NOWAIT sentence is also confusing to me, and adding "mostly NOWAIT" does not >> clear it up either... :) > > Sorry to have a comment that makes it feels confusing. I tried to > explicitly put the comment to be after setting FAULT_FLAG_KILLABLE but > obviously I didn't do my job well.. > > Maybe that NOWAIT thing adds more complexity but not even necessary. > > Would below one more acceptable? > > /* > * We'll only be able to respond to signals when "locked != > * NULL". When with it, we'll always respond to SIGKILL > * (as implied by FAULT_FLAG_KILLABLE above), and we'll > * respond to non-fatal signals only if the GUP user has > * specified FOLL_INTERRUPTIBLE. > */ It looks like part of this comment is trying to document a pre-existing concept, which is that faultin_page() only ever sets FAULT_FLAG_KILLABLE if locked != NULL. The problem I am (personally) having is that I don't yet understand why or how those are connected: what is it about having locked non-NULL that means the process is killable? (Can you explain why that is?) If that were clear, I think I could suggest a good comment wording. thanks, -- John Hubbard NVIDIA ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 1/4] mm/gup: Add FOLL_INTERRUPTIBLE 2022-06-29 0:31 ` John Hubbard @ 2022-06-29 15:47 ` Peter Xu 2022-06-30 1:53 ` John Hubbard 0 siblings, 1 reply; 39+ messages in thread From: Peter Xu @ 2022-06-29 15:47 UTC (permalink / raw) To: John Hubbard Cc: kvm, linux-kernel, Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List, Sean Christopherson On Tue, Jun 28, 2022 at 05:31:43PM -0700, John Hubbard wrote: > On 6/28/22 15:33, Peter Xu wrote: > > > The key point is the connection between "locked" and killable. If the comment > > > explained why "locked" means "killable", that would help clear this up. The > > > NOWAIT sentence is also confusing to me, and adding "mostly NOWAIT" does not > > > clear it up either... :) > > > > Sorry to have a comment that makes it feels confusing. I tried to > > explicitly put the comment to be after setting FAULT_FLAG_KILLABLE but > > obviously I didn't do my job well.. > > > > Maybe that NOWAIT thing adds more complexity but not even necessary. > > > > Would below one more acceptable? > > > > /* > > * We'll only be able to respond to signals when "locked != > > * NULL". When with it, we'll always respond to SIGKILL > > * (as implied by FAULT_FLAG_KILLABLE above), and we'll > > * respond to non-fatal signals only if the GUP user has > > * specified FOLL_INTERRUPTIBLE. > > */ > > > It looks like part of this comment is trying to document a pre-existing > concept, which is that faultin_page() only ever sets FAULT_FLAG_KILLABLE > if locked != NULL. I'd say that's not what I wanted to comment.. I wanted to express that INTERRUPTIBLE should rely on KILLABLE, that's also why I put the comment to be after KILLABLE, not before. IMHO it makes sense already to have "interruptible" only if "killable", no matter what's the pre-requisite for KILLABLE (in this case it's having "locked" being non-null). 
> The problem I am (personally) having is that I don't yet understand why > or how those are connected: what is it about having locked non-NULL that > means the process is killable? (Can you explain why that is?) Firstly RETRY_KILLABLE relies on ALLOW_RETRY, because if we don't allow retry at all it means we'll never wait in handle_mm_fault() anyway, then no need to worry on being interrupted by any kind of signal (fatal or not). Then if we allow retry, we need some way to know "whether mmap_sem is released or not" during the process for the caller (because the caller cannot see VM_FAULT_RETRY). That's why we added "locked" parameter, so that we can set *locked=false to tell the caller we have released mmap_sem. I think that's why we have "locked" defined as "we allow this page fault request to retry and wait, during wait we can always allow fatal signals". I think that's defined throughout the gup call interfaces too, and faultin_page() is the last step to talk to handle_mm_fault(). To make this whole picture complete, NOWAIT is another thing that relies on ALLOW_RETRY but just to tell "oh please never release the mmap_sem at all". For example, when we want to make sure no vma will be released after faultin_page() returned. > > If that were clear, I think I could suggest a good comment wording. IMHO it's a little bit weird to explain "locked" here, especially after KILLABLE is set, that's why I didn't try to mention "locked" in my 2nd attempt. There are some comments for "locked" above the definition of faultin_page(), I think that'll be a nicer place to enrich explanations for "locked", and it seems even more suitable as a separate patch? Thanks, -- Peter Xu ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 1/4] mm/gup: Add FOLL_INTERRUPTIBLE 2022-06-29 15:47 ` Peter Xu @ 2022-06-30 1:53 ` John Hubbard 2022-06-30 13:49 ` Peter Xu 0 siblings, 1 reply; 39+ messages in thread From: John Hubbard @ 2022-06-30 1:53 UTC (permalink / raw) To: Peter Xu Cc: kvm, linux-kernel, Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List, Sean Christopherson On 6/29/22 08:47, Peter Xu wrote: >> It looks like part of this comment is trying to document a pre-existing >> concept, which is that faultin_page() only ever sets FAULT_FLAG_KILLABLE >> if locked != NULL. > > I'd say that's not what I wanted to comment.. I wanted to express that > INTERRUPTIBLE should rely on KILLABLE, that's also why I put the comment to > be after KILLABLE, not before. IMHO it makes sense already to have > "interruptible" only if "killable", no matter what's the pre-requisite for > KILLABLE (in this case it's having "locked" being non-null). > OK, I think I finally understand both the intention of the comment, and (thanks to your notes, below) the interaction between *locked and _RETRY, _KILLABLE, and _INTERRUPTIBLE. Really appreciate your leading me by the nose through that. The pre-existing code is abusing *locked a bit, by treating it as a flag when really it is a side effect of flags, but at least now that's clear to me. Anyway...this leads to finally getting into the comment, which I now think is not quite what we want: there is no need for a hierarchy of "_INTERRUPTIBLE should depend upon _KILLABLE". That is: even though an application allows a fatal signal to get through, it's not clear to me that that implies that non-fatal signal handling should be prevented. The code is only vaguely enforcing such a thing, because it just so happens that both cases require the same basic prerequisites. So the code looks good, but I don't see a need to claim a hierarchy in the comments. 
So I'd either delete the comment entirely, or go with something that is doesn't try to talk about hierarchy nor locked/retry either. Does this look reasonable to you: /* * FAULT_FLAG_INTERRUPTIBLE is opt-in: kernel callers must set * FOLL_INTERRUPTIBLE. That's because some callers may not be * prepared to handle early exits caused by non-fatal signals. */ ? >> The problem I am (personally) having is that I don't yet understand why >> or how those are connected: what is it about having locked non-NULL that >> means the process is killable? (Can you explain why that is?) > > Firstly RETRY_KILLABLE relies on ALLOW_RETRY, because if we don't allow > retry at all it means we'll never wait in handle_mm_fault() anyway, then no > need to worry on being interrupted by any kind of signal (fatal or not). > > Then if we allow retry, we need some way to know "whether mmap_sem is > released or not" during the process for the caller (because the caller > cannot see VM_FAULT_RETRY). That's why we added "locked" parameter, so > that we can set *locked=false to tell the caller we have released mmap_sem. > > I think that's why we have "locked" defined as "we allow this page fault > request to retry and wait, during wait we can always allow fatal signals". > I think that's defined throughout the gup call interfaces too, and > faultin_page() is the last step to talk to handle_mm_fault(). > > To make this whole picture complete, NOWAIT is another thing that relies on > ALLOW_RETRY but just to tell "oh please never release the mmap_sem at all". > For example, when we want to make sure no vma will be released after > faultin_page() returned. > Again, thanks for taking the time to explain that for me. :) >> >> If that were clear, I think I could suggest a good comment wording. > > IMHO it's a little bit weird to explain "locked" here, especially after > KILLABLE is set, that's why I didn't try to mention "locked" in my 2nd > attempt. 
There are some comments for "locked" above the definition of > faultin_page(), I think that'll be a nicer place to enrich explanations for > "locked", and it seems even more suitable as a separate patch? > Totally agreed. I didn't intend to ask for that kind of documentation here. For that, I'm thinking a combination of cleaning up *locked a little bit, plus maybe some higher level notes like what you wrote above, added to either pin_user_pages.rst or a new get_user_pages.rst or some .rst anyway. Definitely a separately thing. thanks, -- John Hubbard NVIDIA ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 1/4] mm/gup: Add FOLL_INTERRUPTIBLE 2022-06-30 1:53 ` John Hubbard @ 2022-06-30 13:49 ` Peter Xu 2022-06-30 19:01 ` John Hubbard 0 siblings, 1 reply; 39+ messages in thread From: Peter Xu @ 2022-06-30 13:49 UTC (permalink / raw) To: John Hubbard Cc: kvm, linux-kernel, Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List, Sean Christopherson On Wed, Jun 29, 2022 at 06:53:30PM -0700, John Hubbard wrote: > On 6/29/22 08:47, Peter Xu wrote: > > > It looks like part of this comment is trying to document a pre-existing > > > concept, which is that faultin_page() only ever sets FAULT_FLAG_KILLABLE > > > if locked != NULL. > > > > I'd say that's not what I wanted to comment.. I wanted to express that > > INTERRUPTIBLE should rely on KILLABLE, that's also why I put the comment to > > be after KILLABLE, not before. IMHO it makes sense already to have > > "interruptible" only if "killable", no matter what's the pre-requisite for > > KILLABLE (in this case it's having "locked" being non-null). > > > > OK, I think I finally understand both the intention of the comment, > and (thanks to your notes, below) the interaction between *locked and > _RETRY, _KILLABLE, and _INTERRUPTIBLE. Really appreciate your leading > me by the nose through that. The pre-existing code is abusing *locked > a bit, by treating it as a flag when really it is a side effect of > flags, but at least now that's clear to me. I agree, alternatively we could have some other FOLL_ flags to represent "locked != NULL" and do sanity check to make sure when the flag is there locked is always set correctly. Current code is a more "dense" way to do this, even though it could be slightly harder to follow. > > Anyway...this leads to finally getting into the comment, which I now > think is not quite what we want: there is no need for a hierarchy of > "_INTERRUPTIBLE should depend upon _KILLABLE". 
That is: even though an > application allows a fatal signal to get through, it's not clear to me > that that implies that non-fatal signal handling should be prevented. > > The code is only vaguely enforcing such a thing, because it just so > happens that both cases require the same basic prerequisites. So the > code looks good, but I don't see a need to claim a hierarchy in the > comments. > > So I'd either delete the comment entirely, or go with something that is > doesn't try to talk about hierarchy nor locked/retry either. Does this > look reasonable to you: > > > /* > * FAULT_FLAG_INTERRUPTIBLE is opt-in: kernel callers must set > * FOLL_INTERRUPTIBLE. That's because some callers may not be > * prepared to handle early exits caused by non-fatal signals. > */ > > ? Looks good to me, I'd tune a bit to make it less ambiguous on a few places: /* * FAULT_FLAG_INTERRUPTIBLE is opt-in. GUP callers must set * FOLL_INTERRUPTIBLE to enable FAULT_FLAG_INTERRUPTIBLE. * That's because some callers may not be prepared to * handle early exits caused by non-fatal signals. */ Would that be okay to you? > > > > The problem I am (personally) having is that I don't yet understand why > > > or how those are connected: what is it about having locked non-NULL that > > > means the process is killable? (Can you explain why that is?) > > > > Firstly RETRY_KILLABLE relies on ALLOW_RETRY, because if we don't allow > > retry at all it means we'll never wait in handle_mm_fault() anyway, then no > > need to worry on being interrupted by any kind of signal (fatal or not). > > > > Then if we allow retry, we need some way to know "whether mmap_sem is > > released or not" during the process for the caller (because the caller > > cannot see VM_FAULT_RETRY). That's why we added "locked" parameter, so > > that we can set *locked=false to tell the caller we have released mmap_sem. 
> > > > I think that's why we have "locked" defined as "we allow this page fault > > request to retry and wait, during wait we can always allow fatal signals". > > I think that's defined throughout the gup call interfaces too, and > > faultin_page() is the last step to talk to handle_mm_fault(). > > > > To make this whole picture complete, NOWAIT is another thing that relies on > > ALLOW_RETRY but just to tell "oh please never release the mmap_sem at all". > > For example, when we want to make sure no vma will be released after > > faultin_page() returned. > > > > Again, thanks for taking the time to explain that for me. :) My thanks for reviewing! > > > > > > > If that were clear, I think I could suggest a good comment wording. > > > > IMHO it's a little bit weird to explain "locked" here, especially after > > KILLABLE is set, that's why I didn't try to mention "locked" in my 2nd > > attempt. There are some comments for "locked" above the definition of > > faultin_page(), I think that'll be a nicer place to enrich explanations for > > "locked", and it seems even more suitable as a separate patch? > > > > Totally agreed. I didn't intend to ask for that kind of documentation > here. > > For that, I'm thinking a combination of cleaning up *locked a little > bit, plus maybe some higher level notes like what you wrote above, added > to either pin_user_pages.rst or a new get_user_pages.rst or some .rst > anyway. Definitely a separately thing. Sounds good. Thanks, -- Peter Xu ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 1/4] mm/gup: Add FOLL_INTERRUPTIBLE 2022-06-30 13:49 ` Peter Xu @ 2022-06-30 19:01 ` John Hubbard 2022-06-30 21:27 ` Peter Xu 0 siblings, 1 reply; 39+ messages in thread From: John Hubbard @ 2022-06-30 19:01 UTC (permalink / raw) To: Peter Xu Cc: kvm, linux-kernel, Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List, Sean Christopherson On 6/30/22 06:49, Peter Xu wrote: > Looks good to me, I'd tune a bit to make it less ambiguous on a few places: > > /* > * FAULT_FLAG_INTERRUPTIBLE is opt-in. GUP callers must set > * FOLL_INTERRUPTIBLE to enable FAULT_FLAG_INTERRUPTIBLE. > * That's because some callers may not be prepared to > * handle early exits caused by non-fatal signals. > */ > > Would that be okay to you? > Yes, looks good. With that change, please feel free to add: Reviewed-by: John Hubbard <jhubbard@nvidia.com> thanks, -- John Hubbard NVIDIA ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 1/4] mm/gup: Add FOLL_INTERRUPTIBLE 2022-06-30 19:01 ` John Hubbard @ 2022-06-30 21:27 ` Peter Xu 0 siblings, 0 replies; 39+ messages in thread From: Peter Xu @ 2022-06-30 21:27 UTC (permalink / raw) To: John Hubbard Cc: kvm, linux-kernel, Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List, Sean Christopherson On Thu, Jun 30, 2022 at 12:01:53PM -0700, John Hubbard wrote: > On 6/30/22 06:49, Peter Xu wrote: > > Looks good to me, I'd tune a bit to make it less ambiguous on a few places: > > > > /* > > * FAULT_FLAG_INTERRUPTIBLE is opt-in. GUP callers must set > > * FOLL_INTERRUPTIBLE to enable FAULT_FLAG_INTERRUPTIBLE. > > * That's because some callers may not be prepared to > > * handle early exits caused by non-fatal signals. > > */ > > > > Would that be okay to you? > > > > Yes, looks good. With that change, please feel free to add: > > Reviewed-by: John Hubbard <jhubbard@nvidia.com> Will do, thanks! -- Peter Xu ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 1/4] mm/gup: Add FOLL_INTERRUPTIBLE 2022-06-22 21:36 ` [PATCH 1/4] mm/gup: Add FOLL_INTERRUPTIBLE Peter Xu 2022-06-25 0:35 ` Jason Gunthorpe 2022-06-28 2:07 ` John Hubbard @ 2022-07-04 22:48 ` Matthew Wilcox 2022-07-07 15:06 ` Peter Xu 2 siblings, 1 reply; 39+ messages in thread From: Matthew Wilcox @ 2022-07-04 22:48 UTC (permalink / raw) To: Peter Xu Cc: kvm, linux-kernel, Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List, Sean Christopherson On Wed, Jun 22, 2022 at 05:36:53PM -0400, Peter Xu wrote: > +/* > + * GUP always responds to fatal signals. When FOLL_INTERRUPTIBLE is > + * specified, it'll also respond to generic signals. The caller of GUP > + * that has FOLL_INTERRUPTIBLE should take care of the GUP interruption. > + */ > +static bool gup_signal_pending(unsigned int flags) > +{ > + if (fatal_signal_pending(current)) > + return true; > + > + if (!(flags & FOLL_INTERRUPTIBLE)) > + return false; > + > + return signal_pending(current); > +} This should resemble signal_pending_state() more closely, if indeed not be a wrapper of signal_pending_state(). ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 1/4] mm/gup: Add FOLL_INTERRUPTIBLE 2022-07-04 22:48 ` Matthew Wilcox @ 2022-07-07 15:06 ` Peter Xu 0 siblings, 0 replies; 39+ messages in thread From: Peter Xu @ 2022-07-07 15:06 UTC (permalink / raw) To: Matthew Wilcox Cc: kvm, linux-kernel, Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List, Sean Christopherson On Mon, Jul 04, 2022 at 11:48:17PM +0100, Matthew Wilcox wrote: > On Wed, Jun 22, 2022 at 05:36:53PM -0400, Peter Xu wrote: > > +/* > > + * GUP always responds to fatal signals. When FOLL_INTERRUPTIBLE is > > + * specified, it'll also respond to generic signals. The caller of GUP > > + * that has FOLL_INTERRUPTIBLE should take care of the GUP interruption. > > + */ > > +static bool gup_signal_pending(unsigned int flags) > > +{ > > + if (fatal_signal_pending(current)) > > + return true; > > + > > + if (!(flags & FOLL_INTERRUPTIBLE)) > > + return false; > > + > > + return signal_pending(current); > > +} > > This should resemble signal_pending_state() more closely, if indeed not > be a wrapper of signal_pending_state(). Could you be more specific? Note that the only thing that should affect the signal handling here is FOLL_INTERRUPTIBLE, we don't allow anything else being passed in, e.g. we don't take TASK_INTERRUPTIBLE or TASK_*. Thanks, -- Peter Xu ^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH 2/4] kvm: Merge "atomic" and "write" in __gfn_to_pfn_memslot() 2022-06-22 21:36 [PATCH 0/4] kvm/mm: Allow GUP to respond to non fatal signals Peter Xu 2022-06-22 21:36 ` [PATCH 1/4] mm/gup: Add FOLL_INTERRUPTIBLE Peter Xu @ 2022-06-22 21:36 ` Peter Xu 2022-06-23 14:49 ` Sean Christopherson 2022-06-28 2:17 ` John Hubbard 2022-06-22 21:36 ` [PATCH 3/4] kvm: Add new pfn error KVM_PFN_ERR_INTR Peter Xu 2022-06-22 21:36 ` [PATCH 4/4] kvm/x86: Allow to respond to generic signals during slow page faults Peter Xu 3 siblings, 2 replies; 39+ messages in thread From: Peter Xu @ 2022-06-22 21:36 UTC (permalink / raw) To: kvm, linux-kernel Cc: peterx, Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List, Sean Christopherson Merge two boolean parameters into a bitmask flag called kvm_gtp_flag_t for __gfn_to_pfn_memslot(). This cleans the parameter lists, and also prepare for new boolean to be added to __gfn_to_pfn_memslot(). Signed-off-by: Peter Xu <peterx@redhat.com> --- arch/arm64/kvm/mmu.c | 5 ++-- arch/powerpc/kvm/book3s_64_mmu_hv.c | 5 ++-- arch/powerpc/kvm/book3s_64_mmu_radix.c | 5 ++-- arch/x86/kvm/mmu/mmu.c | 10 +++---- include/linux/kvm_host.h | 9 ++++++- virt/kvm/kvm_main.c | 37 +++++++++++++++----------- virt/kvm/kvm_mm.h | 6 +++-- virt/kvm/pfncache.c | 2 +- 8 files changed, 49 insertions(+), 30 deletions(-) diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c index f5651a05b6a8..ce1edb512b4e 100644 --- a/arch/arm64/kvm/mmu.c +++ b/arch/arm64/kvm/mmu.c @@ -1204,8 +1204,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, */ smp_rmb(); - pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL, - write_fault, &writable, NULL); + pfn = __gfn_to_pfn_memslot(memslot, gfn, + write_fault ? 
KVM_GTP_WRITE : 0, + NULL, &writable, NULL); if (pfn == KVM_PFN_ERR_HWPOISON) { kvm_send_hwpoison_signal(hva, vma_shift); return 0; diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c index 514fd45c1994..e2769d58dd87 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c @@ -598,8 +598,9 @@ int kvmppc_book3s_hv_page_fault(struct kvm_vcpu *vcpu, write_ok = true; } else { /* Call KVM generic code to do the slow-path check */ - pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL, - writing, &write_ok, NULL); + pfn = __gfn_to_pfn_memslot(memslot, gfn, + writing ? KVM_GTP_WRITE : 0, + NULL, &write_ok, NULL); if (is_error_noslot_pfn(pfn)) return -EFAULT; page = NULL; diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c index 42851c32ff3b..232b17c75b83 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_radix.c +++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c @@ -845,8 +845,9 @@ int kvmppc_book3s_instantiate_page(struct kvm_vcpu *vcpu, unsigned long pfn; /* Call KVM generic code to do the slow-path check */ - pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL, - writing, upgrade_p, NULL); + pfn = __gfn_to_pfn_memslot(memslot, gfn, + writing ? KVM_GTP_WRITE : 0, + NULL, upgrade_p, NULL); if (is_error_noslot_pfn(pfn)) return -EFAULT; page = NULL; diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index f4653688fa6d..e92f1ab63d6a 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -3968,6 +3968,7 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work) static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) { + kvm_gtp_flag_t flags = fault->write ? 
KVM_GTP_WRITE : 0; struct kvm_memory_slot *slot = fault->slot; bool async; @@ -3999,8 +4000,8 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) } async = false; - fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, &async, - fault->write, &fault->map_writable, + fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, flags, + &async, &fault->map_writable, &fault->hva); if (!async) return RET_PF_CONTINUE; /* *pfn has correct page already */ @@ -4016,9 +4017,8 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) } } - fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, NULL, - fault->write, &fault->map_writable, - &fault->hva); + fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, flags, NULL, + &fault->map_writable, &fault->hva); return RET_PF_CONTINUE; } diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index c20f2d55840c..b646b6fcaec6 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -1146,8 +1146,15 @@ kvm_pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault, bool *writable); kvm_pfn_t gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn); kvm_pfn_t gfn_to_pfn_memslot_atomic(const struct kvm_memory_slot *slot, gfn_t gfn); + +/* gfn_to_pfn (gtp) flags */ +typedef unsigned int __bitwise kvm_gtp_flag_t; + +#define KVM_GTP_WRITE ((__force kvm_gtp_flag_t) BIT(0)) +#define KVM_GTP_ATOMIC ((__force kvm_gtp_flag_t) BIT(1)) + kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn, - bool atomic, bool *async, bool write_fault, + kvm_gtp_flag_t gtp_flags, bool *async, bool *writable, hva_t *hva); void kvm_release_pfn_clean(kvm_pfn_t pfn); diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 64ec2222a196..952400b42ee9 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -2444,9 +2444,11 @@ static bool hva_to_pfn_fast(unsigned long addr, bool write_fault, * The slow path to get the pfn of the specified 
host virtual address, * 1 indicates success, -errno is returned if error is detected. */ -static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault, +static int hva_to_pfn_slow(unsigned long addr, bool *async, + kvm_gtp_flag_t gtp_flags, bool *writable, kvm_pfn_t *pfn) { + bool write_fault = gtp_flags & KVM_GTP_WRITE; unsigned int flags = FOLL_HWPOISON; struct page *page; int npages = 0; @@ -2565,20 +2567,22 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma, /* * Pin guest page in memory and return its pfn. * @addr: host virtual address which maps memory to the guest - * @atomic: whether this function can sleep + * @gtp_flags: kvm_gtp_flag_t flags (atomic, write, ..) * @async: whether this function need to wait IO complete if the * host page is not in the memory - * @write_fault: whether we should get a writable host page * @writable: whether it allows to map a writable host page for !@write_fault * - * The function will map a writable host page for these two cases: + * The function will map a writable (KVM_GTP_WRITE set) host page for these + * two cases: * 1): @write_fault = true * 2): @write_fault = false && @writable, @writable will tell the caller * whether the mapping is writable. 
*/ -kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async, - bool write_fault, bool *writable) +kvm_pfn_t hva_to_pfn(unsigned long addr, kvm_gtp_flag_t gtp_flags, bool *async, + bool *writable) { + bool write_fault = gtp_flags & KVM_GTP_WRITE; + bool atomic = gtp_flags & KVM_GTP_ATOMIC; struct vm_area_struct *vma; kvm_pfn_t pfn = 0; int npages, r; @@ -2592,7 +2596,7 @@ kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async, if (atomic) return KVM_PFN_ERR_FAULT; - npages = hva_to_pfn_slow(addr, async, write_fault, writable, &pfn); + npages = hva_to_pfn_slow(addr, async, gtp_flags, writable, &pfn); if (npages == 1) return pfn; @@ -2625,10 +2629,11 @@ kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async, } kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn, - bool atomic, bool *async, bool write_fault, + kvm_gtp_flag_t gtp_flags, bool *async, bool *writable, hva_t *hva) { - unsigned long addr = __gfn_to_hva_many(slot, gfn, NULL, write_fault); + unsigned long addr = __gfn_to_hva_many(slot, gfn, NULL, + gtp_flags & KVM_GTP_WRITE); if (hva) *hva = addr; @@ -2651,28 +2656,30 @@ kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn, writable = NULL; } - return hva_to_pfn(addr, atomic, async, write_fault, - writable); + return hva_to_pfn(addr, gtp_flags, async, writable); } EXPORT_SYMBOL_GPL(__gfn_to_pfn_memslot); kvm_pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault, bool *writable) { - return __gfn_to_pfn_memslot(gfn_to_memslot(kvm, gfn), gfn, false, NULL, - write_fault, writable, NULL); + return __gfn_to_pfn_memslot(gfn_to_memslot(kvm, gfn), gfn, + write_fault ? 
KVM_GTP_WRITE : 0, + NULL, writable, NULL); } EXPORT_SYMBOL_GPL(gfn_to_pfn_prot); kvm_pfn_t gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn) { - return __gfn_to_pfn_memslot(slot, gfn, false, NULL, true, NULL, NULL); + return __gfn_to_pfn_memslot(slot, gfn, KVM_GTP_WRITE, + NULL, NULL, NULL); } EXPORT_SYMBOL_GPL(gfn_to_pfn_memslot); kvm_pfn_t gfn_to_pfn_memslot_atomic(const struct kvm_memory_slot *slot, gfn_t gfn) { - return __gfn_to_pfn_memslot(slot, gfn, true, NULL, true, NULL, NULL); + return __gfn_to_pfn_memslot(slot, gfn, KVM_GTP_WRITE | KVM_GTP_ATOMIC, + NULL, NULL, NULL); } EXPORT_SYMBOL_GPL(gfn_to_pfn_memslot_atomic); diff --git a/virt/kvm/kvm_mm.h b/virt/kvm/kvm_mm.h index 41da467d99c9..1c870911eb48 100644 --- a/virt/kvm/kvm_mm.h +++ b/virt/kvm/kvm_mm.h @@ -3,6 +3,8 @@ #ifndef __KVM_MM_H__ #define __KVM_MM_H__ 1 +#include <linux/kvm_host.h> + /* * Architectures can choose whether to use an rwlock or spinlock * for the mmu_lock. These macros, for use in common code @@ -24,8 +26,8 @@ #define KVM_MMU_READ_UNLOCK(kvm) spin_unlock(&(kvm)->mmu_lock) #endif /* KVM_HAVE_MMU_RWLOCK */ -kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async, - bool write_fault, bool *writable); +kvm_pfn_t hva_to_pfn(unsigned long addr, kvm_gtp_flag_t gtp_flags, bool *async, + bool *writable); #ifdef CONFIG_HAVE_KVM_PFNCACHE void gfn_to_pfn_cache_invalidate_start(struct kvm *kvm, diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c index dd84676615f1..0f9f6b5d2fbb 100644 --- a/virt/kvm/pfncache.c +++ b/virt/kvm/pfncache.c @@ -123,7 +123,7 @@ static kvm_pfn_t hva_to_pfn_retry(struct kvm *kvm, unsigned long uhva) smp_rmb(); /* We always request a writeable mapping */ - new_pfn = hva_to_pfn(uhva, false, NULL, true, NULL); + new_pfn = hva_to_pfn(uhva, KVM_GTP_WRITE, NULL, NULL); if (is_error_noslot_pfn(new_pfn)) break; -- 2.32.0 ^ permalink raw reply related [flat|nested] 39+ messages in thread
* Re: [PATCH 2/4] kvm: Merge "atomic" and "write" in __gfn_to_pfn_memslot() 2022-06-22 21:36 ` [PATCH 2/4] kvm: Merge "atomic" and "write" in __gfn_to_pfn_memslot() Peter Xu @ 2022-06-23 14:49 ` Sean Christopherson 2022-06-23 19:46 ` Peter Xu 2022-06-28 2:17 ` John Hubbard 1 sibling, 1 reply; 39+ messages in thread From: Sean Christopherson @ 2022-06-23 14:49 UTC (permalink / raw) To: Peter Xu Cc: kvm, linux-kernel, Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List On Wed, Jun 22, 2022, Peter Xu wrote: > Merge two boolean parameters into a bitmask flag called kvm_gtp_flag_t for > __gfn_to_pfn_memslot(). This cleans the parameter lists, and also prepare > for new boolean to be added to __gfn_to_pfn_memslot(). ... > @@ -3999,8 +4000,8 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) > } > > async = false; > - fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, &async, > - fault->write, &fault->map_writable, > + fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, flags, > + &async, &fault->map_writable, > &fault->hva); > if (!async) > return RET_PF_CONTINUE; /* *pfn has correct page already */ > @@ -4016,9 +4017,8 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) > } > } > > - fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, NULL, > - fault->write, &fault->map_writable, > - &fault->hva); > + fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, flags, NULL, > + &fault->map_writable, &fault->hva); > return RET_PF_CONTINUE; > } > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h > index c20f2d55840c..b646b6fcaec6 100644 > --- a/include/linux/kvm_host.h > +++ b/include/linux/kvm_host.h > @@ -1146,8 +1146,15 @@ kvm_pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault, > bool *writable); > kvm_pfn_t gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn); > kvm_pfn_t 
gfn_to_pfn_memslot_atomic(const struct kvm_memory_slot *slot, gfn_t gfn); > + > +/* gfn_to_pfn (gtp) flags */ > +typedef unsigned int __bitwise kvm_gtp_flag_t; > + > +#define KVM_GTP_WRITE ((__force kvm_gtp_flag_t) BIT(0)) > +#define KVM_GTP_ATOMIC ((__force kvm_gtp_flag_t) BIT(1)) > + > kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn, > - bool atomic, bool *async, bool write_fault, > + kvm_gtp_flag_t gtp_flags, bool *async, > bool *writable, hva_t *hva); I completely agree the list of booleans is a mess, but I don't love the result of adding @flags. I wonder if we can do something similar to x86's struct kvm_page_fault and add an internal struct to pass params. And then add e.g. gfn_to_pfn_interruptible() to wrap that logic. I suspect we could also clean up the @async behavior at the same time, as its interaction with FOLL_NOWAIT is confusing. ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 2/4] kvm: Merge "atomic" and "write" in __gfn_to_pfn_memslot() 2022-06-23 14:49 ` Sean Christopherson @ 2022-06-23 19:46 ` Peter Xu 2022-06-23 20:29 ` Sean Christopherson 0 siblings, 1 reply; 39+ messages in thread From: Peter Xu @ 2022-06-23 19:46 UTC (permalink / raw) To: Sean Christopherson Cc: kvm, linux-kernel, Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List On Thu, Jun 23, 2022 at 02:49:47PM +0000, Sean Christopherson wrote: > On Wed, Jun 22, 2022, Peter Xu wrote: > > Merge two boolean parameters into a bitmask flag called kvm_gtp_flag_t for > > __gfn_to_pfn_memslot(). This cleans the parameter lists, and also prepare > > for new boolean to be added to __gfn_to_pfn_memslot(). > > ... > > > @@ -3999,8 +4000,8 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) > > } > > > > async = false; > > - fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, &async, > > - fault->write, &fault->map_writable, > > + fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, flags, > > + &async, &fault->map_writable, > > &fault->hva); > > if (!async) > > return RET_PF_CONTINUE; /* *pfn has correct page already */ > > @@ -4016,9 +4017,8 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) > > } > > } > > > > - fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, NULL, > > - fault->write, &fault->map_writable, > > - &fault->hva); > > + fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, flags, NULL, > > + &fault->map_writable, &fault->hva); > > return RET_PF_CONTINUE; > > } > > > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h > > index c20f2d55840c..b646b6fcaec6 100644 > > --- a/include/linux/kvm_host.h > > +++ b/include/linux/kvm_host.h > > @@ -1146,8 +1146,15 @@ kvm_pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault, > > bool *writable); > > kvm_pfn_t gfn_to_pfn_memslot(const struct kvm_memory_slot 
*slot, gfn_t gfn); > > kvm_pfn_t gfn_to_pfn_memslot_atomic(const struct kvm_memory_slot *slot, gfn_t gfn); > > + > > +/* gfn_to_pfn (gtp) flags */ > > +typedef unsigned int __bitwise kvm_gtp_flag_t; > > + > > +#define KVM_GTP_WRITE ((__force kvm_gtp_flag_t) BIT(0)) > > +#define KVM_GTP_ATOMIC ((__force kvm_gtp_flag_t) BIT(1)) > > + > > kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn, > > - bool atomic, bool *async, bool write_fault, > > + kvm_gtp_flag_t gtp_flags, bool *async, > > bool *writable, hva_t *hva); > > I completely agree the list of booleans is a mess, but I don't love the result of > adding @flags. I wonder if we can do something similar to x86's struct kvm_page_fault > and add an internal struct to pass params. Yep we can. It's just that it'd be another goal, unrelated to this series, but it could be a standalone cleanup patchset for the gfn->hpa conversion paths. Say, the new struct can also be done on top, containing the new flag, IMHO. This reminded me of an interesting topic Nadav once mentioned: when Matthew converted some Linux function parameters into a structure, the .obj actually grew a bit due to the strong stack protector that Linux uses. If I end up doing such a change I'd guess I need to dig a bit into that first, but hopefully I don't need to for this series alone. Sorry to be off-topic: I think it's a matter of whether you think it's okay to merge the flags first, even if we want to go with a struct pointer eventually. > And then add e.g. gfn_to_pfn_interruptible() to wrap that logic. That helper sounds good; it's just that the major user I'm modifying here doesn't really use gfn_to_pfn() at all but __gfn_to_pfn_memslot() underneath. I'll remember to add that when I plan to convert some gfn_to_pfn() call sites. > > I suspect we could also clean up the @async behavior at the same time, as its > interaction with FOLL_NOWAIT is confusing. Yeah, I don't like that either.
Let me think about that when proposing a new version. Logically that's a separate idea from this series too, but if you think it'd be nice to have it all together then I can give it a shot. Thanks, -- Peter Xu ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 2/4] kvm: Merge "atomic" and "write" in __gfn_to_pfn_memslot() 2022-06-23 19:46 ` Peter Xu @ 2022-06-23 20:29 ` Sean Christopherson 2022-06-23 21:29 ` Peter Xu 0 siblings, 1 reply; 39+ messages in thread From: Sean Christopherson @ 2022-06-23 20:29 UTC (permalink / raw) To: Peter Xu Cc: kvm, linux-kernel, Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List On Thu, Jun 23, 2022, Peter Xu wrote: > On Thu, Jun 23, 2022 at 02:49:47PM +0000, Sean Christopherson wrote: > > On Wed, Jun 22, 2022, Peter Xu wrote: > > > Merge two boolean parameters into a bitmask flag called kvm_gtp_flag_t for > > > __gfn_to_pfn_memslot(). This cleans the parameter lists, and also prepare > > > for new boolean to be added to __gfn_to_pfn_memslot(). ... > > > +/* gfn_to_pfn (gtp) flags */ > > > +typedef unsigned int __bitwise kvm_gtp_flag_t; > > > + > > > +#define KVM_GTP_WRITE ((__force kvm_gtp_flag_t) BIT(0)) > > > +#define KVM_GTP_ATOMIC ((__force kvm_gtp_flag_t) BIT(1)) > > > + > > > kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn, > > > - bool atomic, bool *async, bool write_fault, > > > + kvm_gtp_flag_t gtp_flags, bool *async, > > > bool *writable, hva_t *hva); > > > > I completely agree the list of booleans is a mess, but I don't love the result of > > adding @flags. I wonder if we can do something similar to x86's struct kvm_page_fault > > and add an internal struct to pass params. > > Yep we can. It's just that it'll be another goal irrelevant of this series But it's not irrelevant. By introducing KVM_GTP_*, you opened the topic of cleaning up the parameters. Don't get me wrong, I like that you proposed cleaning up the mess, but if we're going to churn then let's get the API right. > but it could be a standalone cleanup patchset for gfn->hpa conversion > paths. Say, the new struct can also be done on top containing the new > flag, IMHO. 
No, because if we go to a struct, then I'd much rather have bools and not flags. > This reminded me of an interesting topic that Nadav used to mention that > when Matthew changed some of the Linux function parameters into a structure > then the .obj actually grows a bit due to the strong stack protector that > Linux uses. If I'll be doing such a change I'd guess I need to dig a bit > into that first, but hopefully I don't need to for this series alone. > > Sorry to be off-topic: I think it's a matter of whether you think it's okay > we merge the flags first, even if we want to go with a struct pointer > finally. Either take a dependency on doing a full cleanup, or just add yet another bool and leave _all_ cleanup to a separate series. Resolving conflicts with a new param is fairly straightforward, whereas resolving divergent cleanups gets painful. As gross as it is, I think my preference would be to just add another bool in this series. Then we can get more aggressive with a cleanup without having to worry about unnecessarily pushing this series out a release or three. > > And then add e.g. gfn_to_pfn_interruptible() to wrap that logic. > > That helper sounds good, it's just that the major user I'm modifying here > doesn't really use gfn_to_pfn() at all but __gfn_to_pfn_memslot() > underneath. I'll remember to have that when I plan to convert some > gfn_to_pfn() call sites. Ah, right. That can be remedied more easily if @async goes away. Then we can have: gfn_to_pfn_memslot_nowait() and gfn_to_pfn_memslot_interruptible() and those are mutually exclusive, i.e. recognize generic signals if and only if gup is allowed to wait. But that can be left to the cleanup series. > > I suspect we could also clean up the @async behavior at the same time, as its > > interaction with FOLL_NOWAIT is confusing. > > Yeah I don't like that either. Let me think about that when proposing a > new version. 
Logically that's separate idea from this series too, but if > you think that'll be nice to have altogether then I can give it a shot. This is what I came up with for splitting @async into a pure input (no_wait) and a return value (KVM_PFN_ERR_NEEDS_IO). --- arch/arm64/kvm/mmu.c | 2 +- arch/powerpc/kvm/book3s_64_mmu_hv.c | 4 +- arch/powerpc/kvm/book3s_64_mmu_radix.c | 2 +- arch/x86/kvm/mmu/mmu.c | 8 ++-- include/linux/kvm_host.h | 3 +- virt/kvm/kvm_main.c | 57 ++++++++++++++------------ virt/kvm/kvm_mm.h | 2 +- virt/kvm/pfncache.c | 2 +- 8 files changed, 41 insertions(+), 39 deletions(-) diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c index 87f1cd0df36e..a87f01edef8e 100644 --- a/arch/arm64/kvm/mmu.c +++ b/arch/arm64/kvm/mmu.c @@ -1204,7 +1204,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, */ smp_rmb(); - pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL, + pfn = __gfn_to_pfn_memslot(memslot, gfn, false, false, write_fault, &writable, NULL); if (pfn == KVM_PFN_ERR_HWPOISON) { kvm_send_hwpoison_signal(hva, vma_shift); diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c index 514fd45c1994..32f4b56ca315 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c @@ -590,7 +590,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_vcpu *vcpu, /* * Do a fast check first, since __gfn_to_pfn_memslot doesn't - * do it with !atomic && !async, which is how we call it. + * do it with !atomic && !nowait, which is how we call it. * We always ask for write permission since the common case * is that the page is writable. 
*/ @@ -598,7 +598,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_vcpu *vcpu, write_ok = true; } else { /* Call KVM generic code to do the slow-path check */ - pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL, + pfn = __gfn_to_pfn_memslot(memslot, gfn, false, false, writing, &write_ok, NULL); if (is_error_noslot_pfn(pfn)) return -EFAULT; diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c index 42851c32ff3b..4338affe295e 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_radix.c +++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c @@ -845,7 +845,7 @@ int kvmppc_book3s_instantiate_page(struct kvm_vcpu *vcpu, unsigned long pfn; /* Call KVM generic code to do the slow-path check */ - pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL, + pfn = __gfn_to_pfn_memslot(memslot, gfn, false, false, writing, upgrade_p, NULL); if (is_error_noslot_pfn(pfn)) return -EFAULT; diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 79c6a821ea0d..35b364589fa4 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -4102,7 +4102,6 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work) static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) { struct kvm_memory_slot *slot = fault->slot; - bool async; /* * Retry the page fault if the gfn hit a memslot that is being deleted @@ -4131,11 +4130,10 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) return RET_PF_EMULATE; } - async = false; - fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, &async, + fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, true, fault->write, &fault->map_writable, &fault->hva); - if (!async) + if (fault->pfn != KVM_PFN_ERR_NEEDS_IO) return RET_PF_CONTINUE; /* *pfn has correct page already */ if (!fault->prefetch && kvm_can_do_async_pf(vcpu)) { @@ -4149,7 +4147,7 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) } } - fault->pfn 
= __gfn_to_pfn_memslot(slot, fault->gfn, false, NULL, + fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, false, fault->write, &fault->map_writable, &fault->hva); return RET_PF_CONTINUE; diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 3554e48406e4..ecd5f686d33a 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -96,6 +96,7 @@ #define KVM_PFN_ERR_FAULT (KVM_PFN_ERR_MASK) #define KVM_PFN_ERR_HWPOISON (KVM_PFN_ERR_MASK + 1) #define KVM_PFN_ERR_RO_FAULT (KVM_PFN_ERR_MASK + 2) +#define KVM_PFN_ERR_NEEDS_IO (KVM_PFN_ERR_MASK + 3) /* * error pfns indicate that the gfn is in slot but faild to @@ -1146,7 +1147,7 @@ kvm_pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault, kvm_pfn_t gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn); kvm_pfn_t gfn_to_pfn_memslot_atomic(const struct kvm_memory_slot *slot, gfn_t gfn); kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn, - bool atomic, bool *async, bool write_fault, + bool atomic, bool no_wait, bool write_fault, bool *writable, hva_t *hva); void kvm_release_pfn_clean(kvm_pfn_t pfn); diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 45188d11812c..6b63aa5fa5ed 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -2497,7 +2497,7 @@ static bool hva_to_pfn_fast(unsigned long addr, bool write_fault, * The slow path to get the pfn of the specified host virtual address, * 1 indicates success, -errno is returned if error is detected. 
*/ -static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault, +static int hva_to_pfn_slow(unsigned long addr, bool no_wait, bool write_fault, bool *writable, kvm_pfn_t *pfn) { unsigned int flags = FOLL_HWPOISON; @@ -2511,7 +2511,7 @@ static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault, if (write_fault) flags |= FOLL_WRITE; - if (async) + if (no_wait) flags |= FOLL_NOWAIT; npages = get_user_pages_unlocked(addr, 1, &page, flags); @@ -2619,28 +2619,31 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma, } /* - * Pin guest page in memory and return its pfn. + * Get the host pfn for a given host virtual address. If a pfn is found and is + * backed by a refcounted struct page, the caller is responsible for putting + * the reference, i.e. this returns with an elevated refcount. + * * @addr: host virtual address which maps memory to the guest - * @atomic: whether this function can sleep - * @async: whether this function need to wait IO complete if the - * host page is not in the memory - * @write_fault: whether we should get a writable host page - * @writable: whether it allows to map a writable host page for !@write_fault - * - * The function will map a writable host page for these two cases: - * 1): @write_fault = true - * 2): @write_fault = false && @writable, @writable will tell the caller - * whether the mapping is writable. + * @atomic: if true, do not sleep (effectively means "fast gup only") + * @no_wait: if true, do not wait for IO to complete if the host page is not in + * memory, e.g. 
is swapped out or not yet transferred during post-copy + * @write_fault: if true, a writable mapping is _required_ + * @writable: if non-NULL, a writable mapping is _allowed_, but not required; + * set to %true (if non-NULL) when a writable host page was retrieved */ -kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async, +kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool no_wait, bool write_fault, bool *writable) { struct vm_area_struct *vma; kvm_pfn_t pfn; int npages, r; - /* we can do it either atomically or asynchronously, not both */ - BUG_ON(atomic && async); + /* + * Waiting requires sleeping, and so is mutually exclusive with atomic + * lookups which are not allowed to sleep. + */ + if (WARN_ON_ONCE(atomic && !no_wait)) + return KVM_PFN_ERR_FAULT; if (hva_to_pfn_fast(addr, write_fault, writable, &pfn)) return pfn; @@ -2648,13 +2651,13 @@ kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async, if (atomic) return KVM_PFN_ERR_FAULT; - npages = hva_to_pfn_slow(addr, async, write_fault, writable, &pfn); + npages = hva_to_pfn_slow(addr, no_wait, write_fault, writable, &pfn); if (npages == 1) return pfn; mmap_read_lock(current->mm); if (npages == -EHWPOISON || - (!async && check_user_page_hwpoison(addr))) { + (!no_wait && check_user_page_hwpoison(addr))) { pfn = KVM_PFN_ERR_HWPOISON; goto exit; } @@ -2671,9 +2674,10 @@ kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async, if (r < 0) pfn = KVM_PFN_ERR_FAULT; } else { - if (async && vma_is_valid(vma, write_fault)) - *async = true; - pfn = KVM_PFN_ERR_FAULT; + if (no_wait && vma_is_valid(vma, write_fault)) + pfn = KVM_PFN_ERR_NEEDS_IO; + else + pfn = KVM_PFN_ERR_FAULT; } exit: mmap_read_unlock(current->mm); @@ -2681,7 +2685,7 @@ kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async, } kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn, - bool atomic, bool *async, bool write_fault, + bool atomic, bool no_wait, bool write_fault, bool
*writable, hva_t *hva) { unsigned long addr = __gfn_to_hva_many(slot, gfn, NULL, write_fault); @@ -2707,28 +2711,27 @@ kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn, writable = NULL; } - return hva_to_pfn(addr, atomic, async, write_fault, - writable); + return hva_to_pfn(addr, atomic, no_wait, write_fault, writable); } EXPORT_SYMBOL_GPL(__gfn_to_pfn_memslot); kvm_pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault, bool *writable) { - return __gfn_to_pfn_memslot(gfn_to_memslot(kvm, gfn), gfn, false, NULL, + return __gfn_to_pfn_memslot(gfn_to_memslot(kvm, gfn), gfn, false, false, write_fault, writable, NULL); } EXPORT_SYMBOL_GPL(gfn_to_pfn_prot); kvm_pfn_t gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn) { - return __gfn_to_pfn_memslot(slot, gfn, false, NULL, true, NULL, NULL); + return __gfn_to_pfn_memslot(slot, gfn, false, false, true, NULL, NULL); } EXPORT_SYMBOL_GPL(gfn_to_pfn_memslot); kvm_pfn_t gfn_to_pfn_memslot_atomic(const struct kvm_memory_slot *slot, gfn_t gfn) { - return __gfn_to_pfn_memslot(slot, gfn, true, NULL, true, NULL, NULL); + return __gfn_to_pfn_memslot(slot, gfn, true, false, true, NULL, NULL); } EXPORT_SYMBOL_GPL(gfn_to_pfn_memslot_atomic); diff --git a/virt/kvm/kvm_mm.h b/virt/kvm/kvm_mm.h index 41da467d99c9..40e87b4b4629 100644 --- a/virt/kvm/kvm_mm.h +++ b/virt/kvm/kvm_mm.h @@ -24,7 +24,7 @@ #define KVM_MMU_READ_UNLOCK(kvm) spin_unlock(&(kvm)->mmu_lock) #endif /* KVM_HAVE_MMU_RWLOCK */ -kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async, +kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool no_wait, bool write_fault, bool *writable); #ifdef CONFIG_HAVE_KVM_PFNCACHE diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c index ab519f72f2cd..6a05d6d0fbe9 100644 --- a/virt/kvm/pfncache.c +++ b/virt/kvm/pfncache.c @@ -181,7 +181,7 @@ static kvm_pfn_t hva_to_pfn_retry(struct kvm *kvm, struct gfn_to_pfn_cache *gpc) } /* We always request a writeable mapping */ - 
new_pfn = hva_to_pfn(gpc->uhva, false, NULL, true, NULL); + new_pfn = hva_to_pfn(gpc->uhva, false, false, true, NULL); if (is_error_noslot_pfn(new_pfn)) goto out_error; base-commit: 4284f0063c48fc3734b0bedb023702c4d606732f -- ^ permalink raw reply related [flat|nested] 39+ messages in thread
* Re: [PATCH 2/4] kvm: Merge "atomic" and "write" in __gfn_to_pfn_memslot() 2022-06-23 20:29 ` Sean Christopherson @ 2022-06-23 21:29 ` Peter Xu 2022-06-23 21:52 ` Sean Christopherson 0 siblings, 1 reply; 39+ messages in thread From: Peter Xu @ 2022-06-23 21:29 UTC (permalink / raw) To: Sean Christopherson Cc: kvm, linux-kernel, Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List On Thu, Jun 23, 2022 at 08:29:13PM +0000, Sean Christopherson wrote: > This is what I came up with for splitting @async into a pure input (no_wait) and > a return value (KVM_PFN_ERR_NEEDS_IO). The attached patch looks good to me. It's just that... [...] > kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn, > - bool atomic, bool *async, bool write_fault, > + bool atomic, bool no_wait, bool write_fault, > bool *writable, hva_t *hva) ... with this patch on top we'll have 3 booleans already. With the new one added separately as suggested, it'll hit 4. Let's say one day we'll have that struct, but... are you sure you think keeping four booleans around is nicer than having a flag, no matter whether we'd like to have a struct or not? kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn, bool atomic, bool no_wait, bool write_fault, bool interruptible, bool *writable, hva_t *hva); What if the booleans go to 5, 6, or more? /me starts to wonder what'll be the magic number at which we'll start to think a bitmask flag will be more lovely here. :) -- Peter Xu ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 2/4] kvm: Merge "atomic" and "write" in __gfn_to_pfn_memslot() 2022-06-23 21:29 ` Peter Xu @ 2022-06-23 21:52 ` Sean Christopherson 2022-06-27 19:12 ` John Hubbard 0 siblings, 1 reply; 39+ messages in thread From: Sean Christopherson @ 2022-06-23 21:52 UTC (permalink / raw) To: Peter Xu Cc: kvm, linux-kernel, Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List On Thu, Jun 23, 2022, Peter Xu wrote: > On Thu, Jun 23, 2022 at 08:29:13PM +0000, Sean Christopherson wrote: > > This is what I came up with for splitting @async into a pure input (no_wait) and > > a return value (KVM_PFN_ERR_NEEDS_IO). > > The attached patch looks good to me. It's just that... > > [...] > > > kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn, > > - bool atomic, bool *async, bool write_fault, > > + bool atomic, bool no_wait, bool write_fault, > > bool *writable, hva_t *hva) > > ... with this patch on top we'll have 3 booleans already. With the new one > added separately as suggested, it'll hit 4. > > Let's say one day we'll have that struct, but... are you sure you think > keeping four booleans around is nicer than having a flag, no matter whether > we'd like to have a struct or not? No. > kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn, > bool atomic, bool no_wait, bool write_fault, > bool interruptible, bool *writable, hva_t *hva); > > What if the booleans go to 5, 6, or more? > > /me starts to wonder what'll be the magic number at which we'll start to think > a bitmask flag will be more lovely here. :) For the number to really matter, it'd have to be comically large, e.g. 100+. This is all on-stack memory, so it's as close to free as we can get. Overhead in terms of (un)marshalling is likely a wash for flags versus bools.
That leaves readability, which isn't dependent on the number so much as it is on the usage, and will be highly subjective based on the final code. In other words, I'm not dead set against flags, but I would like to see a complete cleanup before making a decision. My gut reaction is to use bools, as it makes consumption cleaner in most cases, e.g. if (!(xxx->write_fault || writable)) return false; versus if (!((xxx->flags & KVM_GTP_WRITE) || writable)) return false; but again I'm not going to say never until I actually see the end result. ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 2/4] kvm: Merge "atomic" and "write" in __gfn_to_pfn_memslot() 2022-06-23 21:52 ` Sean Christopherson @ 2022-06-27 19:12 ` John Hubbard 0 siblings, 0 replies; 39+ messages in thread From: John Hubbard @ 2022-06-27 19:12 UTC (permalink / raw) To: Sean Christopherson, Peter Xu Cc: kvm, linux-kernel, Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List On 6/23/22 14:52, Sean Christopherson wrote: > On Thu, Jun 23, 2022, Peter Xu wrote: >> On Thu, Jun 23, 2022 at 08:29:13PM +0000, Sean Christopherson wrote: >>> This is what I came up with for splitting @async into a pure input (no_wait) and >>> a return value (KVM_PFN_ERR_NEEDS_IO). >> >> The attached patch looks good to me. It's just that.. >> >> [...] >> >>> kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn, >>> - bool atomic, bool *async, bool write_fault, >>> + bool atomic, bool no_wait, bool write_fault, >>> bool *writable, hva_t *hva) >> >> .. with this patch on top we'll have 3 booleans already. With the new one >> to add separated as suggested then it'll hit 4. >> >> Let's say one day we'll have that struct, but.. are you sure you think >> keeping four booleans around is nicer than having a flag, no matter whether >> we'd like to have a struct or not? > > No. > >> kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn, >> bool atomic, bool no_wait, bool write_fault, >> bool interruptible, bool *writable, hva_t *hva); >> >> What if the booleans goes to 5, 6, or more? >> >> /me starts to wonder what'll be the magic number that we'll start to think >> a bitmask flag will be more lovely here. :) > > For the number to really matter, it'd have to be comically large, e.g. 100+. This > is all on-stack memory, so it's as close to free as can we can get. Overhead in > terms of (un)marshalling is likely a wash for flags versus bools. 
Bools pack in > nicely, so until there are a _lot_ of bools, memory is a non-issue. It's pretty unusual to see that claim, in kernel mm code. :) Flags are often used, because they take less space than booleans, and C bitfields have other problems. > > That leaves readability, which isn't dependent on the number so much as it is on > the usage, and will be highly subjective based on the final code. > > In other words, I'm not dead set against flags, but I would like to see a complete > cleanup before making a decision. My gut reaction is to use bools, as it makes > consumption cleaner in most cases, e.g. > > if (!(xxx->write_fault || writable)) > return false; > > versus > > if (!((xxx->flags & KVM_GTP_WRITE) || writable)) > return false; > > but again I'm not going to say never until I actually see the end result. > Just to add a light counter-argument: the readability is similar enough that I think the compactness in memory makes flags a little better. imho anyway. thanks, -- John Hubbard NVIDIA ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 2/4] kvm: Merge "atomic" and "write" in __gfn_to_pfn_memslot() 2022-06-22 21:36 ` [PATCH 2/4] kvm: Merge "atomic" and "write" in __gfn_to_pfn_memslot() Peter Xu 2022-06-23 14:49 ` Sean Christopherson @ 2022-06-28 2:17 ` John Hubbard 2022-06-28 19:46 ` Peter Xu 1 sibling, 1 reply; 39+ messages in thread From: John Hubbard @ 2022-06-28 2:17 UTC (permalink / raw) To: Peter Xu, kvm, linux-kernel Cc: Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List, Sean Christopherson On 6/22/22 14:36, Peter Xu wrote: > Merge two boolean parameters into a bitmask flag called kvm_gtp_flag_t for > __gfn_to_pfn_memslot(). This cleans the parameter lists, and also prepare > for new boolean to be added to __gfn_to_pfn_memslot(). > > Signed-off-by: Peter Xu <peterx@redhat.com> > --- > arch/arm64/kvm/mmu.c | 5 ++-- > arch/powerpc/kvm/book3s_64_mmu_hv.c | 5 ++-- > arch/powerpc/kvm/book3s_64_mmu_radix.c | 5 ++-- > arch/x86/kvm/mmu/mmu.c | 10 +++---- > include/linux/kvm_host.h | 9 ++++++- > virt/kvm/kvm_main.c | 37 +++++++++++++++----------- > virt/kvm/kvm_mm.h | 6 +++-- > virt/kvm/pfncache.c | 2 +- > 8 files changed, 49 insertions(+), 30 deletions(-) > > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c > index f5651a05b6a8..ce1edb512b4e 100644 > --- a/arch/arm64/kvm/mmu.c > +++ b/arch/arm64/kvm/mmu.c > @@ -1204,8 +1204,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, > */ > smp_rmb(); > > - pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL, > - write_fault, &writable, NULL); > + pfn = __gfn_to_pfn_memslot(memslot, gfn, > + write_fault ? 
KVM_GTP_WRITE : 0, > + NULL, &writable, NULL); > if (pfn == KVM_PFN_ERR_HWPOISON) { > kvm_send_hwpoison_signal(hva, vma_shift); > return 0; > diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c > index 514fd45c1994..e2769d58dd87 100644 > --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c > +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c > @@ -598,8 +598,9 @@ int kvmppc_book3s_hv_page_fault(struct kvm_vcpu *vcpu, > write_ok = true; > } else { > /* Call KVM generic code to do the slow-path check */ > - pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL, > - writing, &write_ok, NULL); > + pfn = __gfn_to_pfn_memslot(memslot, gfn, > + writing ? KVM_GTP_WRITE : 0, > + NULL, &write_ok, NULL); > if (is_error_noslot_pfn(pfn)) > return -EFAULT; > page = NULL; > diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c > index 42851c32ff3b..232b17c75b83 100644 > --- a/arch/powerpc/kvm/book3s_64_mmu_radix.c > +++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c > @@ -845,8 +845,9 @@ int kvmppc_book3s_instantiate_page(struct kvm_vcpu *vcpu, > unsigned long pfn; > > /* Call KVM generic code to do the slow-path check */ > - pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL, > - writing, upgrade_p, NULL); > + pfn = __gfn_to_pfn_memslot(memslot, gfn, > + writing ? KVM_GTP_WRITE : 0, > + NULL, upgrade_p, NULL); > if (is_error_noslot_pfn(pfn)) > return -EFAULT; > page = NULL; > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c > index f4653688fa6d..e92f1ab63d6a 100644 > --- a/arch/x86/kvm/mmu/mmu.c > +++ b/arch/x86/kvm/mmu/mmu.c > @@ -3968,6 +3968,7 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work) > > static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) > { > + kvm_gtp_flag_t flags = fault->write ? 
KVM_GTP_WRITE : 0; > struct kvm_memory_slot *slot = fault->slot; > bool async; > > @@ -3999,8 +4000,8 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) > } > > async = false; > - fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, &async, > - fault->write, &fault->map_writable, > + fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, flags, > + &async, &fault->map_writable, > &fault->hva); > if (!async) > return RET_PF_CONTINUE; /* *pfn has correct page already */ > @@ -4016,9 +4017,8 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) > } > } > > - fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, NULL, > - fault->write, &fault->map_writable, > - &fault->hva); > + fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, flags, NULL, > + &fault->map_writable, &fault->hva); The flags arg does improve the situation, yes. > return RET_PF_CONTINUE; > } > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h > index c20f2d55840c..b646b6fcaec6 100644 > --- a/include/linux/kvm_host.h > +++ b/include/linux/kvm_host.h > @@ -1146,8 +1146,15 @@ kvm_pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault, > bool *writable); > kvm_pfn_t gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn); > kvm_pfn_t gfn_to_pfn_memslot_atomic(const struct kvm_memory_slot *slot, gfn_t gfn); > + > +/* gfn_to_pfn (gtp) flags */ > +typedef unsigned int __bitwise kvm_gtp_flag_t; A minor naming problem: GTP and especially gtp_flags is way too close to gfp_flags. It will make people either wonder if it's a typo, or worse, *assume* that it's a typo. :) Yes, "read the code", but if you can come up with a better TLA than GTP here, let's consider using it. Overall, the change looks like an improvement, even though write_fault ? KVM_GTP_WRITE : 0 is not wonderful. But improving *that* leads to a big pile of diffs that are rather beyond the scope here.
thanks, -- John Hubbard NVIDIA > + > +#define KVM_GTP_WRITE ((__force kvm_gtp_flag_t) BIT(0)) > +#define KVM_GTP_ATOMIC ((__force kvm_gtp_flag_t) BIT(1)) > + > kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn, > - bool atomic, bool *async, bool write_fault, > + kvm_gtp_flag_t gtp_flags, bool *async, > bool *writable, hva_t *hva); > > void kvm_release_pfn_clean(kvm_pfn_t pfn); > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c > index 64ec2222a196..952400b42ee9 100644 > --- a/virt/kvm/kvm_main.c > +++ b/virt/kvm/kvm_main.c > @@ -2444,9 +2444,11 @@ static bool hva_to_pfn_fast(unsigned long addr, bool write_fault, > * The slow path to get the pfn of the specified host virtual address, > * 1 indicates success, -errno is returned if error is detected. > */ > -static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault, > +static int hva_to_pfn_slow(unsigned long addr, bool *async, > + kvm_gtp_flag_t gtp_flags, > bool *writable, kvm_pfn_t *pfn) > { > + bool write_fault = gtp_flags & KVM_GTP_WRITE; > unsigned int flags = FOLL_HWPOISON; > struct page *page; > int npages = 0; > @@ -2565,20 +2567,22 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma, > /* > * Pin guest page in memory and return its pfn. > * @addr: host virtual address which maps memory to the guest > - * @atomic: whether this function can sleep > + * @gtp_flags: kvm_gtp_flag_t flags (atomic, write, ..) > * @async: whether this function need to wait IO complete if the > * host page is not in the memory > - * @write_fault: whether we should get a writable host page > * @writable: whether it allows to map a writable host page for !@write_fault > * > - * The function will map a writable host page for these two cases: > + * The function will map a writable (KVM_GTP_WRITE set) host page for these > + * two cases: > * 1): @write_fault = true > * 2): @write_fault = false && @writable, @writable will tell the caller > * whether the mapping is writable. 
> */ > -kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async, > - bool write_fault, bool *writable) > +kvm_pfn_t hva_to_pfn(unsigned long addr, kvm_gtp_flag_t gtp_flags, bool *async, > + bool *writable) > { > + bool write_fault = gtp_flags & KVM_GTP_WRITE; > + bool atomic = gtp_flags & KVM_GTP_ATOMIC; > struct vm_area_struct *vma; > kvm_pfn_t pfn = 0; > int npages, r; > @@ -2592,7 +2596,7 @@ kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async, > if (atomic) > return KVM_PFN_ERR_FAULT; > > - npages = hva_to_pfn_slow(addr, async, write_fault, writable, &pfn); > + npages = hva_to_pfn_slow(addr, async, gtp_flags, writable, &pfn); > if (npages == 1) > return pfn; > > @@ -2625,10 +2629,11 @@ kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async, > } > > kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn, > - bool atomic, bool *async, bool write_fault, > + kvm_gtp_flag_t gtp_flags, bool *async, > bool *writable, hva_t *hva) > { > - unsigned long addr = __gfn_to_hva_many(slot, gfn, NULL, write_fault); > + unsigned long addr = __gfn_to_hva_many(slot, gfn, NULL, > + gtp_flags & KVM_GTP_WRITE); > > if (hva) > *hva = addr; > @@ -2651,28 +2656,30 @@ kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn, > writable = NULL; > } > > - return hva_to_pfn(addr, atomic, async, write_fault, > - writable); > + return hva_to_pfn(addr, gtp_flags, async, writable); > } > EXPORT_SYMBOL_GPL(__gfn_to_pfn_memslot); > > kvm_pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault, > bool *writable) > { > - return __gfn_to_pfn_memslot(gfn_to_memslot(kvm, gfn), gfn, false, NULL, > - write_fault, writable, NULL); > + return __gfn_to_pfn_memslot(gfn_to_memslot(kvm, gfn), gfn, > + write_fault ? 
KVM_GTP_WRITE : 0, > + NULL, writable, NULL); > } > EXPORT_SYMBOL_GPL(gfn_to_pfn_prot); > > kvm_pfn_t gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn) > { > - return __gfn_to_pfn_memslot(slot, gfn, false, NULL, true, NULL, NULL); > + return __gfn_to_pfn_memslot(slot, gfn, KVM_GTP_WRITE, > + NULL, NULL, NULL); > } > EXPORT_SYMBOL_GPL(gfn_to_pfn_memslot); > > kvm_pfn_t gfn_to_pfn_memslot_atomic(const struct kvm_memory_slot *slot, gfn_t gfn) > { > - return __gfn_to_pfn_memslot(slot, gfn, true, NULL, true, NULL, NULL); > + return __gfn_to_pfn_memslot(slot, gfn, KVM_GTP_WRITE | KVM_GTP_ATOMIC, > + NULL, NULL, NULL); > } > EXPORT_SYMBOL_GPL(gfn_to_pfn_memslot_atomic); > > diff --git a/virt/kvm/kvm_mm.h b/virt/kvm/kvm_mm.h > index 41da467d99c9..1c870911eb48 100644 > --- a/virt/kvm/kvm_mm.h > +++ b/virt/kvm/kvm_mm.h > @@ -3,6 +3,8 @@ > #ifndef __KVM_MM_H__ > #define __KVM_MM_H__ 1 > > +#include <linux/kvm_host.h> > + > /* > * Architectures can choose whether to use an rwlock or spinlock > * for the mmu_lock. 
These macros, for use in common code > @@ -24,8 +26,8 @@ > #define KVM_MMU_READ_UNLOCK(kvm) spin_unlock(&(kvm)->mmu_lock) > #endif /* KVM_HAVE_MMU_RWLOCK */ > > -kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async, > - bool write_fault, bool *writable); > +kvm_pfn_t hva_to_pfn(unsigned long addr, kvm_gtp_flag_t gtp_flags, bool *async, > + bool *writable); > > #ifdef CONFIG_HAVE_KVM_PFNCACHE > void gfn_to_pfn_cache_invalidate_start(struct kvm *kvm, > diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c > index dd84676615f1..0f9f6b5d2fbb 100644 > --- a/virt/kvm/pfncache.c > +++ b/virt/kvm/pfncache.c > @@ -123,7 +123,7 @@ static kvm_pfn_t hva_to_pfn_retry(struct kvm *kvm, unsigned long uhva) > smp_rmb(); > > /* We always request a writeable mapping */ > - new_pfn = hva_to_pfn(uhva, false, NULL, true, NULL); > + new_pfn = hva_to_pfn(uhva, KVM_GTP_WRITE, NULL, NULL); > if (is_error_noslot_pfn(new_pfn)) > break; > ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 2/4] kvm: Merge "atomic" and "write" in __gfn_to_pfn_memslot() 2022-06-28 2:17 ` John Hubbard @ 2022-06-28 19:46 ` Peter Xu 2022-06-28 21:52 ` John Hubbard 0 siblings, 1 reply; 39+ messages in thread From: Peter Xu @ 2022-06-28 19:46 UTC (permalink / raw) To: John Hubbard Cc: kvm, linux-kernel, Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List, Sean Christopherson On Mon, Jun 27, 2022 at 07:17:09PM -0700, John Hubbard wrote: > On 6/22/22 14:36, Peter Xu wrote: > > Merge two boolean parameters into a bitmask flag called kvm_gtp_flag_t for > > __gfn_to_pfn_memslot(). This cleans the parameter lists, and also prepare > > for new boolean to be added to __gfn_to_pfn_memslot(). > > > > Signed-off-by: Peter Xu <peterx@redhat.com> > > --- > > arch/arm64/kvm/mmu.c | 5 ++-- > > arch/powerpc/kvm/book3s_64_mmu_hv.c | 5 ++-- > > arch/powerpc/kvm/book3s_64_mmu_radix.c | 5 ++-- > > arch/x86/kvm/mmu/mmu.c | 10 +++---- > > include/linux/kvm_host.h | 9 ++++++- > > virt/kvm/kvm_main.c | 37 +++++++++++++++----------- > > virt/kvm/kvm_mm.h | 6 +++-- > > virt/kvm/pfncache.c | 2 +- > > 8 files changed, 49 insertions(+), 30 deletions(-) > > > > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c > > index f5651a05b6a8..ce1edb512b4e 100644 > > --- a/arch/arm64/kvm/mmu.c > > +++ b/arch/arm64/kvm/mmu.c > > @@ -1204,8 +1204,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, > > */ > > smp_rmb(); > > - pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL, > > - write_fault, &writable, NULL); > > + pfn = __gfn_to_pfn_memslot(memslot, gfn, > > + write_fault ? 
KVM_GTP_WRITE : 0, > > + NULL, &writable, NULL); > > if (pfn == KVM_PFN_ERR_HWPOISON) { > > kvm_send_hwpoison_signal(hva, vma_shift); > > return 0; > > diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c > > index 514fd45c1994..e2769d58dd87 100644 > > --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c > > +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c > > @@ -598,8 +598,9 @@ int kvmppc_book3s_hv_page_fault(struct kvm_vcpu *vcpu, > > write_ok = true; > > } else { > > /* Call KVM generic code to do the slow-path check */ > > - pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL, > > - writing, &write_ok, NULL); > > + pfn = __gfn_to_pfn_memslot(memslot, gfn, > > + writing ? KVM_GTP_WRITE : 0, > > + NULL, &write_ok, NULL); > > if (is_error_noslot_pfn(pfn)) > > return -EFAULT; > > page = NULL; > > diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c > > index 42851c32ff3b..232b17c75b83 100644 > > --- a/arch/powerpc/kvm/book3s_64_mmu_radix.c > > +++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c > > @@ -845,8 +845,9 @@ int kvmppc_book3s_instantiate_page(struct kvm_vcpu *vcpu, > > unsigned long pfn; > > /* Call KVM generic code to do the slow-path check */ > > - pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL, > > - writing, upgrade_p, NULL); > > + pfn = __gfn_to_pfn_memslot(memslot, gfn, > > + writing ? KVM_GTP_WRITE : 0, > > + NULL, upgrade_p, NULL); > > if (is_error_noslot_pfn(pfn)) > > return -EFAULT; > > page = NULL; > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c > > index f4653688fa6d..e92f1ab63d6a 100644 > > --- a/arch/x86/kvm/mmu/mmu.c > > +++ b/arch/x86/kvm/mmu/mmu.c > > @@ -3968,6 +3968,7 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work) > > static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) > > { > > + kvm_gtp_flag_t flags = fault->write ? 
KVM_GTP_WRITE : 0; > > struct kvm_memory_slot *slot = fault->slot; > > bool async; > > @@ -3999,8 +4000,8 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) > > } > > async = false; > > - fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, &async, > > - fault->write, &fault->map_writable, > > + fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, flags, > > + &async, &fault->map_writable, > > &fault->hva); > > if (!async) > > return RET_PF_CONTINUE; /* *pfn has correct page already */ > > @@ -4016,9 +4017,8 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) > > } > > } > > - fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, NULL, > > - fault->write, &fault->map_writable, > > - &fault->hva); > > + fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, flags, NULL, > > + &fault->map_writable, &fault->hva); > > The flags arg does improve the situation, yes. Thanks for supporting a flag's existance. :) I'd say ultimately it could be a personal preference thing when the struct comes. > > > return RET_PF_CONTINUE; > > } > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h > > index c20f2d55840c..b646b6fcaec6 100644 > > --- a/include/linux/kvm_host.h > > +++ b/include/linux/kvm_host.h > > @@ -1146,8 +1146,15 @@ kvm_pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault, > > bool *writable); > > kvm_pfn_t gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn); > > kvm_pfn_t gfn_to_pfn_memslot_atomic(const struct kvm_memory_slot *slot, gfn_t gfn); > > + > > +/* gfn_to_pfn (gtp) flags */ > > +typedef unsigned int __bitwise kvm_gtp_flag_t; > > A minor naming problem: GTP and especially gtp_flags is way too close > to gfp_flags. It will make people either wonder if it's a typo, or > worse, *assume* that it's a typo. :) I'd try to argu with "I prefixed it with kvm_", but oh well.. 
yes they're a bit close :) > > Yes, "read the code", but if you can come up with a better TLA than GTP > here, let's consider using it. Could I ask what's TLA? Any suggestions on the abbrev, btw? > > Overall, the change looks like an improvement, even though > > write_fault ? KVM_GTP_WRITE : 0 > > is not wonderful. But improving *that* leads to a big pile of diffs > that are rather beyond the scope here. Thanks, -- Peter Xu ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 2/4] kvm: Merge "atomic" and "write" in __gfn_to_pfn_memslot() 2022-06-28 19:46 ` Peter Xu @ 2022-06-28 21:52 ` John Hubbard 2022-06-28 22:50 ` Peter Xu 0 siblings, 1 reply; 39+ messages in thread From: John Hubbard @ 2022-06-28 21:52 UTC (permalink / raw) To: Peter Xu Cc: kvm, linux-kernel, Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List, Sean Christopherson On 6/28/22 12:46, Peter Xu wrote: > I'd try to argu with "I prefixed it with kvm_", but oh well.. yes they're a > bit close :) > >> >> Yes, "read the code", but if you can come up with a better TLA than GTP >> here, let's consider using it. > > Could I ask what's TLA? Any suggestions on the abbrev, btw? "Three-Letter Acronym". I love "TLA" because the very fact that one has to ask what it means, shows why using them makes it harder to communicate. :) As for alternatives, here I'll demonstrate that "GTP" actually is probably better than anything else I can come up with, heh. Brainstorming: * GPN ("Guest pfn to pfn") * GTPN (similar, but with a "T" for "to") * GFNC ("guest frame number conversion") ugggh. :) thanks, -- John Hubbard NVIDIA ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 2/4] kvm: Merge "atomic" and "write" in __gfn_to_pfn_memslot() 2022-06-28 21:52 ` John Hubbard @ 2022-06-28 22:50 ` Peter Xu 2022-06-28 22:55 ` John Hubbard 0 siblings, 1 reply; 39+ messages in thread From: Peter Xu @ 2022-06-28 22:50 UTC (permalink / raw) To: John Hubbard Cc: kvm, linux-kernel, Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List, Sean Christopherson On Tue, Jun 28, 2022 at 02:52:04PM -0700, John Hubbard wrote: > On 6/28/22 12:46, Peter Xu wrote: > > I'd try to argue with "I prefixed it with kvm_", but oh well.. yes they're a > > bit close :) > > > > > > > > Yes, "read the code", but if you can come up with a better TLA than GTP > > > here, let's consider using it. > > > > Could I ask what's TLA? Any suggestions on the abbrev, btw? > > "Three-Letter Acronym". I love "TLA" because the very fact that one has > to ask what it means, shows why using them makes it harder to communicate. :) Ha! > > As for alternatives, here I'll demonstrate that "GTP" actually is probably > better than anything else I can come up with, heh. Brainstorming: > > * GPN ("Guest pfn to pfn") > * GTPN (similar, but with a "T" for "to") > * GFNC ("guest frame number conversion") Always a challenge on the naming kung fu. :-D One good thing about using a TLA in macros, flags, and code (rather than simply mentioning some three letters): we can easily jump to the label of any of the flags when we want to figure out what it means, and logically there'll (and should) be explanations of the abbrev in the headers if it's a good header. Example: it's not even easy to figure out what GFP is in the GFP_KERNEL flag for someone not familiar with mm (when I wrote this line, I got lost!), but when that happens we can jump to the label and at the entry of gfp.h we'll see: * ... The GFP acronym stands for get_free_pages(), And if you see the patch I actually did something similar (in kvm_host.h): /* gfn_to_pfn (gtp) flags */ ...
I'd still go for GTP, but let me know if you think any of the above is better; I can switch. Thanks, -- Peter Xu ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 2/4] kvm: Merge "atomic" and "write" in __gfn_to_pfn_memslot() 2022-06-28 22:50 ` Peter Xu @ 2022-06-28 22:55 ` John Hubbard 2022-06-28 23:02 ` Peter Xu 0 siblings, 1 reply; 39+ messages in thread From: John Hubbard @ 2022-06-28 22:55 UTC (permalink / raw) To: Peter Xu Cc: kvm, linux-kernel, Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List, Sean Christopherson On 6/28/22 15:50, Peter Xu wrote: > And if you see the patch I actually did something similar (in kvm_host.h): > > /* gfn_to_pfn (gtp) flags */ > ... > > I'd still go for GTP, but let me know if you think any of the above is > better, I can switch. > Yeah, I'll leave that call up to you. I don't have anything better. :) thanks, -- John Hubbard NVIDIA ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 2/4] kvm: Merge "atomic" and "write" in __gfn_to_pfn_memslot() 2022-06-28 22:55 ` John Hubbard @ 2022-06-28 23:02 ` Peter Xu 0 siblings, 0 replies; 39+ messages in thread From: Peter Xu @ 2022-06-28 23:02 UTC (permalink / raw) To: John Hubbard Cc: kvm, linux-kernel, Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List, Sean Christopherson On Tue, Jun 28, 2022 at 03:55:22PM -0700, John Hubbard wrote: > On 6/28/22 15:50, Peter Xu wrote: > > And if you see the patch I actually did something similar (in kvm_host.h): > > > > /* gfn_to_pfn (gtp) flags */ > > ... > > > > I'd still go for GTP, but let me know if you think any of the above is > > better, I can switch. > > > > Yeah, I'll leave that call up to you. I don't have anything better. :) Thanks. I'll also give Sean's suggestion of a struct a shot when I repost (I doubt whether I can bear with 4 bools after all..). If that goes well we won't need to worry about this either, since there'll be no flag introduced. -- Peter Xu ^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH 3/4] kvm: Add new pfn error KVM_PFN_ERR_INTR 2022-06-22 21:36 [PATCH 0/4] kvm/mm: Allow GUP to respond to non fatal signals Peter Xu 2022-06-22 21:36 ` [PATCH 1/4] mm/gup: Add FOLL_INTERRUPTIBLE Peter Xu 2022-06-22 21:36 ` [PATCH 2/4] kvm: Merge "atomic" and "write" in __gfn_to_pfn_memslot() Peter Xu @ 2022-06-22 21:36 ` Peter Xu 2022-06-23 14:31 ` Sean Christopherson 2022-06-22 21:36 ` [PATCH 4/4] kvm/x86: Allow to respond to generic signals during slow page faults Peter Xu 3 siblings, 1 reply; 39+ messages in thread From: Peter Xu @ 2022-06-22 21:36 UTC (permalink / raw) To: kvm, linux-kernel Cc: peterx, Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List, Sean Christopherson Add one new PFN error type to show when we cannot finish fetching the PFN due to interruptions. For example, by receiving a generic signal. This prepares KVM to be able to respond to SIGUSR1 (for QEMU that's the SIGIPI) even during e.g. handling an userfaultfd page fault. Signed-off-by: Peter Xu <peterx@redhat.com> --- include/linux/kvm_host.h | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index b646b6fcaec6..4f84a442f67f 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -96,6 +96,7 @@ #define KVM_PFN_ERR_FAULT (KVM_PFN_ERR_MASK) #define KVM_PFN_ERR_HWPOISON (KVM_PFN_ERR_MASK + 1) #define KVM_PFN_ERR_RO_FAULT (KVM_PFN_ERR_MASK + 2) +#define KVM_PFN_ERR_INTR (KVM_PFN_ERR_MASK + 3) /* * error pfns indicate that the gfn is in slot but faild to @@ -106,6 +107,16 @@ static inline bool is_error_pfn(kvm_pfn_t pfn) return !!(pfn & KVM_PFN_ERR_MASK); } +/* + * When KVM_PFN_ERR_INTR is returned, it means we're interrupted during + * fetching the PFN (e.g. a signal might have arrived), so we may want to + * retry at some later point and kick the userspace to handle the signal. 
+ */ +static inline bool is_intr_pfn(kvm_pfn_t pfn) +{ + return pfn == KVM_PFN_ERR_INTR; +} + /* * error_noslot pfns indicate that the gfn can not be * translated to pfn - it is not in slot or failed to -- 2.32.0 ^ permalink raw reply related [flat|nested] 39+ messages in thread
* Re: [PATCH 3/4] kvm: Add new pfn error KVM_PFN_ERR_INTR 2022-06-22 21:36 ` [PATCH 3/4] kvm: Add new pfn error KVM_PFN_ERR_INTR Peter Xu @ 2022-06-23 14:31 ` Sean Christopherson 2022-06-23 19:32 ` Peter Xu 0 siblings, 1 reply; 39+ messages in thread From: Sean Christopherson @ 2022-06-23 14:31 UTC (permalink / raw) To: Peter Xu Cc: kvm, linux-kernel, Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List On Wed, Jun 22, 2022, Peter Xu wrote: > Add one new PFN error type to show when we cannot finish fetching the PFN > due to interruptions. For example, by receiving a generic signal. > > This prepares KVM to be able to respond to SIGUSR1 (for QEMU that's the > SIGIPI) even during e.g. handling an userfaultfd page fault. > > Signed-off-by: Peter Xu <peterx@redhat.com> > --- > include/linux/kvm_host.h | 11 +++++++++++ > 1 file changed, 11 insertions(+) > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h > index b646b6fcaec6..4f84a442f67f 100644 > --- a/include/linux/kvm_host.h > +++ b/include/linux/kvm_host.h > @@ -96,6 +96,7 @@ > #define KVM_PFN_ERR_FAULT (KVM_PFN_ERR_MASK) > #define KVM_PFN_ERR_HWPOISON (KVM_PFN_ERR_MASK + 1) > #define KVM_PFN_ERR_RO_FAULT (KVM_PFN_ERR_MASK + 2) > +#define KVM_PFN_ERR_INTR (KVM_PFN_ERR_MASK + 3) > > /* > * error pfns indicate that the gfn is in slot but faild to > @@ -106,6 +107,16 @@ static inline bool is_error_pfn(kvm_pfn_t pfn) > return !!(pfn & KVM_PFN_ERR_MASK); > } > > +/* > + * When KVM_PFN_ERR_INTR is returned, it means we're interrupted during > + * fetching the PFN (e.g. a signal might have arrived), so we may want to > + * retry at some later point and kick the userspace to handle the signal. > + */ > +static inline bool is_intr_pfn(kvm_pfn_t pfn) > +{ > + return pfn == KVM_PFN_ERR_INTR; What about is_sigpending_pfn() and KVM_PFN_ERR_SIGPENDING? 
"intr" is too close to a real thing KVM will encounter, and I think knowing that KVM is effectively responding to a pending signal is the most important detail for KVM developers encountering this code for this first time. E.g. from KVM_PFN_ERR_INTR alone, one might think that any interrupt during GUP will trigger this. > +} > + > /* > * error_noslot pfns indicate that the gfn can not be > * translated to pfn - it is not in slot or failed to > -- > 2.32.0 > ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 3/4] kvm: Add new pfn error KVM_PFN_ERR_INTR 2022-06-23 14:31 ` Sean Christopherson @ 2022-06-23 19:32 ` Peter Xu 0 siblings, 0 replies; 39+ messages in thread From: Peter Xu @ 2022-06-23 19:32 UTC (permalink / raw) To: Sean Christopherson Cc: kvm, linux-kernel, Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List On Thu, Jun 23, 2022 at 02:31:47PM +0000, Sean Christopherson wrote: > On Wed, Jun 22, 2022, Peter Xu wrote: > > Add one new PFN error type to show when we cannot finish fetching the PFN > > due to interruptions. For example, by receiving a generic signal. > > > > This prepares KVM to be able to respond to SIGUSR1 (for QEMU that's the > > SIGIPI) even during e.g. handling an userfaultfd page fault. > > > > Signed-off-by: Peter Xu <peterx@redhat.com> > > --- > > include/linux/kvm_host.h | 11 +++++++++++ > > 1 file changed, 11 insertions(+) > > > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h > > index b646b6fcaec6..4f84a442f67f 100644 > > --- a/include/linux/kvm_host.h > > +++ b/include/linux/kvm_host.h > > @@ -96,6 +96,7 @@ > > #define KVM_PFN_ERR_FAULT (KVM_PFN_ERR_MASK) > > #define KVM_PFN_ERR_HWPOISON (KVM_PFN_ERR_MASK + 1) > > #define KVM_PFN_ERR_RO_FAULT (KVM_PFN_ERR_MASK + 2) > > +#define KVM_PFN_ERR_INTR (KVM_PFN_ERR_MASK + 3) > > > > /* > > * error pfns indicate that the gfn is in slot but faild to > > @@ -106,6 +107,16 @@ static inline bool is_error_pfn(kvm_pfn_t pfn) > > return !!(pfn & KVM_PFN_ERR_MASK); > > } > > > > +/* > > + * When KVM_PFN_ERR_INTR is returned, it means we're interrupted during > > + * fetching the PFN (e.g. a signal might have arrived), so we may want to > > + * retry at some later point and kick the userspace to handle the signal. > > + */ > > +static inline bool is_intr_pfn(kvm_pfn_t pfn) > > +{ > > + return pfn == KVM_PFN_ERR_INTR; > > What about is_sigpending_pfn() and KVM_PFN_ERR_SIGPENDING? 
"intr" is too close to > a real thing KVM will encounter, and I think knowing that KVM is effectively > responding to a pending signal is the most important detail for KVM developers > encountering this code for this first time. E.g. from KVM_PFN_ERR_INTR alone, one > might think that any interrupt during GUP will trigger this. Sounds good; INTR could be too general for KVM indeed. Thanks, -- Peter Xu ^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH 4/4] kvm/x86: Allow to respond to generic signals during slow page faults 2022-06-22 21:36 [PATCH 0/4] kvm/mm: Allow GUP to respond to non fatal signals Peter Xu ` (2 preceding siblings ...) 2022-06-22 21:36 ` [PATCH 3/4] kvm: Add new pfn error KVM_PFN_ERR_INTR Peter Xu @ 2022-06-22 21:36 ` Peter Xu 2022-06-23 14:46 ` Sean Christopherson 3 siblings, 1 reply; 39+ messages in thread From: Peter Xu @ 2022-06-22 21:36 UTC (permalink / raw) To: kvm, linux-kernel Cc: peterx, Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List, Sean Christopherson All the facilities should be ready for this; what we need to do is add a new KVM_GTP_INTERRUPTIBLE flag showing that we're willing to be interrupted by common signals during the __gfn_to_pfn_memslot() request, and wire it up with the FOLL_INTERRUPTIBLE flag that we've just introduced. Note that only the x86 slow page fault routine will set this new bit. The new bit is not used on non-x86 arches or on other gup paths even for x86. However, it could actually be used elsewhere too; that's just not yet covered. When we see that the PFN fetching was interrupted, do an early exit to userspace with a KVM_EXIT_INTR exit reason. 
Signed-off-by: Peter Xu <peterx@redhat.com> --- arch/x86/kvm/mmu/mmu.c | 9 +++++++++ include/linux/kvm_host.h | 1 + virt/kvm/kvm_main.c | 4 ++++ 3 files changed, 14 insertions(+) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index e92f1ab63d6a..b39acb7cb16d 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -3012,6 +3012,13 @@ static int kvm_handle_bad_page(struct kvm_vcpu *vcpu, gfn_t gfn, kvm_pfn_t pfn) static int handle_abnormal_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault, unsigned int access) { + /* NOTE: not all error pfn is fatal; handle intr before the other ones */ + if (unlikely(is_intr_pfn(fault->pfn))) { + vcpu->run->exit_reason = KVM_EXIT_INTR; + ++vcpu->stat.signal_exits; + return -EINTR; + } + /* The pfn is invalid, report the error! */ if (unlikely(is_error_pfn(fault->pfn))) return kvm_handle_bad_page(vcpu, fault->gfn, fault->pfn); @@ -4017,6 +4024,8 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) } } + /* Allow to respond to generic signals in slow page faults */ + flags |= KVM_GTP_INTERRUPTIBLE; fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, flags, NULL, &fault->map_writable, &fault->hva); return RET_PF_CONTINUE; diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 4f84a442f67f..c8d98e435537 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -1163,6 +1163,7 @@ typedef unsigned int __bitwise kvm_gtp_flag_t; #define KVM_GTP_WRITE ((__force kvm_gtp_flag_t) BIT(0)) #define KVM_GTP_ATOMIC ((__force kvm_gtp_flag_t) BIT(1)) +#define KVM_GTP_INTERRUPTIBLE ((__force kvm_gtp_flag_t) BIT(2)) kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn, kvm_gtp_flag_t gtp_flags, bool *async, diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 952400b42ee9..b3873cac5672 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -2462,6 +2462,8 @@ static int hva_to_pfn_slow(unsigned long addr, bool *async, flags 
|= FOLL_WRITE; if (async) flags |= FOLL_NOWAIT; + if (gtp_flags & KVM_GTP_INTERRUPTIBLE) + flags |= FOLL_INTERRUPTIBLE; npages = get_user_pages_unlocked(addr, 1, &page, flags); if (npages != 1) @@ -2599,6 +2601,8 @@ kvm_pfn_t hva_to_pfn(unsigned long addr, kvm_gtp_flag_t gtp_flags, bool *async, npages = hva_to_pfn_slow(addr, async, gtp_flags, writable, &pfn); if (npages == 1) return pfn; + if (npages == -EINTR) + return KVM_PFN_ERR_INTR; mmap_read_lock(current->mm); if (npages == -EHWPOISON || -- 2.32.0 ^ permalink raw reply related [flat|nested] 39+ messages in thread
* Re: [PATCH 4/4] kvm/x86: Allow to respond to generic signals during slow page faults 2022-06-22 21:36 ` [PATCH 4/4] kvm/x86: Allow to respond to generic signals during slow page faults Peter Xu @ 2022-06-23 14:46 ` Sean Christopherson 2022-06-23 19:31 ` Peter Xu 0 siblings, 1 reply; 39+ messages in thread From: Sean Christopherson @ 2022-06-23 14:46 UTC (permalink / raw) To: Peter Xu Cc: kvm, linux-kernel, Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List On Wed, Jun 22, 2022, Peter Xu wrote: > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c > index e92f1ab63d6a..b39acb7cb16d 100644 > --- a/arch/x86/kvm/mmu/mmu.c > +++ b/arch/x86/kvm/mmu/mmu.c > @@ -3012,6 +3012,13 @@ static int kvm_handle_bad_page(struct kvm_vcpu *vcpu, gfn_t gfn, kvm_pfn_t pfn) > static int handle_abnormal_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault, > unsigned int access) > { > + /* NOTE: not all error pfn is fatal; handle intr before the other ones */ > + if (unlikely(is_intr_pfn(fault->pfn))) { > + vcpu->run->exit_reason = KVM_EXIT_INTR; > + ++vcpu->stat.signal_exits; > + return -EINTR; > + } > + > /* The pfn is invalid, report the error! */ > if (unlikely(is_error_pfn(fault->pfn))) > return kvm_handle_bad_page(vcpu, fault->gfn, fault->pfn); > @@ -4017,6 +4024,8 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) > } > } > > + /* Allow to respond to generic signals in slow page faults */ "slow" is being overloaded here. The previous call __gfn_to_pfn_memslot() will end up in hva_to_pfn_slow(), but because of passing a non-null async it won't wait. This code really should have a more extensive comment irrespective of the interruptible stuff, now would be a good time to add that. Comments aside, isn't this series incomplete from the perspective that there are still many flows where KVM will hang if gfn_to_pfn() gets stuck in gup? E.g. 
if KVM is retrieving a page pointed at by vmcs12. > + flags |= KVM_GTP_INTERRUPTIBLE; > fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, flags, NULL, > &fault->map_writable, &fault->hva); > return RET_PF_CONTINUE; > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h > index 4f84a442f67f..c8d98e435537 100644 > --- a/include/linux/kvm_host.h > +++ b/include/linux/kvm_host.h > @@ -1163,6 +1163,7 @@ typedef unsigned int __bitwise kvm_gtp_flag_t; > > #define KVM_GTP_WRITE ((__force kvm_gtp_flag_t) BIT(0)) > #define KVM_GTP_ATOMIC ((__force kvm_gtp_flag_t) BIT(1)) > +#define KVM_GTP_INTERRUPTIBLE ((__force kvm_gtp_flag_t) BIT(2)) > > kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn, > kvm_gtp_flag_t gtp_flags, bool *async, > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c > index 952400b42ee9..b3873cac5672 100644 > --- a/virt/kvm/kvm_main.c > +++ b/virt/kvm/kvm_main.c > @@ -2462,6 +2462,8 @@ static int hva_to_pfn_slow(unsigned long addr, bool *async, > flags |= FOLL_WRITE; > if (async) > flags |= FOLL_NOWAIT; > + if (gtp_flags & KVM_GTP_INTERRUPTIBLE) > + flags |= FOLL_INTERRUPTIBLE; > > npages = get_user_pages_unlocked(addr, 1, &page, flags); > if (npages != 1) > @@ -2599,6 +2601,8 @@ kvm_pfn_t hva_to_pfn(unsigned long addr, kvm_gtp_flag_t gtp_flags, bool *async, > npages = hva_to_pfn_slow(addr, async, gtp_flags, writable, &pfn); > if (npages == 1) > return pfn; > + if (npages == -EINTR) > + return KVM_PFN_ERR_INTR; > > mmap_read_lock(current->mm); > if (npages == -EHWPOISON || > -- > 2.32.0 > ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 4/4] kvm/x86: Allow to respond to generic signals during slow page faults 2022-06-23 14:46 ` Sean Christopherson @ 2022-06-23 19:31 ` Peter Xu 2022-06-23 20:07 ` Sean Christopherson 0 siblings, 1 reply; 39+ messages in thread From: Peter Xu @ 2022-06-23 19:31 UTC (permalink / raw) To: Sean Christopherson Cc: kvm, linux-kernel, Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List Hi, Sean, On Thu, Jun 23, 2022 at 02:46:08PM +0000, Sean Christopherson wrote: > On Wed, Jun 22, 2022, Peter Xu wrote: > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c > > index e92f1ab63d6a..b39acb7cb16d 100644 > > --- a/arch/x86/kvm/mmu/mmu.c > > +++ b/arch/x86/kvm/mmu/mmu.c > > @@ -3012,6 +3012,13 @@ static int kvm_handle_bad_page(struct kvm_vcpu *vcpu, gfn_t gfn, kvm_pfn_t pfn) > > static int handle_abnormal_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault, > > unsigned int access) > > { > > + /* NOTE: not all error pfn is fatal; handle intr before the other ones */ > > + if (unlikely(is_intr_pfn(fault->pfn))) { > > + vcpu->run->exit_reason = KVM_EXIT_INTR; > > + ++vcpu->stat.signal_exits; > > + return -EINTR; > > + } > > + > > /* The pfn is invalid, report the error! */ > > if (unlikely(is_error_pfn(fault->pfn))) > > return kvm_handle_bad_page(vcpu, fault->gfn, fault->pfn); > > @@ -4017,6 +4024,8 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) > > } > > } > > > > + /* Allow to respond to generic signals in slow page faults */ > > "slow" is being overloaded here. The previous call __gfn_to_pfn_memslot() will > end up in hva_to_pfn_slow(), but because of passing a non-null async it won't wait. > This code really should have a more extensive comment irrespective of the interruptible > stuff, now would be a good time to add that. Yes I agree, especially the "async" parameter along with "atomic" makes it even more confusing as you said. 
But doesn't that also mean the "slow" here is spot-on? I mean, imho it's the "elsewhere" that needs cleanup, not this comment itself, since it's really stating the fact that this is the real slow path? Any other suggestions on how I should improve this comment are greatly welcomed. > > Comments aside, isn't this series incomplete from the perspective that there are > still many flows where KVM will hang if gfn_to_pfn() gets stuck in gup? E.g. if > KVM is retrieving a page pointed at by vmcs12. Right. The thing is I'm not confident I can make it complete easily in one shot.. I mentioned some of that in the cover letter and the commit message of patch 1: I don't think all the gup call sites are okay with being interrupted by a non-fatal signal. So what I want to do is do it step by step, at least by introducing FOLL_INTERRUPTIBLE and having one valid user of it that covers a very valid use case. I'm also pretty sure the page fault path really covers most of the cases that will happen with GUP, so it already helps in many ways for me when running with a patched kernel. So when the complete picture is non-trivial to achieve in one shot, I think this could be one option to go for. 
* Re: [PATCH 4/4] kvm/x86: Allow to respond to generic signals during slow page faults 2022-06-23 19:31 ` Peter Xu @ 2022-06-23 20:07 ` Sean Christopherson 2022-06-23 20:18 ` Peter Xu 0 siblings, 1 reply; 39+ messages in thread From: Sean Christopherson @ 2022-06-23 20:07 UTC (permalink / raw) To: Peter Xu Cc: kvm, linux-kernel, Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List On Thu, Jun 23, 2022, Peter Xu wrote: > Hi, Sean, > > On Thu, Jun 23, 2022 at 02:46:08PM +0000, Sean Christopherson wrote: > > On Wed, Jun 22, 2022, Peter Xu wrote: > > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c > > > index e92f1ab63d6a..b39acb7cb16d 100644 > > > --- a/arch/x86/kvm/mmu/mmu.c > > > +++ b/arch/x86/kvm/mmu/mmu.c > > > @@ -3012,6 +3012,13 @@ static int kvm_handle_bad_page(struct kvm_vcpu *vcpu, gfn_t gfn, kvm_pfn_t pfn) > > > static int handle_abnormal_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault, > > > unsigned int access) > > > { > > > + /* NOTE: not all error pfn is fatal; handle intr before the other ones */ > > > + if (unlikely(is_intr_pfn(fault->pfn))) { > > > + vcpu->run->exit_reason = KVM_EXIT_INTR; > > > + ++vcpu->stat.signal_exits; > > > + return -EINTR; > > > + } > > > + > > > /* The pfn is invalid, report the error! */ > > > if (unlikely(is_error_pfn(fault->pfn))) > > > return kvm_handle_bad_page(vcpu, fault->gfn, fault->pfn); > > > @@ -4017,6 +4024,8 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) > > > } > > > } > > > > > > + /* Allow to respond to generic signals in slow page faults */ > > > > "slow" is being overloaded here. The previous call __gfn_to_pfn_memslot() will > > end up in hva_to_pfn_slow(), but because of passing a non-null async it won't wait. > > This code really should have a more extensive comment irrespective of the interruptible > > stuff, now would be a good time to add that. 
>
> Yes I agree, especially the "async" parameter along with "atomic" makes it
> even more confusing, as you said.  But doesn't that also mean the "slow"
> here is spot-on?  I mean, imho it's the "elsewhere" that needs cleanup,
> not this comment itself, since it's really stating the fact that this is
> the real slow path?

No, because atomic=false here, i.e. KVM will try hva_to_pfn_slow() if
hva_to_pfn_fast() fails.  So even if we agree that the "wait for IO" path
is the true slow path, when reading KVM code the vast, vast majority of
developers will associate "slow" with hva_to_pfn_slow().

> Any other suggestions on how I should improve this comment are greatly
> welcomed.

Something along these lines?

	/*
	 * Allow gup to bail on pending non-fatal signals when it's also allowed
	 * to wait for IO.  Note, gup always bails if it is unable to quickly
	 * get a page and a fatal signal, i.e. SIGKILL, is pending.
	 */

>
> > Comments aside, isn't this series incomplete from the perspective that there are
> > still many flows where KVM will hang if gfn_to_pfn() gets stuck in gup?  E.g. if
> > KVM is retrieving a page pointed at by vmcs12.
>
> Right.  The thing is I'm not confident I can make it complete easily in
> one shot.
>
> I mentioned some of that in the cover letter and the commit message of
> patch 1, in that I don't think all the gup call sites are okay with being
> interrupted by a non-fatal signal.
>
> So what I want to do is do it step by step: at least start by introducing
> FOLL_INTERRUPTIBLE and having one user of it that covers a very real use
> case.  I'm also pretty sure the page fault path covers the vast majority
> of what happens with GUP, so it already helps in many ways for me when
> running with a patched kernel.
>
> So when the complete picture is non-trivial to achieve in one shot, I
> think this could be one option to go for.
> With the facility (and example code for x86 slow page faults) ready,
> hopefully we can start to convert many other call sites to be
> signal-aware, outside page faults, or even outside x86, because it's
> really a more generic problem, as I fully agree.
>
> Does it sound reasonable to you?

Yep.  In fact, I'd be totally ok keeping this to just the page fault path.
I missed one crucial detail on my first read through: gup already bails on
SIGKILL; it's only these technically-not-fatal signals that gup ignores by
default.  In other words, using FOLL_INTERRUPTIBLE is purely opportunistic,
as userspace can always resort to SIGKILL if the VM really needs to die.

It would be very helpful to explicitly call that out in the changelog, so
that it's (hopefully) clear that KVM uses FOLL_INTERRUPTIBLE to be user
friendly when it's easy to do so, and that it's not required for
correctness/robustness.
* Re: [PATCH 4/4] kvm/x86: Allow to respond to generic signals during slow page faults
  2022-06-23 20:07 ` Sean Christopherson
@ 2022-06-23 20:18 ` Peter Xu
  0 siblings, 0 replies; 39+ messages in thread
From: Peter Xu @ 2022-06-23 20:18 UTC (permalink / raw)
To: Sean Christopherson
Cc: kvm, linux-kernel, Paolo Bonzini, Andrew Morton, David Hildenbrand,
	Dr. David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List

On Thu, Jun 23, 2022 at 08:07:00PM +0000, Sean Christopherson wrote:
> On Thu, Jun 23, 2022, Peter Xu wrote:
> > Hi, Sean,
> >
> > On Thu, Jun 23, 2022 at 02:46:08PM +0000, Sean Christopherson wrote:
> > > On Wed, Jun 22, 2022, Peter Xu wrote:
> > > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > > > index e92f1ab63d6a..b39acb7cb16d 100644
> > > > --- a/arch/x86/kvm/mmu/mmu.c
> > > > +++ b/arch/x86/kvm/mmu/mmu.c
> > > > @@ -3012,6 +3012,13 @@ static int kvm_handle_bad_page(struct kvm_vcpu *vcpu, gfn_t gfn, kvm_pfn_t pfn)
> > > >  static int handle_abnormal_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
> > > > 			       unsigned int access)
> > > >  {
> > > > +	/* NOTE: not all error pfn is fatal; handle intr before the other ones */
> > > > +	if (unlikely(is_intr_pfn(fault->pfn))) {
> > > > +		vcpu->run->exit_reason = KVM_EXIT_INTR;
> > > > +		++vcpu->stat.signal_exits;
> > > > +		return -EINTR;
> > > > +	}
> > > > +
> > > >  	/* The pfn is invalid, report the error! */
> > > >  	if (unlikely(is_error_pfn(fault->pfn)))
> > > >  		return kvm_handle_bad_page(vcpu, fault->gfn, fault->pfn);
> > > > @@ -4017,6 +4024,8 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
> > > >  		}
> > > >  	}
> > > >
> > > > +	/* Allow to respond to generic signals in slow page faults */
> > >
> > > "slow" is being overloaded here.  The previous call __gfn_to_pfn_memslot() will
> > > end up in hva_to_pfn_slow(), but because of passing a non-null async it won't wait.
> > > This code really should have a more extensive comment irrespective of the interruptible
> > > stuff, now would be a good time to add that.
> >
> > Yes I agree, especially the "async" parameter along with "atomic" makes it
> > even more confusing, as you said.  But doesn't that also mean the "slow"
> > here is spot-on?  I mean, imho it's the "elsewhere" that needs cleanup,
> > not this comment itself, since it's really stating the fact that this is
> > the real slow path?
>
> No, because atomic=false here, i.e. KVM will try hva_to_pfn_slow() if
> hva_to_pfn_fast() fails.  So even if we agree that the "wait for IO" path
> is the true slow path, when reading KVM code the vast, vast majority of
> developers will associate "slow" with hva_to_pfn_slow().

Okay.  I think how we define "slow" matters; here my take is "when a major
fault happens" (as defined in mm terms), but probably that definition is a
bit far away from KVM at the hypervisor level indeed.

> > Any other suggestions on how I should improve this comment are greatly
> > welcomed.
>
> Something along these lines?
>
> 	/*
> 	 * Allow gup to bail on pending non-fatal signals when it's also allowed
> 	 * to wait for IO.  Note, gup always bails if it is unable to quickly
> 	 * get a page and a fatal signal, i.e. SIGKILL, is pending.
> 	 */

Taken.

> >
> > > Comments aside, isn't this series incomplete from the perspective that there are
> > > still many flows where KVM will hang if gfn_to_pfn() gets stuck in gup?  E.g. if
> > > KVM is retrieving a page pointed at by vmcs12.
> >
> > Right.  The thing is I'm not confident I can make it complete easily in
> > one shot.
> >
> > I mentioned some of that in the cover letter and the commit message of
> > patch 1, in that I don't think all the gup call sites are okay with being
> > interrupted by a non-fatal signal.
> >
> > So what I want to do is do it step by step: at least start by introducing
> > FOLL_INTERRUPTIBLE and having one user of it that covers a very real use
> > case.
> > I'm also pretty sure the page fault path covers the vast majority
> > of what happens with GUP, so it already helps in many ways for me when
> > running with a patched kernel.
> >
> > So when the complete picture is non-trivial to achieve in one shot, I
> > think this could be one option to go for.  With the facility (and example
> > code for x86 slow page faults) ready, hopefully we can start to convert
> > many other call sites to be signal-aware, outside page faults, or even
> > outside x86, because it's really a more generic problem, as I fully agree.
> >
> > Does it sound reasonable to you?
>
> Yep.  In fact, I'd be totally ok keeping this to just the page fault path.
> I missed one crucial detail on my first read through: gup already bails on
> SIGKILL; it's only these technically-not-fatal signals that gup ignores by
> default.  In other words, using FOLL_INTERRUPTIBLE is purely opportunistic,
> as userspace can always resort to SIGKILL if the VM really needs to die.
>
> It would be very helpful to explicitly call that out in the changelog, so
> that it's (hopefully) clear that KVM uses FOLL_INTERRUPTIBLE to be user
> friendly when it's easy to do so, and that it's not required for
> correctness/robustness.

Yes, that's the case; SIGKILL is special.  I can mention that somewhere in
the cover letter too, besides the comment you suggested above.

Thanks,

-- 
Peter Xu
end of thread, other threads: [~2022-07-07 15:07 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-22 21:36 [PATCH 0/4] kvm/mm: Allow GUP to respond to non fatal signals Peter Xu
2022-06-22 21:36 ` [PATCH 1/4] mm/gup: Add FOLL_INTERRUPTIBLE Peter Xu
2022-06-25  0:35   ` Jason Gunthorpe
2022-06-25  1:23     ` Peter Xu
2022-06-25 23:59       ` Jason Gunthorpe
2022-06-27 15:29         ` Peter Xu
2022-06-28  2:07           ` John Hubbard
2022-06-28 19:31             ` Peter Xu
2022-06-28 21:40               ` John Hubbard
2022-06-28 22:33                 ` Peter Xu
2022-06-29  0:31                   ` John Hubbard
2022-06-29 15:47                     ` Peter Xu
2022-06-30  1:53                       ` John Hubbard
2022-06-30 13:49                         ` Peter Xu
2022-06-30 19:01                           ` John Hubbard
2022-06-30 21:27                             ` Peter Xu
2022-07-04 22:48   ` Matthew Wilcox
2022-07-07 15:06     ` Peter Xu
2022-06-22 21:36 ` [PATCH 2/4] kvm: Merge "atomic" and "write" in __gfn_to_pfn_memslot() Peter Xu
2022-06-23 14:49   ` Sean Christopherson
2022-06-23 19:46     ` Peter Xu
2022-06-23 20:29       ` Sean Christopherson
2022-06-23 21:29         ` Peter Xu
2022-06-23 21:52           ` Sean Christopherson
2022-06-27 19:12             ` John Hubbard
2022-06-28  2:17   ` John Hubbard
2022-06-28 19:46     ` Peter Xu
2022-06-28 21:52       ` John Hubbard
2022-06-28 22:50         ` Peter Xu
2022-06-28 22:55           ` John Hubbard
2022-06-28 23:02             ` Peter Xu
2022-06-22 21:36 ` [PATCH 3/4] kvm: Add new pfn error KVM_PFN_ERR_INTR Peter Xu
2022-06-23 14:31   ` Sean Christopherson
2022-06-23 19:32     ` Peter Xu
2022-06-22 21:36 ` [PATCH 4/4] kvm/x86: Allow to respond to generic signals during slow page faults Peter Xu
2022-06-23 14:46   ` Sean Christopherson
2022-06-23 19:31     ` Peter Xu
2022-06-23 20:07       ` Sean Christopherson
2022-06-23 20:18         ` Peter Xu