* [PATCH 0/4] kvm/mm: Allow GUP to respond to non fatal signals @ 2022-06-22 21:36 Peter Xu 2022-06-22 21:36 ` [PATCH 1/4] mm/gup: Add FOLL_INTERRUPTIBLE Peter Xu ` (3 more replies) 0 siblings, 4 replies; 39+ messages in thread From: Peter Xu @ 2022-06-22 21:36 UTC (permalink / raw) To: kvm, linux-kernel Cc: peterx, Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List, Sean Christopherson rfc->v1: - Fix non-x86 build reported by syzbot - Remove RFC tag One issue was reported that libvirt cannot stop the virtual machine using the QMP command "stop" during a paused postcopy migration [1]. It doesn't work because the "stop the VM" operation requires the hypervisor to kick all the vcpu threads out using SIG_IPI in QEMU (which translates to a SIGUSR1). However, during a paused postcopy the vcpu threads are hung dead in handle_userfault(), so they simply don't respond to the kicks. Worse, the "stop" command then hangs the QMP channel too. The mm already has a facility to respond to generic signals (FAULT_FLAG_INTERRUPTIBLE), however it's only used in the page fault handlers, not in GUP. Unluckily, KVM is a heavy GUP user on guest page faults, which means that with what we have right now we cannot interrupt a long page fault while KVM is fetching guest pages. I think it's reasonable for GUP to only listen to fatal signals, as most of the GUP users are not really ready to handle such a case. But KVM is not such a user: it has rich infrastructure to handle even generic signals and to properly deliver them to userspace, after which the page fault can be retried in the next KVM_RUN. This patchset adds FOLL_INTERRUPTIBLE to enable FAULT_FLAG_INTERRUPTIBLE, and lets KVM be the first one to use it. One thing to mention is that this does not allow all KVM paths to respond to non-fatal signals, only the x86 slow page fault path.
In the future, when more code is ready to handle signal interruptions, we can explore the possibility of having more GUP callers use FOLL_INTERRUPTIBLE. Tests ===== I created a postcopy environment, paused the migration by shutting down the network to emulate a network failure (so handle_userfault() would be stuck for a long time), then tried three things: (1) Sending QMP command "stop" to QEMU monitor, (2) Hitting Ctrl-C from QEMU cmdline, (3) GDB attach to the dest QEMU process. Before this patchset, all three use cases hang. After the patchset, all of them work just as if there were no network failure at all. Please have a look, thanks. [1] https://gitlab.com/qemu-project/qemu/-/issues/1052 Peter Xu (4): mm/gup: Add FOLL_INTERRUPTIBLE kvm: Merge "atomic" and "write" in __gfn_to_pfn_memslot() kvm: Add new pfn error KVM_PFN_ERR_INTR kvm/x86: Allow to respond to generic signals during slow page faults arch/arm64/kvm/mmu.c | 5 ++-- arch/powerpc/kvm/book3s_64_mmu_hv.c | 5 ++-- arch/powerpc/kvm/book3s_64_mmu_radix.c | 5 ++-- arch/x86/kvm/mmu/mmu.c | 19 ++++++++---- include/linux/kvm_host.h | 21 ++++++++++++- include/linux/mm.h | 1 + mm/gup.c | 33 ++++++++++++++++++--- virt/kvm/kvm_main.c | 41 ++++++++++++++++---------- virt/kvm/kvm_mm.h | 6 ++-- virt/kvm/pfncache.c | 2 +- 10 files changed, 104 insertions(+), 34 deletions(-) -- 2.32.0 ^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH 1/4] mm/gup: Add FOLL_INTERRUPTIBLE 2022-06-22 21:36 [PATCH 0/4] kvm/mm: Allow GUP to respond to non fatal signals Peter Xu @ 2022-06-22 21:36 ` Peter Xu 2022-06-25 0:35 ` Jason Gunthorpe ` (2 more replies) 2022-06-22 21:36 ` [PATCH 2/4] kvm: Merge "atomic" and "write" in __gfn_to_pfn_memslot() Peter Xu ` (2 subsequent siblings) 3 siblings, 3 replies; 39+ messages in thread From: Peter Xu @ 2022-06-22 21:36 UTC (permalink / raw) To: kvm, linux-kernel Cc: peterx, Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List, Sean Christopherson We have had FAULT_FLAG_INTERRUPTIBLE but it was never applied to GUPs. One issue with it is that not all GUP paths are able to handle signal delivery besides SIGKILL. That's not ideal for the GUP users who are actually able to handle these cases, like KVM. KVM uses GUP extensively on faulting guest pages, during which we've got existing infrastructure to retry a page fault at a later time. Allowing the GUP to be interrupted by generic signals can make KVM-related threads more responsive. For example: (1) SIGUSR1: which QEMU/KVM uses to deliver an inter-process IPI, e.g. when the admin issues a vm_stop QMP command, SIGUSR1 can be generated to kick the vcpus out of kernel context immediately, (2) SIGINT: which allows interactive hypervisor users to stop a virtual machine with Ctrl-C without any delays/hangs, (3) SIGTRAP: which grants GDB capability even during page faults that are stuck for a long time. Normally the hypervisor will be able to receive these signals properly, but not if we're stuck in a GUP for a long time for whatever reason. It happens easily with a stuck postcopy migration when e.g. a temporary network failure hits, and then some vcpu threads can hang dead waiting for the pages. With the new FOLL_INTERRUPTIBLE, we can allow GUP users like KVM to selectively enable the ability to trap these signals. 
Signed-off-by: Peter Xu <peterx@redhat.com> --- include/linux/mm.h | 1 + mm/gup.c | 33 +++++++++++++++++++++++++++++---- 2 files changed, 30 insertions(+), 4 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index bc8f326be0ce..ebdf8a6b86c1 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2941,6 +2941,7 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address, #define FOLL_SPLIT_PMD 0x20000 /* split huge pmd before returning */ #define FOLL_PIN 0x40000 /* pages must be released via unpin_user_page */ #define FOLL_FAST_ONLY 0x80000 /* gup_fast: prevent fall-back to slow gup */ +#define FOLL_INTERRUPTIBLE 0x100000 /* allow interrupts from generic signals */ /* * FOLL_PIN and FOLL_LONGTERM may be used in various combinations with each diff --git a/mm/gup.c b/mm/gup.c index 551264407624..ad74b137d363 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -933,8 +933,17 @@ static int faultin_page(struct vm_area_struct *vma, fault_flags |= FAULT_FLAG_WRITE; if (*flags & FOLL_REMOTE) fault_flags |= FAULT_FLAG_REMOTE; - if (locked) + if (locked) { fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE; + /* + * We should only grant FAULT_FLAG_INTERRUPTIBLE when we're + * (at least) killable. It also mostly means we're not + * with NOWAIT. Otherwise ignore FOLL_INTERRUPTIBLE since + * it won't make a lot of sense to be used alone. + */ + if (*flags & FOLL_INTERRUPTIBLE) + fault_flags |= FAULT_FLAG_INTERRUPTIBLE; + } if (*flags & FOLL_NOWAIT) fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_RETRY_NOWAIT; if (*flags & FOLL_TRIED) { @@ -1322,6 +1331,22 @@ int fixup_user_fault(struct mm_struct *mm, } EXPORT_SYMBOL_GPL(fixup_user_fault); +/* + * GUP always responds to fatal signals. When FOLL_INTERRUPTIBLE is + * specified, it'll also respond to generic signals. The caller of GUP + * that has FOLL_INTERRUPTIBLE should take care of the GUP interruption. 
+ */ +static bool gup_signal_pending(unsigned int flags) +{ + if (fatal_signal_pending(current)) + return true; + + if (!(flags & FOLL_INTERRUPTIBLE)) + return false; + + return signal_pending(current); +} + /* * Please note that this function, unlike __get_user_pages will not * return 0 for nr_pages > 0 without FOLL_NOWAIT @@ -1403,11 +1428,11 @@ static __always_inline long __get_user_pages_locked(struct mm_struct *mm, * Repeat on the address that fired VM_FAULT_RETRY * with both FAULT_FLAG_ALLOW_RETRY and * FAULT_FLAG_TRIED. Note that GUP can be interrupted - * by fatal signals, so we need to check it before we + * by fatal signals or even common signals, depending on + * the caller's request. So we need to check it before we * start trying again otherwise it can loop forever. */ - - if (fatal_signal_pending(current)) { + if (gup_signal_pending(flags)) { if (!pages_done) pages_done = -EINTR; break; -- 2.32.0 ^ permalink raw reply related [flat|nested] 39+ messages in thread
* Re: [PATCH 1/4] mm/gup: Add FOLL_INTERRUPTIBLE 2022-06-22 21:36 ` [PATCH 1/4] mm/gup: Add FOLL_INTERRUPTIBLE Peter Xu @ 2022-06-25 0:35 ` Jason Gunthorpe 2022-06-25 1:23 ` Peter Xu 2022-06-28 2:07 ` John Hubbard 2022-07-04 22:48 ` Matthew Wilcox 2 siblings, 1 reply; 39+ messages in thread From: Jason Gunthorpe @ 2022-06-25 0:35 UTC (permalink / raw) To: Peter Xu Cc: kvm, linux-kernel, Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List, Sean Christopherson On Wed, Jun 22, 2022 at 05:36:53PM -0400, Peter Xu wrote: > We have had FAULT_FLAG_INTERRUPTIBLE but it was never applied to GUPs. One > issue with it is that not all GUP paths are able to handle signal delivers > besides SIGKILL. > > That's not ideal for the GUP users who are actually able to handle these > cases, like KVM. > > KVM uses GUP extensively on faulting guest pages, during which we've got > existing infrastructures to retry a page fault at a later time. Allowing > the GUP to be interrupted by generic signals can make KVM related threads > to be more responsive. For examples: > > (1) SIGUSR1: which QEMU/KVM uses to deliver an inter-process IPI, > e.g. when the admin issues a vm_stop QMP command, SIGUSR1 can be > generated to kick the vcpus out of kernel context immediately, > > (2) SIGINT: which can be used with interactive hypervisor users to stop a > virtual machine with Ctrl-C without any delays/hangs, > > (3) SIGTRAP: which grants GDB capability even during page faults that are > stuck for a long time. > > Normally hypervisor will be able to receive these signals properly, but not > if we're stuck in a GUP for a long time for whatever reason. It happens > easily with a stucked postcopy migration when e.g. a network temp failure > happens, then some vcpu threads can hang death waiting for the pages. With > the new FOLL_INTERRUPTIBLE, we can allow GUP users like KVM to selectively > enable the ability to trap these signals. 
Can you talk a bit about what is required to use this new interface correctly? Lots of GUP callers are in simple system call contexts (like ioctl), can/should they set this flag and if so what else do they need to do? Thanks, Jason ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 1/4] mm/gup: Add FOLL_INTERRUPTIBLE 2022-06-25 0:35 ` Jason Gunthorpe @ 2022-06-25 1:23 ` Peter Xu 2022-06-25 23:59 ` Jason Gunthorpe 0 siblings, 1 reply; 39+ messages in thread From: Peter Xu @ 2022-06-25 1:23 UTC (permalink / raw) To: Jason Gunthorpe Cc: kvm, linux-kernel, Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List, Sean Christopherson Hi, Jason, On Fri, Jun 24, 2022 at 09:35:54PM -0300, Jason Gunthorpe wrote: > Can you talk a bit about what is required to use this new interface > correctly? > > Lots of GUP callers are in simple system call contexts (like ioctl), > can/should they set this flag and if so what else do they need to do? Thanks for taking a look. IMHO the major thing required is that the caller can handle the cases where GUP (1) returns fewer pages than expected, and (2) returns -EINTR. For the -EINTR case, IIUC ideally in an ioctl context we should deliver this -EINTR back to the user app (while doing cleanups gracefully), rather than returning anything else (e.g. converting it to -EFAULT or something else). But note that FAULT_FLAG_INTERRUPTIBLE is only used in a userfaultfd context (aka, userfaultfd_get_blocking_state()). For example, if we hang at lock_page() (setting aside whether hanging at lock_page() makes sense at all.. it really sounds like a bug) and we receive a non-fatal signal, we won't be woken up for it, since lock_page() always uses TASK_UNINTERRUPTIBLE. I think it's a separate problem whether we should extend the usage of FAULT_FLAG_INTERRUPTIBLE to things like lock_page() (and probably not..), and currently this does solve a major issue regarding postcopy hanging on pages in the hypervisor use case. Hopefully that still justifies this plumbing work to bring the interruptible capability to the GUP layer. 
To go back to the original question with a shorter answer: if the ioctl context GUPs a page that will never be under a uffd context, then it's probably not gonna help at all.. at least not before we use FAULT_FLAG_INTERRUPTIBLE outside uffd page fault handling. Thanks, -- Peter Xu ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 1/4] mm/gup: Add FOLL_INTERRUPTIBLE 2022-06-25 1:23 ` Peter Xu @ 2022-06-25 23:59 ` Jason Gunthorpe 2022-06-27 15:29 ` Peter Xu 0 siblings, 1 reply; 39+ messages in thread From: Jason Gunthorpe @ 2022-06-25 23:59 UTC (permalink / raw) To: Peter Xu Cc: kvm, linux-kernel, Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List, Sean Christopherson On Fri, Jun 24, 2022 at 09:23:04PM -0400, Peter Xu wrote: > If to go back to the original question with a shorter answer: if the ioctl > context that GUP upon a page that will never be with a uffd context, then > it's probably not gonna help at all.. at least not before we use > FAULT_FLAG_INTERRUPTIBLE outside uffd page fault handling. I think I would be more interested in this if it could abort a swap in, for instance. Doesn't this happen if it flows the interruptible flag into the VMA's fault handler? Jason ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 1/4] mm/gup: Add FOLL_INTERRUPTIBLE 2022-06-25 23:59 ` Jason Gunthorpe @ 2022-06-27 15:29 ` Peter Xu 0 siblings, 0 replies; 39+ messages in thread From: Peter Xu @ 2022-06-27 15:29 UTC (permalink / raw) To: Jason Gunthorpe Cc: kvm, linux-kernel, Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List, Sean Christopherson On Sat, Jun 25, 2022 at 08:59:04PM -0300, Jason Gunthorpe wrote: > On Fri, Jun 24, 2022 at 09:23:04PM -0400, Peter Xu wrote: > > If to go back to the original question with a shorter answer: if the ioctl > > context that GUP upon a page that will never be with a uffd context, then > > it's probably not gonna help at all.. at least not before we use > > FAULT_FLAG_INTERRUPTIBLE outside uffd page fault handling. > > I think I would be more interested in this if it could abort a swap > in, for instance. Doesn't this happen if it flows the interruptible > flag into the VMA's fault handler? The idea makes sense, but it doesn't work like that right now, afaict. We need to teach lock_page_or_retry() to be able to consume the flag as discussed. I don't see a major blocker for it if lock_page_or_retry() is the only one we'd like to touch. Say, the only caller currently is do_swap_page() (there's also remove_device_exclusive_entry() but just deeper in the stack). Looks doable so far but I'll need to think about it.. I can keep you updated if I get something, but it'll be separate from this patchset. Thanks, -- Peter Xu ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 1/4] mm/gup: Add FOLL_INTERRUPTIBLE 2022-06-22 21:36 ` [PATCH 1/4] mm/gup: Add FOLL_INTERRUPTIBLE Peter Xu 2022-06-25 0:35 ` Jason Gunthorpe @ 2022-06-28 2:07 ` John Hubbard 2022-06-28 19:31 ` Peter Xu 2022-07-04 22:48 ` Matthew Wilcox 2 siblings, 1 reply; 39+ messages in thread From: John Hubbard @ 2022-06-28 2:07 UTC (permalink / raw) To: Peter Xu, kvm, linux-kernel Cc: Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List, Sean Christopherson On 6/22/22 14:36, Peter Xu wrote: > We have had FAULT_FLAG_INTERRUPTIBLE but it was never applied to GUPs. One > issue with it is that not all GUP paths are able to handle signal delivers > besides SIGKILL. > > That's not ideal for the GUP users who are actually able to handle these > cases, like KVM. > > KVM uses GUP extensively on faulting guest pages, during which we've got > existing infrastructures to retry a page fault at a later time. Allowing > the GUP to be interrupted by generic signals can make KVM related threads > to be more responsive. For examples: > > (1) SIGUSR1: which QEMU/KVM uses to deliver an inter-process IPI, > e.g. when the admin issues a vm_stop QMP command, SIGUSR1 can be > generated to kick the vcpus out of kernel context immediately, > > (2) SIGINT: which can be used with interactive hypervisor users to stop a > virtual machine with Ctrl-C without any delays/hangs, > > (3) SIGTRAP: which grants GDB capability even during page faults that are > stuck for a long time. > > Normally hypervisor will be able to receive these signals properly, but not > if we're stuck in a GUP for a long time for whatever reason. It happens > easily with a stucked postcopy migration when e.g. a network temp failure > happens, then some vcpu threads can hang death waiting for the pages. With > the new FOLL_INTERRUPTIBLE, we can allow GUP users like KVM to selectively > enable the ability to trap these signals. 
> > Signed-off-by: Peter Xu <peterx@redhat.com> > --- > include/linux/mm.h | 1 + > mm/gup.c | 33 +++++++++++++++++++++++++++++---- > 2 files changed, 30 insertions(+), 4 deletions(-) > > diff --git a/include/linux/mm.h b/include/linux/mm.h > index bc8f326be0ce..ebdf8a6b86c1 100644 > --- a/include/linux/mm.h > +++ b/include/linux/mm.h > @@ -2941,6 +2941,7 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address, > #define FOLL_SPLIT_PMD 0x20000 /* split huge pmd before returning */ > #define FOLL_PIN 0x40000 /* pages must be released via unpin_user_page */ > #define FOLL_FAST_ONLY 0x80000 /* gup_fast: prevent fall-back to slow gup */ > +#define FOLL_INTERRUPTIBLE 0x100000 /* allow interrupts from generic signals */ Perhaps, s/generic/non-fatal/ ? > > /* > * FOLL_PIN and FOLL_LONGTERM may be used in various combinations with each > diff --git a/mm/gup.c b/mm/gup.c > index 551264407624..ad74b137d363 100644 > --- a/mm/gup.c > +++ b/mm/gup.c > @@ -933,8 +933,17 @@ static int faultin_page(struct vm_area_struct *vma, > fault_flags |= FAULT_FLAG_WRITE; > if (*flags & FOLL_REMOTE) > fault_flags |= FAULT_FLAG_REMOTE; > - if (locked) > + if (locked) { > fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE; > + /* > + * We should only grant FAULT_FLAG_INTERRUPTIBLE when we're > + * (at least) killable. It also mostly means we're not > + * with NOWAIT. Otherwise ignore FOLL_INTERRUPTIBLE since > + * it won't make a lot of sense to be used alone. > + */ This comment seems a little confusing due to its location. We've just checked "locked", but the comment is talking about other constraints. Not sure what to suggest. Maybe move it somewhere else? 
> + if (*flags & FOLL_INTERRUPTIBLE) > + fault_flags |= FAULT_FLAG_INTERRUPTIBLE; > + } > if (*flags & FOLL_NOWAIT) > fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_RETRY_NOWAIT; > if (*flags & FOLL_TRIED) { > @@ -1322,6 +1331,22 @@ int fixup_user_fault(struct mm_struct *mm, > } > EXPORT_SYMBOL_GPL(fixup_user_fault); > > +/* > + * GUP always responds to fatal signals. When FOLL_INTERRUPTIBLE is > + * specified, it'll also respond to generic signals. The caller of GUP > + * that has FOLL_INTERRUPTIBLE should take care of the GUP interruption. > + */ > +static bool gup_signal_pending(unsigned int flags) > +{ > + if (fatal_signal_pending(current)) > + return true; > + > + if (!(flags & FOLL_INTERRUPTIBLE)) > + return false; > + > + return signal_pending(current); > +} > + OK. > /* > * Please note that this function, unlike __get_user_pages will not > * return 0 for nr_pages > 0 without FOLL_NOWAIT > @@ -1403,11 +1428,11 @@ static __always_inline long __get_user_pages_locked(struct mm_struct *mm, > * Repeat on the address that fired VM_FAULT_RETRY > * with both FAULT_FLAG_ALLOW_RETRY and > * FAULT_FLAG_TRIED. Note that GUP can be interrupted > - * by fatal signals, so we need to check it before we > + * by fatal signals of even common signals, depending on > + * the caller's request. So we need to check it before we > * start trying again otherwise it can loop forever. > */ > - > - if (fatal_signal_pending(current)) { > + if (gup_signal_pending(flags)) { This is new and bold. :) Signals that an application was prepared to handle can now cause gup to quit early. I wonder if that will break any use cases out there (SIGPIPE...) ? Generally, gup callers handle failures pretty well, so it's probably not too bad. But I wanted to mention the idea that handled interrupts might be a little surprising here. thanks, -- John Hubbard NVIDIA > if (!pages_done) > pages_done = -EINTR; > break; ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 1/4] mm/gup: Add FOLL_INTERRUPTIBLE 2022-06-28 2:07 ` John Hubbard @ 2022-06-28 19:31 ` Peter Xu 2022-06-28 21:40 ` John Hubbard 0 siblings, 1 reply; 39+ messages in thread From: Peter Xu @ 2022-06-28 19:31 UTC (permalink / raw) To: John Hubbard Cc: kvm, linux-kernel, Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List, Sean Christopherson Hi, John, Thanks for your comments! On Mon, Jun 27, 2022 at 07:07:28PM -0700, John Hubbard wrote: [...] > > @@ -2941,6 +2941,7 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address, > > #define FOLL_SPLIT_PMD 0x20000 /* split huge pmd before returning */ > > #define FOLL_PIN 0x40000 /* pages must be released via unpin_user_page */ > > #define FOLL_FAST_ONLY 0x80000 /* gup_fast: prevent fall-back to slow gup */ > > +#define FOLL_INTERRUPTIBLE 0x100000 /* allow interrupts from generic signals */ > > Perhaps, s/generic/non-fatal/ ? Sure. > > diff --git a/mm/gup.c b/mm/gup.c > > index 551264407624..ad74b137d363 100644 > > --- a/mm/gup.c > > +++ b/mm/gup.c > > @@ -933,8 +933,17 @@ static int faultin_page(struct vm_area_struct *vma, > > fault_flags |= FAULT_FLAG_WRITE; > > if (*flags & FOLL_REMOTE) > > fault_flags |= FAULT_FLAG_REMOTE; > > - if (locked) > > + if (locked) { > > fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE; > > + /* > > + * We should only grant FAULT_FLAG_INTERRUPTIBLE when we're > > + * (at least) killable. It also mostly means we're not > > + * with NOWAIT. Otherwise ignore FOLL_INTERRUPTIBLE since > > + * it won't make a lot of sense to be used alone. > > + */ > > This comment seems a little confusing due to its location. We've just > checked "locked", but the comment is talking about other constraints. > > Not sure what to suggest. Maybe move it somewhere else? I put it here to be after FAULT_FLAG_KILLABLE we just set. Only if we have "locked" will we set FAULT_FLAG_KILLABLE. 
That's also the key by which we grant the "killable" attribute to this GUP. So I thought it'd be good to put it here, because I want to have FOLL_INTERRUPTIBLE dependent on "locked" being set. > > > + if (*flags & FOLL_INTERRUPTIBLE) > > + fault_flags |= FAULT_FLAG_INTERRUPTIBLE; > > + } > > if (*flags & FOLL_NOWAIT) > > fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_RETRY_NOWAIT; > > if (*flags & FOLL_TRIED) { > > @@ -1322,6 +1331,22 @@ int fixup_user_fault(struct mm_struct *mm, > > } > > EXPORT_SYMBOL_GPL(fixup_user_fault); > > +/* > > + * GUP always responds to fatal signals. When FOLL_INTERRUPTIBLE is > > + * specified, it'll also respond to generic signals. The caller of GUP > > + * that has FOLL_INTERRUPTIBLE should take care of the GUP interruption. > > + */ > > +static bool gup_signal_pending(unsigned int flags) > > +{ > > + if (fatal_signal_pending(current)) > > + return true; > > + > > + if (!(flags & FOLL_INTERRUPTIBLE)) > > + return false; > > + > > + return signal_pending(current); > > +} > > + > > OK. > > > /* > > * Please note that this function, unlike __get_user_pages will not > > * return 0 for nr_pages > 0 without FOLL_NOWAIT > > @@ -1403,11 +1428,11 @@ static __always_inline long __get_user_pages_locked(struct mm_struct *mm, > > * Repeat on the address that fired VM_FAULT_RETRY > > * with both FAULT_FLAG_ALLOW_RETRY and > > * FAULT_FLAG_TRIED. Note that GUP can be interrupted > > - * by fatal signals, so we need to check it before we > > + * by fatal signals of even common signals, depending on > > + * the caller's request. So we need to check it before we > > * start trying again otherwise it can loop forever. > > */ > > - > > - if (fatal_signal_pending(current)) { > > + if (gup_signal_pending(flags)) { > > This is new and bold. :) Signals that an application was prepared to > handle can now cause gup to quit early. I wonder if that will break any > use cases out there (SIGPIPE...) ? 
Note: I introduced the new FOLL_INTERRUPTIBLE flag, so only if the caller explicitly passes in that flag could there be a functional change. IOW, no functional change is intended in this single patch, not before I start to let the KVM code pass in that flag. > > Generally, gup callers handle failures pretty well, so it's probably > not too bad. But I wanted to mention the idea that handled interrupts > might be a little surprising here. Yes, as I mentioned it'll be an opt-in flag anyway, so by default we don't need to worry at all, IMHO, because it should really work exactly like before; otherwise I have a bug somewhere else.. :) Thanks, -- Peter Xu ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 1/4] mm/gup: Add FOLL_INTERRUPTIBLE 2022-06-28 19:31 ` Peter Xu @ 2022-06-28 21:40 ` John Hubbard 2022-06-28 22:33 ` Peter Xu 0 siblings, 1 reply; 39+ messages in thread From: John Hubbard @ 2022-06-28 21:40 UTC (permalink / raw) To: Peter Xu Cc: kvm, linux-kernel, Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List, Sean Christopherson On 6/28/22 12:31, Peter Xu wrote: >>> diff --git a/mm/gup.c b/mm/gup.c >>> index 551264407624..ad74b137d363 100644 >>> --- a/mm/gup.c >>> +++ b/mm/gup.c >>> @@ -933,8 +933,17 @@ static int faultin_page(struct vm_area_struct *vma, >>> fault_flags |= FAULT_FLAG_WRITE; >>> if (*flags & FOLL_REMOTE) >>> fault_flags |= FAULT_FLAG_REMOTE; >>> - if (locked) >>> + if (locked) { >>> fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE; >>> + /* >>> + * We should only grant FAULT_FLAG_INTERRUPTIBLE when we're >>> + * (at least) killable. It also mostly means we're not >>> + * with NOWAIT. Otherwise ignore FOLL_INTERRUPTIBLE since >>> + * it won't make a lot of sense to be used alone. >>> + */ >> >> This comment seems a little confusing due to its location. We've just >> checked "locked", but the comment is talking about other constraints. >> >> Not sure what to suggest. Maybe move it somewhere else? > > I put it here to be after FAULT_FLAG_KILLABLE we just set. > > Only if we have "locked" will we set FAULT_FLAG_KILLABLE. That's also the > key we grant "killable" attribute to this GUP. So I thought it'll be good > to put here because I want to have FOLL_INTERRUPTIBLE dependent on "locked" > being set. > The key point is the connection between "locked" and killable. If the comment explained why "locked" means "killable", that would help clear this up. The NOWAIT sentence is also confusing to me, and adding "mostly NOWAIT" does not clear it up either... :) >> Generally, gup callers handle failures pretty well, so it's probably >> not too bad. 
But I wanted to mention the idea that handled interrupts >> might be a little surprising here. > > Yes as I mentioned anyway it'll be an opt-in flag, so by default we don't > need to worry at all, IMHO, because it should really work exactly like > before, otherwise I had a bug somewhere else.. :) > Yes, that's true. OK then. thanks, -- John Hubbard NVIDIA ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 1/4] mm/gup: Add FOLL_INTERRUPTIBLE 2022-06-28 21:40 ` John Hubbard @ 2022-06-28 22:33 ` Peter Xu 2022-06-29 0:31 ` John Hubbard 0 siblings, 1 reply; 39+ messages in thread From: Peter Xu @ 2022-06-28 22:33 UTC (permalink / raw) To: John Hubbard Cc: kvm, linux-kernel, Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List, Sean Christopherson On Tue, Jun 28, 2022 at 02:40:53PM -0700, John Hubbard wrote: > On 6/28/22 12:31, Peter Xu wrote: > > > > diff --git a/mm/gup.c b/mm/gup.c > > > > index 551264407624..ad74b137d363 100644 > > > > --- a/mm/gup.c > > > > +++ b/mm/gup.c > > > > @@ -933,8 +933,17 @@ static int faultin_page(struct vm_area_struct *vma, > > > > fault_flags |= FAULT_FLAG_WRITE; > > > > if (*flags & FOLL_REMOTE) > > > > fault_flags |= FAULT_FLAG_REMOTE; > > > > - if (locked) > > > > + if (locked) { > > > > fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE; > > > > + /* > > > > + * We should only grant FAULT_FLAG_INTERRUPTIBLE when we're > > > > + * (at least) killable. It also mostly means we're not > > > > + * with NOWAIT. Otherwise ignore FOLL_INTERRUPTIBLE since > > > > + * it won't make a lot of sense to be used alone. > > > > + */ > > > > > > This comment seems a little confusing due to its location. We've just > > > checked "locked", but the comment is talking about other constraints. > > > > > > Not sure what to suggest. Maybe move it somewhere else? > > > > I put it here to be after FAULT_FLAG_KILLABLE we just set. > > > > Only if we have "locked" will we set FAULT_FLAG_KILLABLE. That's also the > > key we grant "killable" attribute to this GUP. So I thought it'll be good > > to put here because I want to have FOLL_INTERRUPTIBLE dependent on "locked" > > being set. > > > > The key point is the connection between "locked" and killable. If the comment > explained why "locked" means "killable", that would help clear this up. 
The > NOWAIT sentence is also confusing to me, and adding "mostly NOWAIT" does not > clear it up either... :) Sorry to have a comment that makes it feel confusing. I tried to explicitly put the comment after setting FAULT_FLAG_KILLABLE but obviously I didn't do my job well.. Maybe that NOWAIT thing adds more complexity than necessary. Would the below one be more acceptable? /* * We'll only be able to respond to signals when "locked != * NULL". When with it, we'll always respond to SIGKILL * (as implied by FAULT_FLAG_KILLABLE above), and we'll * respond to non-fatal signals only if the GUP user has * specified FOLL_INTERRUPTIBLE. */ Thanks, -- Peter Xu ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 1/4] mm/gup: Add FOLL_INTERRUPTIBLE 2022-06-28 22:33 ` Peter Xu @ 2022-06-29 0:31 ` John Hubbard 2022-06-29 15:47 ` Peter Xu 0 siblings, 1 reply; 39+ messages in thread From: John Hubbard @ 2022-06-29 0:31 UTC (permalink / raw) To: Peter Xu Cc: kvm, linux-kernel, Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List, Sean Christopherson On 6/28/22 15:33, Peter Xu wrote: >> The key point is the connection between "locked" and killable. If the comment >> explained why "locked" means "killable", that would help clear this up. The >> NOWAIT sentence is also confusing to me, and adding "mostly NOWAIT" does not >> clear it up either... :) > > Sorry to have a comment that makes it feels confusing. I tried to > explicitly put the comment to be after setting FAULT_FLAG_KILLABLE but > obviously I didn't do my job well.. > > Maybe that NOWAIT thing adds more complexity but not even necessary. > > Would below one more acceptable? > > /* > * We'll only be able to respond to signals when "locked != > * NULL". When with it, we'll always respond to SIGKILL > * (as implied by FAULT_FLAG_KILLABLE above), and we'll > * respond to non-fatal signals only if the GUP user has > * specified FOLL_INTERRUPTIBLE. > */ It looks like part of this comment is trying to document a pre-existing concept, which is that faultin_page() only ever sets FAULT_FLAG_KILLABLE if locked != NULL. The problem I am (personally) having is that I don't yet understand why or how those are connected: what is it about having locked non-NULL that means the process is killable? (Can you explain why that is?) If that were clear, I think I could suggest a good comment wording. thanks, -- John Hubbard NVIDIA ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 1/4] mm/gup: Add FOLL_INTERRUPTIBLE 2022-06-29 0:31 ` John Hubbard @ 2022-06-29 15:47 ` Peter Xu 2022-06-30 1:53 ` John Hubbard 0 siblings, 1 reply; 39+ messages in thread From: Peter Xu @ 2022-06-29 15:47 UTC (permalink / raw) To: John Hubbard Cc: kvm, linux-kernel, Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List, Sean Christopherson On Tue, Jun 28, 2022 at 05:31:43PM -0700, John Hubbard wrote: > On 6/28/22 15:33, Peter Xu wrote: > > > The key point is the connection between "locked" and killable. If the comment > > > explained why "locked" means "killable", that would help clear this up. The > > > NOWAIT sentence is also confusing to me, and adding "mostly NOWAIT" does not > > > clear it up either... :) > > > > Sorry to have a comment that makes it feels confusing. I tried to > > explicitly put the comment to be after setting FAULT_FLAG_KILLABLE but > > obviously I didn't do my job well.. > > > > Maybe that NOWAIT thing adds more complexity but not even necessary. > > > > Would below one more acceptable? > > > > /* > > * We'll only be able to respond to signals when "locked != > > * NULL". When with it, we'll always respond to SIGKILL > > * (as implied by FAULT_FLAG_KILLABLE above), and we'll > > * respond to non-fatal signals only if the GUP user has > > * specified FOLL_INTERRUPTIBLE. > > */ > > > It looks like part of this comment is trying to document a pre-existing > concept, which is that faultin_page() only ever sets FAULT_FLAG_KILLABLE > if locked != NULL. I'd say that's not what I wanted to comment.. I wanted to express that INTERRUPTIBLE should rely on KILLABLE, that's also why I put the comment to be after KILLABLE, not before. IMHO it makes sense already to have "interruptible" only if "killable", no matter what's the pre-requisite for KILLABLE (in this case it's having "locked" being non-null). 
> The problem I am (personally) having is that I don't yet understand why > or how those are connected: what is it about having locked non-NULL that > means the process is killable? (Can you explain why that is?) Firstly RETRY_KILLABLE relies on ALLOW_RETRY, because if we don't allow retry at all it means we'll never wait in handle_mm_fault() anyway, then no need to worry on being interrupted by any kind of signal (fatal or not). Then if we allow retry, we need some way to know "whether mmap_sem is released or not" during the process for the caller (because the caller cannot see VM_FAULT_RETRY). That's why we added "locked" parameter, so that we can set *locked=false to tell the caller we have released mmap_sem. I think that's why we have "locked" defined as "we allow this page fault request to retry and wait, during wait we can always allow fatal signals". I think that's defined throughout the gup call interfaces too, and faultin_page() is the last step to talk to handle_mm_fault(). To make this whole picture complete, NOWAIT is another thing that relies on ALLOW_RETRY but just to tell "oh please never release the mmap_sem at all". For example, when we want to make sure no vma will be released after faultin_page() returned. > > If that were clear, I think I could suggest a good comment wording. IMHO it's a little bit weird to explain "locked" here, especially after KILLABLE is set, that's why I didn't try to mention "locked" in my 2nd attempt. There are some comments for "locked" above the definition of faultin_page(), I think that'll be a nicer place to enrich explanations for "locked", and it seems even more suitable as a separate patch? Thanks, -- Peter Xu ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 1/4] mm/gup: Add FOLL_INTERRUPTIBLE 2022-06-29 15:47 ` Peter Xu @ 2022-06-30 1:53 ` John Hubbard 2022-06-30 13:49 ` Peter Xu 0 siblings, 1 reply; 39+ messages in thread From: John Hubbard @ 2022-06-30 1:53 UTC (permalink / raw) To: Peter Xu Cc: kvm, linux-kernel, Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List, Sean Christopherson On 6/29/22 08:47, Peter Xu wrote: >> It looks like part of this comment is trying to document a pre-existing >> concept, which is that faultin_page() only ever sets FAULT_FLAG_KILLABLE >> if locked != NULL. > > I'd say that's not what I wanted to comment.. I wanted to express that > INTERRUPTIBLE should rely on KILLABLE, that's also why I put the comment to > be after KILLABLE, not before. IMHO it makes sense already to have > "interruptible" only if "killable", no matter what's the pre-requisite for > KILLABLE (in this case it's having "locked" being non-null). > OK, I think I finally understand both the intention of the comment, and (thanks to your notes, below) the interaction between *locked and _RETRY, _KILLABLE, and _INTERRUPTIBLE. Really appreciate your leading me by the nose through that. The pre-existing code is abusing *locked a bit, by treating it as a flag when really it is a side effect of flags, but at least now that's clear to me. Anyway...this leads to finally getting into the comment, which I now think is not quite what we want: there is no need for a hierarchy of "_INTERRUPTIBLE should depend upon _KILLABLE". That is: even though an application allows a fatal signal to get through, it's not clear to me that that implies that non-fatal signal handling should be prevented. The code is only vaguely enforcing such a thing, because it just so happens that both cases require the same basic prerequisites. So the code looks good, but I don't see a need to claim a hierarchy in the comments. 
So I'd either delete the comment entirely, or go with something that is doesn't try to talk about hierarchy nor locked/retry either. Does this look reasonable to you: /* * FAULT_FLAG_INTERRUPTIBLE is opt-in: kernel callers must set * FOLL_INTERRUPTIBLE. That's because some callers may not be * prepared to handle early exits caused by non-fatal signals. */ ? >> The problem I am (personally) having is that I don't yet understand why >> or how those are connected: what is it about having locked non-NULL that >> means the process is killable? (Can you explain why that is?) > > Firstly RETRY_KILLABLE relies on ALLOW_RETRY, because if we don't allow > retry at all it means we'll never wait in handle_mm_fault() anyway, then no > need to worry on being interrupted by any kind of signal (fatal or not). > > Then if we allow retry, we need some way to know "whether mmap_sem is > released or not" during the process for the caller (because the caller > cannot see VM_FAULT_RETRY). That's why we added "locked" parameter, so > that we can set *locked=false to tell the caller we have released mmap_sem. > > I think that's why we have "locked" defined as "we allow this page fault > request to retry and wait, during wait we can always allow fatal signals". > I think that's defined throughout the gup call interfaces too, and > faultin_page() is the last step to talk to handle_mm_fault(). > > To make this whole picture complete, NOWAIT is another thing that relies on > ALLOW_RETRY but just to tell "oh please never release the mmap_sem at all". > For example, when we want to make sure no vma will be released after > faultin_page() returned. > Again, thanks for taking the time to explain that for me. :) >> >> If that were clear, I think I could suggest a good comment wording. > > IMHO it's a little bit weird to explain "locked" here, especially after > KILLABLE is set, that's why I didn't try to mention "locked" in my 2nd > attempt. 
There are some comments for "locked" above the definition of > faultin_page(), I think that'll be a nicer place to enrich explanations for > "locked", and it seems even more suitable as a separate patch? > Totally agreed. I didn't intend to ask for that kind of documentation here. For that, I'm thinking a combination of cleaning up *locked a little bit, plus maybe some higher level notes like what you wrote above, added to either pin_user_pages.rst or a new get_user_pages.rst or some .rst anyway. Definitely a separately thing. thanks, -- John Hubbard NVIDIA ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 1/4] mm/gup: Add FOLL_INTERRUPTIBLE 2022-06-30 1:53 ` John Hubbard @ 2022-06-30 13:49 ` Peter Xu 2022-06-30 19:01 ` John Hubbard 0 siblings, 1 reply; 39+ messages in thread From: Peter Xu @ 2022-06-30 13:49 UTC (permalink / raw) To: John Hubbard Cc: kvm, linux-kernel, Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List, Sean Christopherson On Wed, Jun 29, 2022 at 06:53:30PM -0700, John Hubbard wrote: > On 6/29/22 08:47, Peter Xu wrote: > > > It looks like part of this comment is trying to document a pre-existing > > > concept, which is that faultin_page() only ever sets FAULT_FLAG_KILLABLE > > > if locked != NULL. > > > > I'd say that's not what I wanted to comment.. I wanted to express that > > INTERRUPTIBLE should rely on KILLABLE, that's also why I put the comment to > > be after KILLABLE, not before. IMHO it makes sense already to have > > "interruptible" only if "killable", no matter what's the pre-requisite for > > KILLABLE (in this case it's having "locked" being non-null). > > > > OK, I think I finally understand both the intention of the comment, > and (thanks to your notes, below) the interaction between *locked and > _RETRY, _KILLABLE, and _INTERRUPTIBLE. Really appreciate your leading > me by the nose through that. The pre-existing code is abusing *locked > a bit, by treating it as a flag when really it is a side effect of > flags, but at least now that's clear to me. I agree, alternatively we could have some other FOLL_ flags to represent "locked != NULL" and do sanity check to make sure when the flag is there locked is always set correctly. Current code is a more "dense" way to do this, even though it could be slightly harder to follow. > > Anyway...this leads to finally getting into the comment, which I now > think is not quite what we want: there is no need for a hierarchy of > "_INTERRUPTIBLE should depend upon _KILLABLE". 
That is: even though an > application allows a fatal signal to get through, it's not clear to me > that that implies that non-fatal signal handling should be prevented. > > The code is only vaguely enforcing such a thing, because it just so > happens that both cases require the same basic prerequisites. So the > code looks good, but I don't see a need to claim a hierarchy in the > comments. > > So I'd either delete the comment entirely, or go with something that is > doesn't try to talk about hierarchy nor locked/retry either. Does this > look reasonable to you: > > > /* > * FAULT_FLAG_INTERRUPTIBLE is opt-in: kernel callers must set > * FOLL_INTERRUPTIBLE. That's because some callers may not be > * prepared to handle early exits caused by non-fatal signals. > */ > > ? Looks good to me, I'd tune a bit to make it less ambiguous on a few places: /* * FAULT_FLAG_INTERRUPTIBLE is opt-in. GUP callers must set * FOLL_INTERRUPTIBLE to enable FAULT_FLAG_INTERRUPTIBLE. * That's because some callers may not be prepared to * handle early exits caused by non-fatal signals. */ Would that be okay to you? > > > > The problem I am (personally) having is that I don't yet understand why > > > or how those are connected: what is it about having locked non-NULL that > > > means the process is killable? (Can you explain why that is?) > > > > Firstly RETRY_KILLABLE relies on ALLOW_RETRY, because if we don't allow > > retry at all it means we'll never wait in handle_mm_fault() anyway, then no > > need to worry on being interrupted by any kind of signal (fatal or not). > > > > Then if we allow retry, we need some way to know "whether mmap_sem is > > released or not" during the process for the caller (because the caller > > cannot see VM_FAULT_RETRY). That's why we added "locked" parameter, so > > that we can set *locked=false to tell the caller we have released mmap_sem. 
> > > > I think that's why we have "locked" defined as "we allow this page fault > > request to retry and wait, during wait we can always allow fatal signals". > > I think that's defined throughout the gup call interfaces too, and > > faultin_page() is the last step to talk to handle_mm_fault(). > > > > To make this whole picture complete, NOWAIT is another thing that relies on > > ALLOW_RETRY but just to tell "oh please never release the mmap_sem at all". > > For example, when we want to make sure no vma will be released after > > faultin_page() returned. > > > > Again, thanks for taking the time to explain that for me. :) My thanks for reviewing! > > > > > > > If that were clear, I think I could suggest a good comment wording. > > > > IMHO it's a little bit weird to explain "locked" here, especially after > > KILLABLE is set, that's why I didn't try to mention "locked" in my 2nd > > attempt. There are some comments for "locked" above the definition of > > faultin_page(), I think that'll be a nicer place to enrich explanations for > > "locked", and it seems even more suitable as a separate patch? > > > > Totally agreed. I didn't intend to ask for that kind of documentation > here. > > For that, I'm thinking a combination of cleaning up *locked a little > bit, plus maybe some higher level notes like what you wrote above, added > to either pin_user_pages.rst or a new get_user_pages.rst or some .rst > anyway. Definitely a separately thing. Sounds good. Thanks, -- Peter Xu ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 1/4] mm/gup: Add FOLL_INTERRUPTIBLE 2022-06-30 13:49 ` Peter Xu @ 2022-06-30 19:01 ` John Hubbard 2022-06-30 21:27 ` Peter Xu 0 siblings, 1 reply; 39+ messages in thread From: John Hubbard @ 2022-06-30 19:01 UTC (permalink / raw) To: Peter Xu Cc: kvm, linux-kernel, Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List, Sean Christopherson On 6/30/22 06:49, Peter Xu wrote: > Looks good to me, I'd tune a bit to make it less ambiguous on a few places: > > /* > * FAULT_FLAG_INTERRUPTIBLE is opt-in. GUP callers must set > * FOLL_INTERRUPTIBLE to enable FAULT_FLAG_INTERRUPTIBLE. > * That's because some callers may not be prepared to > * handle early exits caused by non-fatal signals. > */ > > Would that be okay to you? > Yes, looks good. With that change, please feel free to add: Reviewed-by: John Hubbard <jhubbard@nvidia.com> thanks, -- John Hubbard NVIDIA ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 1/4] mm/gup: Add FOLL_INTERRUPTIBLE 2022-06-30 19:01 ` John Hubbard @ 2022-06-30 21:27 ` Peter Xu 0 siblings, 0 replies; 39+ messages in thread From: Peter Xu @ 2022-06-30 21:27 UTC (permalink / raw) To: John Hubbard Cc: kvm, linux-kernel, Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List, Sean Christopherson On Thu, Jun 30, 2022 at 12:01:53PM -0700, John Hubbard wrote: > On 6/30/22 06:49, Peter Xu wrote: > > Looks good to me, I'd tune a bit to make it less ambiguous on a few places: > > > > /* > > * FAULT_FLAG_INTERRUPTIBLE is opt-in. GUP callers must set > > * FOLL_INTERRUPTIBLE to enable FAULT_FLAG_INTERRUPTIBLE. > > * That's because some callers may not be prepared to > > * handle early exits caused by non-fatal signals. > > */ > > > > Would that be okay to you? > > > > Yes, looks good. With that change, please feel free to add: > > Reviewed-by: John Hubbard <jhubbard@nvidia.com> Will do, thanks! -- Peter Xu ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 1/4] mm/gup: Add FOLL_INTERRUPTIBLE 2022-06-22 21:36 ` [PATCH 1/4] mm/gup: Add FOLL_INTERRUPTIBLE Peter Xu 2022-06-25 0:35 ` Jason Gunthorpe 2022-06-28 2:07 ` John Hubbard @ 2022-07-04 22:48 ` Matthew Wilcox 2022-07-07 15:06 ` Peter Xu 2 siblings, 1 reply; 39+ messages in thread From: Matthew Wilcox @ 2022-07-04 22:48 UTC (permalink / raw) To: Peter Xu Cc: kvm, linux-kernel, Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List, Sean Christopherson On Wed, Jun 22, 2022 at 05:36:53PM -0400, Peter Xu wrote: > +/* > + * GUP always responds to fatal signals. When FOLL_INTERRUPTIBLE is > + * specified, it'll also respond to generic signals. The caller of GUP > + * that has FOLL_INTERRUPTIBLE should take care of the GUP interruption. > + */ > +static bool gup_signal_pending(unsigned int flags) > +{ > + if (fatal_signal_pending(current)) > + return true; > + > + if (!(flags & FOLL_INTERRUPTIBLE)) > + return false; > + > + return signal_pending(current); > +} This should resemble signal_pending_state() more closely, if indeed not be a wrapper of signal_pending_state(). ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 1/4] mm/gup: Add FOLL_INTERRUPTIBLE 2022-07-04 22:48 ` Matthew Wilcox @ 2022-07-07 15:06 ` Peter Xu 0 siblings, 0 replies; 39+ messages in thread From: Peter Xu @ 2022-07-07 15:06 UTC (permalink / raw) To: Matthew Wilcox Cc: kvm, linux-kernel, Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List, Sean Christopherson On Mon, Jul 04, 2022 at 11:48:17PM +0100, Matthew Wilcox wrote: > On Wed, Jun 22, 2022 at 05:36:53PM -0400, Peter Xu wrote: > > +/* > > + * GUP always responds to fatal signals. When FOLL_INTERRUPTIBLE is > > + * specified, it'll also respond to generic signals. The caller of GUP > > + * that has FOLL_INTERRUPTIBLE should take care of the GUP interruption. > > + */ > > +static bool gup_signal_pending(unsigned int flags) > > +{ > > + if (fatal_signal_pending(current)) > > + return true; > > + > > + if (!(flags & FOLL_INTERRUPTIBLE)) > > + return false; > > + > > + return signal_pending(current); > > +} > > This should resemble signal_pending_state() more closely, if indeed not > be a wrapper of signal_pending_state(). Could you be more specific? Note that the only thing that should affect the signal handling here is FOLL_INTERRUPTIBLE, we don't allow anything else being passed in, e.g. we don't take TASK_INTERRUPTIBLE or TASK_*. Thanks, -- Peter Xu ^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH 2/4] kvm: Merge "atomic" and "write" in __gfn_to_pfn_memslot() 2022-06-22 21:36 [PATCH 0/4] kvm/mm: Allow GUP to respond to non fatal signals Peter Xu 2022-06-22 21:36 ` [PATCH 1/4] mm/gup: Add FOLL_INTERRUPTIBLE Peter Xu @ 2022-06-22 21:36 ` Peter Xu 2022-06-23 14:49 ` Sean Christopherson 2022-06-28 2:17 ` John Hubbard 2022-06-22 21:36 ` [PATCH 3/4] kvm: Add new pfn error KVM_PFN_ERR_INTR Peter Xu 2022-06-22 21:36 ` [PATCH 4/4] kvm/x86: Allow to respond to generic signals during slow page faults Peter Xu 3 siblings, 2 replies; 39+ messages in thread From: Peter Xu @ 2022-06-22 21:36 UTC (permalink / raw) To: kvm, linux-kernel Cc: peterx, Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List, Sean Christopherson Merge two boolean parameters into a bitmask flag called kvm_gtp_flag_t for __gfn_to_pfn_memslot(). This cleans the parameter lists, and also prepare for new boolean to be added to __gfn_to_pfn_memslot(). Signed-off-by: Peter Xu <peterx@redhat.com> --- arch/arm64/kvm/mmu.c | 5 ++-- arch/powerpc/kvm/book3s_64_mmu_hv.c | 5 ++-- arch/powerpc/kvm/book3s_64_mmu_radix.c | 5 ++-- arch/x86/kvm/mmu/mmu.c | 10 +++---- include/linux/kvm_host.h | 9 ++++++- virt/kvm/kvm_main.c | 37 +++++++++++++++----------- virt/kvm/kvm_mm.h | 6 +++-- virt/kvm/pfncache.c | 2 +- 8 files changed, 49 insertions(+), 30 deletions(-) diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c index f5651a05b6a8..ce1edb512b4e 100644 --- a/arch/arm64/kvm/mmu.c +++ b/arch/arm64/kvm/mmu.c @@ -1204,8 +1204,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, */ smp_rmb(); - pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL, - write_fault, &writable, NULL); + pfn = __gfn_to_pfn_memslot(memslot, gfn, + write_fault ? 
KVM_GTP_WRITE : 0, + NULL, &writable, NULL); if (pfn == KVM_PFN_ERR_HWPOISON) { kvm_send_hwpoison_signal(hva, vma_shift); return 0; diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c index 514fd45c1994..e2769d58dd87 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c @@ -598,8 +598,9 @@ int kvmppc_book3s_hv_page_fault(struct kvm_vcpu *vcpu, write_ok = true; } else { /* Call KVM generic code to do the slow-path check */ - pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL, - writing, &write_ok, NULL); + pfn = __gfn_to_pfn_memslot(memslot, gfn, + writing ? KVM_GTP_WRITE : 0, + NULL, &write_ok, NULL); if (is_error_noslot_pfn(pfn)) return -EFAULT; page = NULL; diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c index 42851c32ff3b..232b17c75b83 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_radix.c +++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c @@ -845,8 +845,9 @@ int kvmppc_book3s_instantiate_page(struct kvm_vcpu *vcpu, unsigned long pfn; /* Call KVM generic code to do the slow-path check */ - pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL, - writing, upgrade_p, NULL); + pfn = __gfn_to_pfn_memslot(memslot, gfn, + writing ? KVM_GTP_WRITE : 0, + NULL, upgrade_p, NULL); if (is_error_noslot_pfn(pfn)) return -EFAULT; page = NULL; diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index f4653688fa6d..e92f1ab63d6a 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -3968,6 +3968,7 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work) static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) { + kvm_gtp_flag_t flags = fault->write ? 
KVM_GTP_WRITE : 0; struct kvm_memory_slot *slot = fault->slot; bool async; @@ -3999,8 +4000,8 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) } async = false; - fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, &async, - fault->write, &fault->map_writable, + fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, flags, + &async, &fault->map_writable, &fault->hva); if (!async) return RET_PF_CONTINUE; /* *pfn has correct page already */ @@ -4016,9 +4017,8 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) } } - fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, NULL, - fault->write, &fault->map_writable, - &fault->hva); + fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, flags, NULL, + &fault->map_writable, &fault->hva); return RET_PF_CONTINUE; } diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index c20f2d55840c..b646b6fcaec6 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -1146,8 +1146,15 @@ kvm_pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault, bool *writable); kvm_pfn_t gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn); kvm_pfn_t gfn_to_pfn_memslot_atomic(const struct kvm_memory_slot *slot, gfn_t gfn); + +/* gfn_to_pfn (gtp) flags */ +typedef unsigned int __bitwise kvm_gtp_flag_t; + +#define KVM_GTP_WRITE ((__force kvm_gtp_flag_t) BIT(0)) +#define KVM_GTP_ATOMIC ((__force kvm_gtp_flag_t) BIT(1)) + kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn, - bool atomic, bool *async, bool write_fault, + kvm_gtp_flag_t gtp_flags, bool *async, bool *writable, hva_t *hva); void kvm_release_pfn_clean(kvm_pfn_t pfn); diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 64ec2222a196..952400b42ee9 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -2444,9 +2444,11 @@ static bool hva_to_pfn_fast(unsigned long addr, bool write_fault, * The slow path to get the pfn of the specified 
host virtual address, * 1 indicates success, -errno is returned if error is detected. */ -static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault, +static int hva_to_pfn_slow(unsigned long addr, bool *async, + kvm_gtp_flag_t gtp_flags, bool *writable, kvm_pfn_t *pfn) { + bool write_fault = gtp_flags & KVM_GTP_WRITE; unsigned int flags = FOLL_HWPOISON; struct page *page; int npages = 0; @@ -2565,20 +2567,22 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma, /* * Pin guest page in memory and return its pfn. * @addr: host virtual address which maps memory to the guest - * @atomic: whether this function can sleep + * @gtp_flags: kvm_gtp_flag_t flags (atomic, write, ..) * @async: whether this function need to wait IO complete if the * host page is not in the memory - * @write_fault: whether we should get a writable host page * @writable: whether it allows to map a writable host page for !@write_fault * - * The function will map a writable host page for these two cases: + * The function will map a writable (KVM_GTP_WRITE set) host page for these + * two cases: * 1): @write_fault = true * 2): @write_fault = false && @writable, @writable will tell the caller * whether the mapping is writable. 
*/ -kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async, - bool write_fault, bool *writable) +kvm_pfn_t hva_to_pfn(unsigned long addr, kvm_gtp_flag_t gtp_flags, bool *async, + bool *writable) { + bool write_fault = gtp_flags & KVM_GTP_WRITE; + bool atomic = gtp_flags & KVM_GTP_ATOMIC; struct vm_area_struct *vma; kvm_pfn_t pfn = 0; int npages, r; @@ -2592,7 +2596,7 @@ kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async, if (atomic) return KVM_PFN_ERR_FAULT; - npages = hva_to_pfn_slow(addr, async, write_fault, writable, &pfn); + npages = hva_to_pfn_slow(addr, async, gtp_flags, writable, &pfn); if (npages == 1) return pfn; @@ -2625,10 +2629,11 @@ kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async, } kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn, - bool atomic, bool *async, bool write_fault, + kvm_gtp_flag_t gtp_flags, bool *async, bool *writable, hva_t *hva) { - unsigned long addr = __gfn_to_hva_many(slot, gfn, NULL, write_fault); + unsigned long addr = __gfn_to_hva_many(slot, gfn, NULL, + gtp_flags & KVM_GTP_WRITE); if (hva) *hva = addr; @@ -2651,28 +2656,30 @@ kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn, writable = NULL; } - return hva_to_pfn(addr, atomic, async, write_fault, - writable); + return hva_to_pfn(addr, gtp_flags, async, writable); } EXPORT_SYMBOL_GPL(__gfn_to_pfn_memslot); kvm_pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault, bool *writable) { - return __gfn_to_pfn_memslot(gfn_to_memslot(kvm, gfn), gfn, false, NULL, - write_fault, writable, NULL); + return __gfn_to_pfn_memslot(gfn_to_memslot(kvm, gfn), gfn, + write_fault ? 
KVM_GTP_WRITE : 0, + NULL, writable, NULL); } EXPORT_SYMBOL_GPL(gfn_to_pfn_prot); kvm_pfn_t gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn) { - return __gfn_to_pfn_memslot(slot, gfn, false, NULL, true, NULL, NULL); + return __gfn_to_pfn_memslot(slot, gfn, KVM_GTP_WRITE, + NULL, NULL, NULL); } EXPORT_SYMBOL_GPL(gfn_to_pfn_memslot); kvm_pfn_t gfn_to_pfn_memslot_atomic(const struct kvm_memory_slot *slot, gfn_t gfn) { - return __gfn_to_pfn_memslot(slot, gfn, true, NULL, true, NULL, NULL); + return __gfn_to_pfn_memslot(slot, gfn, KVM_GTP_WRITE | KVM_GTP_ATOMIC, + NULL, NULL, NULL); } EXPORT_SYMBOL_GPL(gfn_to_pfn_memslot_atomic); diff --git a/virt/kvm/kvm_mm.h b/virt/kvm/kvm_mm.h index 41da467d99c9..1c870911eb48 100644 --- a/virt/kvm/kvm_mm.h +++ b/virt/kvm/kvm_mm.h @@ -3,6 +3,8 @@ #ifndef __KVM_MM_H__ #define __KVM_MM_H__ 1 +#include <linux/kvm_host.h> + /* * Architectures can choose whether to use an rwlock or spinlock * for the mmu_lock. These macros, for use in common code @@ -24,8 +26,8 @@ #define KVM_MMU_READ_UNLOCK(kvm) spin_unlock(&(kvm)->mmu_lock) #endif /* KVM_HAVE_MMU_RWLOCK */ -kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async, - bool write_fault, bool *writable); +kvm_pfn_t hva_to_pfn(unsigned long addr, kvm_gtp_flag_t gtp_flags, bool *async, + bool *writable); #ifdef CONFIG_HAVE_KVM_PFNCACHE void gfn_to_pfn_cache_invalidate_start(struct kvm *kvm, diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c index dd84676615f1..0f9f6b5d2fbb 100644 --- a/virt/kvm/pfncache.c +++ b/virt/kvm/pfncache.c @@ -123,7 +123,7 @@ static kvm_pfn_t hva_to_pfn_retry(struct kvm *kvm, unsigned long uhva) smp_rmb(); /* We always request a writeable mapping */ - new_pfn = hva_to_pfn(uhva, false, NULL, true, NULL); + new_pfn = hva_to_pfn(uhva, KVM_GTP_WRITE, NULL, NULL); if (is_error_noslot_pfn(new_pfn)) break; -- 2.32.0 ^ permalink raw reply related [flat|nested] 39+ messages in thread
* Re: [PATCH 2/4] kvm: Merge "atomic" and "write" in __gfn_to_pfn_memslot() 2022-06-22 21:36 ` [PATCH 2/4] kvm: Merge "atomic" and "write" in __gfn_to_pfn_memslot() Peter Xu @ 2022-06-23 14:49 ` Sean Christopherson 2022-06-23 19:46 ` Peter Xu 2022-06-28 2:17 ` John Hubbard 1 sibling, 1 reply; 39+ messages in thread From: Sean Christopherson @ 2022-06-23 14:49 UTC (permalink / raw) To: Peter Xu Cc: kvm, linux-kernel, Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List On Wed, Jun 22, 2022, Peter Xu wrote: > Merge two boolean parameters into a bitmask flag called kvm_gtp_flag_t for > __gfn_to_pfn_memslot(). This cleans the parameter lists, and also prepare > for new boolean to be added to __gfn_to_pfn_memslot(). ... > @@ -3999,8 +4000,8 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) > } > > async = false; > - fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, &async, > - fault->write, &fault->map_writable, > + fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, flags, > + &async, &fault->map_writable, > &fault->hva); > if (!async) > return RET_PF_CONTINUE; /* *pfn has correct page already */ > @@ -4016,9 +4017,8 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) > } > } > > - fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, NULL, > - fault->write, &fault->map_writable, > - &fault->hva); > + fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, flags, NULL, > + &fault->map_writable, &fault->hva); > return RET_PF_CONTINUE; > } > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h > index c20f2d55840c..b646b6fcaec6 100644 > --- a/include/linux/kvm_host.h > +++ b/include/linux/kvm_host.h > @@ -1146,8 +1146,15 @@ kvm_pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault, > bool *writable); > kvm_pfn_t gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn); > kvm_pfn_t 
gfn_to_pfn_memslot_atomic(const struct kvm_memory_slot *slot, gfn_t gfn); > + > +/* gfn_to_pfn (gtp) flags */ > +typedef unsigned int __bitwise kvm_gtp_flag_t; > + > +#define KVM_GTP_WRITE ((__force kvm_gtp_flag_t) BIT(0)) > +#define KVM_GTP_ATOMIC ((__force kvm_gtp_flag_t) BIT(1)) > + > kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn, > - bool atomic, bool *async, bool write_fault, > + kvm_gtp_flag_t gtp_flags, bool *async, > bool *writable, hva_t *hva); I completely agree the list of booleans is a mess, but I don't love the result of adding @flags. I wonder if we can do something similar to x86's struct kvm_page_fault and add an internal struct to pass params. And then add e.g. gfn_to_pfn_interruptible() to wrap that logic. I suspect we could also clean up the @async behavior at the same time, as its interaction with FOLL_NOWAIT is confusing. ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 2/4] kvm: Merge "atomic" and "write" in __gfn_to_pfn_memslot() 2022-06-23 14:49 ` Sean Christopherson @ 2022-06-23 19:46 ` Peter Xu 2022-06-23 20:29 ` Sean Christopherson 0 siblings, 1 reply; 39+ messages in thread From: Peter Xu @ 2022-06-23 19:46 UTC (permalink / raw) To: Sean Christopherson Cc: kvm, linux-kernel, Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List On Thu, Jun 23, 2022 at 02:49:47PM +0000, Sean Christopherson wrote: > On Wed, Jun 22, 2022, Peter Xu wrote: > > Merge two boolean parameters into a bitmask flag called kvm_gtp_flag_t for > > __gfn_to_pfn_memslot(). This cleans the parameter lists, and also prepare > > for new boolean to be added to __gfn_to_pfn_memslot(). > > ... > > > @@ -3999,8 +4000,8 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) > > } > > > > async = false; > > - fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, &async, > > - fault->write, &fault->map_writable, > > + fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, flags, > > + &async, &fault->map_writable, > > &fault->hva); > > if (!async) > > return RET_PF_CONTINUE; /* *pfn has correct page already */ > > @@ -4016,9 +4017,8 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) > > } > > } > > > > - fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, NULL, > > - fault->write, &fault->map_writable, > > - &fault->hva); > > + fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, flags, NULL, > > + &fault->map_writable, &fault->hva); > > return RET_PF_CONTINUE; > > } > > > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h > > index c20f2d55840c..b646b6fcaec6 100644 > > --- a/include/linux/kvm_host.h > > +++ b/include/linux/kvm_host.h > > @@ -1146,8 +1146,15 @@ kvm_pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault, > > bool *writable); > > kvm_pfn_t gfn_to_pfn_memslot(const struct kvm_memory_slot 
*slot, gfn_t gfn); > > kvm_pfn_t gfn_to_pfn_memslot_atomic(const struct kvm_memory_slot *slot, gfn_t gfn); > > + > > +/* gfn_to_pfn (gtp) flags */ > > +typedef unsigned int __bitwise kvm_gtp_flag_t; > > + > > +#define KVM_GTP_WRITE ((__force kvm_gtp_flag_t) BIT(0)) > > +#define KVM_GTP_ATOMIC ((__force kvm_gtp_flag_t) BIT(1)) > > + > > kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn, > > - bool atomic, bool *async, bool write_fault, > > + kvm_gtp_flag_t gtp_flags, bool *async, > > bool *writable, hva_t *hva); > > I completely agree the list of booleans is a mess, but I don't love the result of > adding @flags. I wonder if we can do something similar to x86's struct kvm_page_fault > and add an internal struct to pass params. Yep we can. It's just that it'd be another goal, unrelated to this series, but it could be a standalone cleanup patchset for the gfn->hpa conversion paths. Say, the new struct can also be done on top, containing the new flag, IMHO. This reminded me of an interesting topic Nadav once mentioned: when Matthew converted some Linux function parameters into a structure, the .obj actually grew a bit due to the strong stack protector that Linux uses. If I end up doing such a change I'd guess I need to dig a bit into that first, but hopefully I don't need to for this series alone. Sorry to be off-topic: I think it's a matter of whether you think it's okay to merge the flags first, even if we want to go with a struct pointer eventually. > And then add e.g. gfn_to_pfn_interruptible() to wrap that logic. That helper sounds good; it's just that the major user I'm modifying here doesn't really use gfn_to_pfn() at all but __gfn_to_pfn_memslot() underneath. I'll remember to add that when I plan to convert some gfn_to_pfn() call sites. > > I suspect we could also clean up the @async behavior at the same time, as its > interaction with FOLL_NOWAIT is confusing. Yeah, I don't like that either.
Let me think about that when proposing a new version. Logically that's a separate idea from this series too, but if you think it'd be nice to have it all together then I can give it a shot. Thanks, -- Peter Xu ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 2/4] kvm: Merge "atomic" and "write" in __gfn_to_pfn_memslot() 2022-06-23 19:46 ` Peter Xu @ 2022-06-23 20:29 ` Sean Christopherson 2022-06-23 21:29 ` Peter Xu 0 siblings, 1 reply; 39+ messages in thread From: Sean Christopherson @ 2022-06-23 20:29 UTC (permalink / raw) To: Peter Xu Cc: kvm, linux-kernel, Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List On Thu, Jun 23, 2022, Peter Xu wrote: > On Thu, Jun 23, 2022 at 02:49:47PM +0000, Sean Christopherson wrote: > > On Wed, Jun 22, 2022, Peter Xu wrote: > > > Merge two boolean parameters into a bitmask flag called kvm_gtp_flag_t for > > > __gfn_to_pfn_memslot(). This cleans the parameter lists, and also prepare > > > for new boolean to be added to __gfn_to_pfn_memslot(). ... > > > +/* gfn_to_pfn (gtp) flags */ > > > +typedef unsigned int __bitwise kvm_gtp_flag_t; > > > + > > > +#define KVM_GTP_WRITE ((__force kvm_gtp_flag_t) BIT(0)) > > > +#define KVM_GTP_ATOMIC ((__force kvm_gtp_flag_t) BIT(1)) > > > + > > > kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn, > > > - bool atomic, bool *async, bool write_fault, > > > + kvm_gtp_flag_t gtp_flags, bool *async, > > > bool *writable, hva_t *hva); > > > > I completely agree the list of booleans is a mess, but I don't love the result of > > adding @flags. I wonder if we can do something similar to x86's struct kvm_page_fault > > and add an internal struct to pass params. > > Yep we can. It's just that it'll be another goal irrelevant of this series But it's not irrelevant. By introducing KVM_GTP_*, you opened the topic of cleaning up the parameters. Don't get me wrong, I like that you proposed cleaning up the mess, but if we're going to churn then let's get the API right. > but it could be a standalone cleanup patchset for gfn->hpa conversion > paths. Say, the new struct can also be done on top containing the new > flag, IMHO. 
No, because if we go to a struct, then I'd much rather have bools and not flags. > This reminded me of an interesting topic that Nadav used to mention that > when Matthew changed some of the Linux function parameters into a structure > then the .obj actually grows a bit due to the strong stack protector that > Linux uses. If I'll be doing such a change I'd guess I need to dig a bit > into that first, but hopefully I don't need to for this series alone. > > Sorry to be off-topic: I think it's a matter of whether you think it's okay > we merge the flags first, even if we want to go with a struct pointer > finally. Either take a dependency on doing a full cleanup, or just add yet another bool and leave _all_ cleanup to a separate series. Resolving conflicts with a new param is fairly straightforward, whereas resolving divergent cleanups gets painful. As gross as it is, I think my preference would be to just add another bool in this series. Then we can get more aggressive with a cleanup without having to worry about unnecessarily pushing this series out a release or three. > > And then add e.g. gfn_to_pfn_interruptible() to wrap that logic. > > That helper sounds good, it's just that the major user I'm modifying here > doesn't really use gfn_to_pfn() at all but __gfn_to_pfn_memslot() > underneath. I'll remember to have that when I plan to convert some > gfn_to_pfn() call sites. Ah, right. That can be remedied more easily if @async goes away. Then we can have: gfn_to_pfn_memslot_nowait() and gfn_to_pfn_memslot_interruptible() and those are mutually exclusive, i.e. recognize generic signals if and only if gup is allowed to wait. But that can be left to the cleanup series. > > I suspect we could also clean up the @async behavior at the same time, as its > > interaction with FOLL_NOWAIT is confusing. > > Yeah I don't like that either. Let me think about that when proposing a > new version. 
Logically that's separate idea from this series too, but if > you think that'll be nice to have altogether then I can give it a shot. This is what I came up with for splitting @async into a pure input (no_wait) and a return value (KVM_PFN_ERR_NEEDS_IO). --- arch/arm64/kvm/mmu.c | 2 +- arch/powerpc/kvm/book3s_64_mmu_hv.c | 4 +- arch/powerpc/kvm/book3s_64_mmu_radix.c | 2 +- arch/x86/kvm/mmu/mmu.c | 8 ++-- include/linux/kvm_host.h | 3 +- virt/kvm/kvm_main.c | 57 ++++++++++++++------------ virt/kvm/kvm_mm.h | 2 +- virt/kvm/pfncache.c | 2 +- 8 files changed, 41 insertions(+), 39 deletions(-) diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c index 87f1cd0df36e..a87f01edef8e 100644 --- a/arch/arm64/kvm/mmu.c +++ b/arch/arm64/kvm/mmu.c @@ -1204,7 +1204,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, */ smp_rmb(); - pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL, + pfn = __gfn_to_pfn_memslot(memslot, gfn, false, false, write_fault, &writable, NULL); if (pfn == KVM_PFN_ERR_HWPOISON) { kvm_send_hwpoison_signal(hva, vma_shift); diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c index 514fd45c1994..32f4b56ca315 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c @@ -590,7 +590,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_vcpu *vcpu, /* * Do a fast check first, since __gfn_to_pfn_memslot doesn't - * do it with !atomic && !async, which is how we call it. + * do it with !atomic && !nowait, which is how we call it. * We always ask for write permission since the common case * is that the page is writable. 
*/ @@ -598,7 +598,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_vcpu *vcpu, write_ok = true; } else { /* Call KVM generic code to do the slow-path check */ - pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL, + pfn = __gfn_to_pfn_memslot(memslot, gfn, false, false, writing, &write_ok, NULL); if (is_error_noslot_pfn(pfn)) return -EFAULT; diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c index 42851c32ff3b..4338affe295e 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_radix.c +++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c @@ -845,7 +845,7 @@ int kvmppc_book3s_instantiate_page(struct kvm_vcpu *vcpu, unsigned long pfn; /* Call KVM generic code to do the slow-path check */ - pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL, + pfn = __gfn_to_pfn_memslot(memslot, gfn, false, false, writing, upgrade_p, NULL); if (is_error_noslot_pfn(pfn)) return -EFAULT; diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 79c6a821ea0d..35b364589fa4 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -4102,7 +4102,6 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work) static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) { struct kvm_memory_slot *slot = fault->slot; - bool async; /* * Retry the page fault if the gfn hit a memslot that is being deleted @@ -4131,11 +4130,10 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) return RET_PF_EMULATE; } - async = false; - fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, &async, + fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, true, fault->write, &fault->map_writable, &fault->hva); - if (!async) + if (fault->pfn != KVM_PFN_ERR_NEEDS_IO) return RET_PF_CONTINUE; /* *pfn has correct page already */ if (!fault->prefetch && kvm_can_do_async_pf(vcpu)) { @@ -4149,7 +4147,7 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) } } - fault->pfn 
= __gfn_to_pfn_memslot(slot, fault->gfn, false, NULL, + fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, false, fault->write, &fault->map_writable, &fault->hva); return RET_PF_CONTINUE; diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 3554e48406e4..ecd5f686d33a 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -96,6 +96,7 @@ #define KVM_PFN_ERR_FAULT (KVM_PFN_ERR_MASK) #define KVM_PFN_ERR_HWPOISON (KVM_PFN_ERR_MASK + 1) #define KVM_PFN_ERR_RO_FAULT (KVM_PFN_ERR_MASK + 2) +#define KVM_PFN_ERR_NEEDS_IO (KVM_PFN_ERR_MASK + 3) /* * error pfns indicate that the gfn is in slot but faild to @@ -1146,7 +1147,7 @@ kvm_pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault, kvm_pfn_t gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn); kvm_pfn_t gfn_to_pfn_memslot_atomic(const struct kvm_memory_slot *slot, gfn_t gfn); kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn, - bool atomic, bool *async, bool write_fault, + bool atomic, bool no_wait, bool write_fault, bool *writable, hva_t *hva); void kvm_release_pfn_clean(kvm_pfn_t pfn); diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 45188d11812c..6b63aa5fa5ed 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -2497,7 +2497,7 @@ static bool hva_to_pfn_fast(unsigned long addr, bool write_fault, * The slow path to get the pfn of the specified host virtual address, * 1 indicates success, -errno is returned if error is detected. 
*/ -static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault, +static int hva_to_pfn_slow(unsigned long addr, bool no_wait, bool write_fault, bool *writable, kvm_pfn_t *pfn) { unsigned int flags = FOLL_HWPOISON; @@ -2511,7 +2511,7 @@ static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault, if (write_fault) flags |= FOLL_WRITE; - if (async) + if (no_wait) flags |= FOLL_NOWAIT; npages = get_user_pages_unlocked(addr, 1, &page, flags); @@ -2619,28 +2619,31 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma, } /* - * Pin guest page in memory and return its pfn. + * Get the host pfn for a given host virtual address. If a pfn is found and is + * backed by a refcounted struct page, the caller is responsible for putting + * the reference, i.e. this returns with an elevated refcount. + * * @addr: host virtual address which maps memory to the guest - * @atomic: whether this function can sleep - * @async: whether this function need to wait IO complete if the - * host page is not in the memory - * @write_fault: whether we should get a writable host page - * @writable: whether it allows to map a writable host page for !@write_fault - * - * The function will map a writable host page for these two cases: - * 1): @write_fault = true - * 2): @write_fault = false && @writable, @writable will tell the caller - * whether the mapping is writable. + * @atomic: if true, do not sleep (effectively means "fast gup only") + * @no_wait: if true, do not wait for IO to complete if the host page is not in + * memory, e.g. 
is swapped out or not yet transferred during post-copy + * @write_fault: if true, a writable mapping is _required_ + * @writable: if non-NULL, a writable mapping is _allowed_, but not required; + * set to %true (if non-NULL) when a writable host page was retrieved */ -kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async, +kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool no_wait, bool write_fault, bool *writable) { struct vm_area_struct *vma; kvm_pfn_t pfn; int npages, r; - /* we can do it either atomically or asynchronously, not both */ - BUG_ON(atomic && async); + /* + * Waiting requires sleeping, and so is mutually exclusive with atomic + * lookups which are not allowed to sleep. + */ + if (WARN_ON_ONCE(atomic && !no_wait)) + return KVM_PFN_ERR_FAULT; if (hva_to_pfn_fast(addr, write_fault, writable, &pfn)) return pfn; @@ -2648,13 +2651,13 @@ kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async, if (atomic) return KVM_PFN_ERR_FAULT; - npages = hva_to_pfn_slow(addr, async, write_fault, writable, &pfn); + npages = hva_to_pfn_slow(addr, no_wait, write_fault, writable, &pfn); if (npages == 1) return pfn; mmap_read_lock(current->mm); if (npages == -EHWPOISON || - (!async && check_user_page_hwpoison(addr))) { + (!no_wait && check_user_page_hwpoison(addr))) { pfn = KVM_PFN_ERR_HWPOISON; goto exit; } @@ -2671,9 +2674,10 @@ kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async, if (r < 0) pfn = KVM_PFN_ERR_FAULT; } else { - if (async && vma_is_valid(vma, write_fault)) - *async = true; - pfn = KVM_PFN_ERR_FAULT; + if (no_wait && vma_is_valid(vma, write_fault)) + pfn = KVM_PFN_ERR_NEEDS_IO; + else + pfn = KVM_PFN_ERR_FAULT; } exit: mmap_read_unlock(current->mm); @@ -2681,7 +2685,7 @@ kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async, } kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn, - bool atomic, bool *async, bool write_fault, + bool atomic, bool no_wait, bool write_fault, bool
*writable, hva_t *hva) { unsigned long addr = __gfn_to_hva_many(slot, gfn, NULL, write_fault); @@ -2707,28 +2711,27 @@ kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn, writable = NULL; } - return hva_to_pfn(addr, atomic, async, write_fault, - writable); + return hva_to_pfn(addr, atomic, no_wait, write_fault, writable); } EXPORT_SYMBOL_GPL(__gfn_to_pfn_memslot); kvm_pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault, bool *writable) { - return __gfn_to_pfn_memslot(gfn_to_memslot(kvm, gfn), gfn, false, NULL, + return __gfn_to_pfn_memslot(gfn_to_memslot(kvm, gfn), gfn, false, false, write_fault, writable, NULL); } EXPORT_SYMBOL_GPL(gfn_to_pfn_prot); kvm_pfn_t gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn) { - return __gfn_to_pfn_memslot(slot, gfn, false, NULL, true, NULL, NULL); + return __gfn_to_pfn_memslot(slot, gfn, false, false, true, NULL, NULL); } EXPORT_SYMBOL_GPL(gfn_to_pfn_memslot); kvm_pfn_t gfn_to_pfn_memslot_atomic(const struct kvm_memory_slot *slot, gfn_t gfn) { - return __gfn_to_pfn_memslot(slot, gfn, true, NULL, true, NULL, NULL); + return __gfn_to_pfn_memslot(slot, gfn, true, false, true, NULL, NULL); } EXPORT_SYMBOL_GPL(gfn_to_pfn_memslot_atomic); diff --git a/virt/kvm/kvm_mm.h b/virt/kvm/kvm_mm.h index 41da467d99c9..40e87b4b4629 100644 --- a/virt/kvm/kvm_mm.h +++ b/virt/kvm/kvm_mm.h @@ -24,7 +24,7 @@ #define KVM_MMU_READ_UNLOCK(kvm) spin_unlock(&(kvm)->mmu_lock) #endif /* KVM_HAVE_MMU_RWLOCK */ -kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async, +kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool no_wait, bool write_fault, bool *writable); #ifdef CONFIG_HAVE_KVM_PFNCACHE diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c index ab519f72f2cd..6a05d6d0fbe9 100644 --- a/virt/kvm/pfncache.c +++ b/virt/kvm/pfncache.c @@ -181,7 +181,7 @@ static kvm_pfn_t hva_to_pfn_retry(struct kvm *kvm, struct gfn_to_pfn_cache *gpc) } /* We always request a writeable mapping */ - 
new_pfn = hva_to_pfn(gpc->uhva, false, NULL, true, NULL); + new_pfn = hva_to_pfn(gpc->uhva, false, false, true, NULL); if (is_error_noslot_pfn(new_pfn)) goto out_error; base-commit: 4284f0063c48fc3734b0bedb023702c4d606732f -- ^ permalink raw reply related [flat|nested] 39+ messages in thread
* Re: [PATCH 2/4] kvm: Merge "atomic" and "write" in __gfn_to_pfn_memslot() 2022-06-23 20:29 ` Sean Christopherson @ 2022-06-23 21:29 ` Peter Xu 2022-06-23 21:52 ` Sean Christopherson 0 siblings, 1 reply; 39+ messages in thread From: Peter Xu @ 2022-06-23 21:29 UTC (permalink / raw) To: Sean Christopherson Cc: kvm, linux-kernel, Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List On Thu, Jun 23, 2022 at 08:29:13PM +0000, Sean Christopherson wrote: > This is what I came up with for splitting @async into a pure input (no_wait) and > a return value (KVM_PFN_ERR_NEEDS_IO). The attached patch looks good to me. It's just that... [...] > kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn, > - bool atomic, bool *async, bool write_fault, > + bool atomic, bool no_wait, bool write_fault, > bool *writable, hva_t *hva) ... with this patch on top we'll have 3 booleans already. With the new one added separately as suggested, it'll hit 4. Let's say one day we'll have that struct, but... are you sure you think keeping four booleans around is nicer than having a flag, no matter whether we'd like to have a struct or not? kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn, bool atomic, bool no_wait, bool write_fault, bool interruptible, bool *writable, hva_t *hva); What if the booleans go to 5, 6, or more? /me starts to wonder what'll be the magic number at which we'll start to think a bitmask flag will be more lovely here. :) -- Peter Xu ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 2/4] kvm: Merge "atomic" and "write" in __gfn_to_pfn_memslot() 2022-06-23 21:29 ` Peter Xu @ 2022-06-23 21:52 ` Sean Christopherson 2022-06-27 19:12 ` John Hubbard 0 siblings, 1 reply; 39+ messages in thread From: Sean Christopherson @ 2022-06-23 21:52 UTC (permalink / raw) To: Peter Xu Cc: kvm, linux-kernel, Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List On Thu, Jun 23, 2022, Peter Xu wrote: > On Thu, Jun 23, 2022 at 08:29:13PM +0000, Sean Christopherson wrote: > > This is what I came up with for splitting @async into a pure input (no_wait) and > > a return value (KVM_PFN_ERR_NEEDS_IO). > > The attached patch looks good to me. It's just that... > > [...] > > > kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn, > > - bool atomic, bool *async, bool write_fault, > > + bool atomic, bool no_wait, bool write_fault, > > bool *writable, hva_t *hva) > > ... with this patch on top we'll have 3 booleans already. With the new one > added separately as suggested, it'll hit 4. > > Let's say one day we'll have that struct, but... are you sure you think > keeping four booleans around is nicer than having a flag, no matter whether > we'd like to have a struct or not? No. > kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn, > bool atomic, bool no_wait, bool write_fault, > bool interruptible, bool *writable, hva_t *hva); > > What if the booleans go to 5, 6, or more? > > /me starts to wonder what'll be the magic number at which we'll start to think > a bitmask flag will be more lovely here. :) For the number to really matter, it'd have to be comically large, e.g. 100+. This is all on-stack memory, so it's as close to free as we can get. Overhead in terms of (un)marshalling is likely a wash for flags versus bools.
That leaves readability, which isn't dependent on the number so much as it is on the usage, and will be highly subjective based on the final code. In other words, I'm not dead set against flags, but I would like to see a complete cleanup before making a decision. My gut reaction is to use bools, as it makes consumption cleaner in most cases, e.g. if (!(xxx->write_fault || writable)) return false; versus if (!((xxx->flags & KVM_GTP_WRITE) || writable)) return false; but again I'm not going to say never until I actually see the end result. ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 2/4] kvm: Merge "atomic" and "write" in __gfn_to_pfn_memslot() 2022-06-23 21:52 ` Sean Christopherson @ 2022-06-27 19:12 ` John Hubbard 0 siblings, 0 replies; 39+ messages in thread From: John Hubbard @ 2022-06-27 19:12 UTC (permalink / raw) To: Sean Christopherson, Peter Xu Cc: kvm, linux-kernel, Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List On 6/23/22 14:52, Sean Christopherson wrote: > On Thu, Jun 23, 2022, Peter Xu wrote: >> On Thu, Jun 23, 2022 at 08:29:13PM +0000, Sean Christopherson wrote: >>> This is what I came up with for splitting @async into a pure input (no_wait) and >>> a return value (KVM_PFN_ERR_NEEDS_IO). >> >> The attached patch looks good to me. It's just that.. >> >> [...] >> >>> kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn, >>> - bool atomic, bool *async, bool write_fault, >>> + bool atomic, bool no_wait, bool write_fault, >>> bool *writable, hva_t *hva) >> >> .. with this patch on top we'll have 3 booleans already. With the new one >> to add separated as suggested then it'll hit 4. >> >> Let's say one day we'll have that struct, but.. are you sure you think >> keeping four booleans around is nicer than having a flag, no matter whether >> we'd like to have a struct or not? > > No. > >> kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn, >> bool atomic, bool no_wait, bool write_fault, >> bool interruptible, bool *writable, hva_t *hva); >> >> What if the booleans goes to 5, 6, or more? >> >> /me starts to wonder what'll be the magic number that we'll start to think >> a bitmask flag will be more lovely here. :) > > For the number to really matter, it'd have to be comically large, e.g. 100+. This > is all on-stack memory, so it's as close to free as can we can get. Overhead in > terms of (un)marshalling is likely a wash for flags versus bools. 
Bools pack in > nicely, so until there are a _lot_ of bools, memory is a non-issue. It's pretty unusual to see that claim, in kernel mm code. :) Flags are often used, because they take less space than booleans, and C bitfields have other problems. > > That leaves readability, which isn't dependent on the number so much as it is on > the usage, and will be highly subjective based on the final code. > > In other words, I'm not dead set against flags, but I would like to see a complete > cleanup before making a decision. My gut reaction is to use bools, as it makes > consumption cleaner in most cases, e.g. > > if (!(xxx->write_fault || writable)) > return false; > > versus > > if (!((xxx->flags & KVM_GTP_WRITE) || writable)) > return false; > > but again I'm not going to say never until I actually see the end result. > Just to add a light counter-argument: the readability is similar enough that I think the compactness in memory makes flags a little better. imho anyway. thanks, -- John Hubbard NVIDIA ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 2/4] kvm: Merge "atomic" and "write" in __gfn_to_pfn_memslot() 2022-06-22 21:36 ` [PATCH 2/4] kvm: Merge "atomic" and "write" in __gfn_to_pfn_memslot() Peter Xu 2022-06-23 14:49 ` Sean Christopherson @ 2022-06-28 2:17 ` John Hubbard 2022-06-28 19:46 ` Peter Xu 1 sibling, 1 reply; 39+ messages in thread From: John Hubbard @ 2022-06-28 2:17 UTC (permalink / raw) To: Peter Xu, kvm, linux-kernel Cc: Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List, Sean Christopherson On 6/22/22 14:36, Peter Xu wrote: > Merge two boolean parameters into a bitmask flag called kvm_gtp_flag_t for > __gfn_to_pfn_memslot(). This cleans the parameter lists, and also prepare > for new boolean to be added to __gfn_to_pfn_memslot(). > > Signed-off-by: Peter Xu <peterx@redhat.com> > --- > arch/arm64/kvm/mmu.c | 5 ++-- > arch/powerpc/kvm/book3s_64_mmu_hv.c | 5 ++-- > arch/powerpc/kvm/book3s_64_mmu_radix.c | 5 ++-- > arch/x86/kvm/mmu/mmu.c | 10 +++---- > include/linux/kvm_host.h | 9 ++++++- > virt/kvm/kvm_main.c | 37 +++++++++++++++----------- > virt/kvm/kvm_mm.h | 6 +++-- > virt/kvm/pfncache.c | 2 +- > 8 files changed, 49 insertions(+), 30 deletions(-) > > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c > index f5651a05b6a8..ce1edb512b4e 100644 > --- a/arch/arm64/kvm/mmu.c > +++ b/arch/arm64/kvm/mmu.c > @@ -1204,8 +1204,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, > */ > smp_rmb(); > > - pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL, > - write_fault, &writable, NULL); > + pfn = __gfn_to_pfn_memslot(memslot, gfn, > + write_fault ? 
KVM_GTP_WRITE : 0, > + NULL, &writable, NULL); > if (pfn == KVM_PFN_ERR_HWPOISON) { > kvm_send_hwpoison_signal(hva, vma_shift); > return 0; > diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c > index 514fd45c1994..e2769d58dd87 100644 > --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c > +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c > @@ -598,8 +598,9 @@ int kvmppc_book3s_hv_page_fault(struct kvm_vcpu *vcpu, > write_ok = true; > } else { > /* Call KVM generic code to do the slow-path check */ > - pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL, > - writing, &write_ok, NULL); > + pfn = __gfn_to_pfn_memslot(memslot, gfn, > + writing ? KVM_GTP_WRITE : 0, > + NULL, &write_ok, NULL); > if (is_error_noslot_pfn(pfn)) > return -EFAULT; > page = NULL; > diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c > index 42851c32ff3b..232b17c75b83 100644 > --- a/arch/powerpc/kvm/book3s_64_mmu_radix.c > +++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c > @@ -845,8 +845,9 @@ int kvmppc_book3s_instantiate_page(struct kvm_vcpu *vcpu, > unsigned long pfn; > > /* Call KVM generic code to do the slow-path check */ > - pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL, > - writing, upgrade_p, NULL); > + pfn = __gfn_to_pfn_memslot(memslot, gfn, > + writing ? KVM_GTP_WRITE : 0, > + NULL, upgrade_p, NULL); > if (is_error_noslot_pfn(pfn)) > return -EFAULT; > page = NULL; > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c > index f4653688fa6d..e92f1ab63d6a 100644 > --- a/arch/x86/kvm/mmu/mmu.c > +++ b/arch/x86/kvm/mmu/mmu.c > @@ -3968,6 +3968,7 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work) > > static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) > { > + kvm_gtp_flag_t flags = fault->write ? 
KVM_GTP_WRITE : 0; > struct kvm_memory_slot *slot = fault->slot; > bool async; > > @@ -3999,8 +4000,8 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) > } > > async = false; > - fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, &async, > - fault->write, &fault->map_writable, > + fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, flags, > + &async, &fault->map_writable, > &fault->hva); > if (!async) > return RET_PF_CONTINUE; /* *pfn has correct page already */ > @@ -4016,9 +4017,8 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) > } > } > > - fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, NULL, > - fault->write, &fault->map_writable, > - &fault->hva); > + fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, flags, NULL, > + &fault->map_writable, &fault->hva); The flags arg does improve the situation, yes. > return RET_PF_CONTINUE; > } > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h > index c20f2d55840c..b646b6fcaec6 100644 > --- a/include/linux/kvm_host.h > +++ b/include/linux/kvm_host.h > @@ -1146,8 +1146,15 @@ kvm_pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault, > bool *writable); > kvm_pfn_t gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn); > kvm_pfn_t gfn_to_pfn_memslot_atomic(const struct kvm_memory_slot *slot, gfn_t gfn); > + > +/* gfn_to_pfn (gtp) flags */ > +typedef unsigned int __bitwise kvm_gtp_flag_t; A minor naming problem: GTP and especially gtp_flags is way too close to gfp_flags. It will make people either wonder if it's a typo, or worse, *assume* that it's a typo. :) Yes, "read the code", but if you can come up with a better TLA than GTP here, let's consider using it. Overall, the change looks like an improvement, even though write_fault ? KVM_GTP_WRITE : 0 is not wonderful. But improving *that* leads to a big pile of diffs that are rather beyond the scope here.
thanks, -- John Hubbard NVIDIA > + > +#define KVM_GTP_WRITE ((__force kvm_gtp_flag_t) BIT(0)) > +#define KVM_GTP_ATOMIC ((__force kvm_gtp_flag_t) BIT(1)) > + > kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn, > - bool atomic, bool *async, bool write_fault, > + kvm_gtp_flag_t gtp_flags, bool *async, > bool *writable, hva_t *hva); > > void kvm_release_pfn_clean(kvm_pfn_t pfn); > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c > index 64ec2222a196..952400b42ee9 100644 > --- a/virt/kvm/kvm_main.c > +++ b/virt/kvm/kvm_main.c > @@ -2444,9 +2444,11 @@ static bool hva_to_pfn_fast(unsigned long addr, bool write_fault, > * The slow path to get the pfn of the specified host virtual address, > * 1 indicates success, -errno is returned if error is detected. > */ > -static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault, > +static int hva_to_pfn_slow(unsigned long addr, bool *async, > + kvm_gtp_flag_t gtp_flags, > bool *writable, kvm_pfn_t *pfn) > { > + bool write_fault = gtp_flags & KVM_GTP_WRITE; > unsigned int flags = FOLL_HWPOISON; > struct page *page; > int npages = 0; > @@ -2565,20 +2567,22 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma, > /* > * Pin guest page in memory and return its pfn. > * @addr: host virtual address which maps memory to the guest > - * @atomic: whether this function can sleep > + * @gtp_flags: kvm_gtp_flag_t flags (atomic, write, ..) > * @async: whether this function need to wait IO complete if the > * host page is not in the memory > - * @write_fault: whether we should get a writable host page > * @writable: whether it allows to map a writable host page for !@write_fault > * > - * The function will map a writable host page for these two cases: > + * The function will map a writable (KVM_GTP_WRITE set) host page for these > + * two cases: > * 1): @write_fault = true > * 2): @write_fault = false && @writable, @writable will tell the caller > * whether the mapping is writable. 
> */ > -kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async, > - bool write_fault, bool *writable) > +kvm_pfn_t hva_to_pfn(unsigned long addr, kvm_gtp_flag_t gtp_flags, bool *async, > + bool *writable) > { > + bool write_fault = gtp_flags & KVM_GTP_WRITE; > + bool atomic = gtp_flags & KVM_GTP_ATOMIC; > struct vm_area_struct *vma; > kvm_pfn_t pfn = 0; > int npages, r; > @@ -2592,7 +2596,7 @@ kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async, > if (atomic) > return KVM_PFN_ERR_FAULT; > > - npages = hva_to_pfn_slow(addr, async, write_fault, writable, &pfn); > + npages = hva_to_pfn_slow(addr, async, gtp_flags, writable, &pfn); > if (npages == 1) > return pfn; > > @@ -2625,10 +2629,11 @@ kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async, > } > > kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn, > - bool atomic, bool *async, bool write_fault, > + kvm_gtp_flag_t gtp_flags, bool *async, > bool *writable, hva_t *hva) > { > - unsigned long addr = __gfn_to_hva_many(slot, gfn, NULL, write_fault); > + unsigned long addr = __gfn_to_hva_many(slot, gfn, NULL, > + gtp_flags & KVM_GTP_WRITE); > > if (hva) > *hva = addr; > @@ -2651,28 +2656,30 @@ kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn, > writable = NULL; > } > > - return hva_to_pfn(addr, atomic, async, write_fault, > - writable); > + return hva_to_pfn(addr, gtp_flags, async, writable); > } > EXPORT_SYMBOL_GPL(__gfn_to_pfn_memslot); > > kvm_pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault, > bool *writable) > { > - return __gfn_to_pfn_memslot(gfn_to_memslot(kvm, gfn), gfn, false, NULL, > - write_fault, writable, NULL); > + return __gfn_to_pfn_memslot(gfn_to_memslot(kvm, gfn), gfn, > + write_fault ? 
KVM_GTP_WRITE : 0, > + NULL, writable, NULL); > } > EXPORT_SYMBOL_GPL(gfn_to_pfn_prot); > > kvm_pfn_t gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn) > { > - return __gfn_to_pfn_memslot(slot, gfn, false, NULL, true, NULL, NULL); > + return __gfn_to_pfn_memslot(slot, gfn, KVM_GTP_WRITE, > + NULL, NULL, NULL); > } > EXPORT_SYMBOL_GPL(gfn_to_pfn_memslot); > > kvm_pfn_t gfn_to_pfn_memslot_atomic(const struct kvm_memory_slot *slot, gfn_t gfn) > { > - return __gfn_to_pfn_memslot(slot, gfn, true, NULL, true, NULL, NULL); > + return __gfn_to_pfn_memslot(slot, gfn, KVM_GTP_WRITE | KVM_GTP_ATOMIC, > + NULL, NULL, NULL); > } > EXPORT_SYMBOL_GPL(gfn_to_pfn_memslot_atomic); > > diff --git a/virt/kvm/kvm_mm.h b/virt/kvm/kvm_mm.h > index 41da467d99c9..1c870911eb48 100644 > --- a/virt/kvm/kvm_mm.h > +++ b/virt/kvm/kvm_mm.h > @@ -3,6 +3,8 @@ > #ifndef __KVM_MM_H__ > #define __KVM_MM_H__ 1 > > +#include <linux/kvm_host.h> > + > /* > * Architectures can choose whether to use an rwlock or spinlock > * for the mmu_lock. 
These macros, for use in common code > @@ -24,8 +26,8 @@ > #define KVM_MMU_READ_UNLOCK(kvm) spin_unlock(&(kvm)->mmu_lock) > #endif /* KVM_HAVE_MMU_RWLOCK */ > > -kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async, > - bool write_fault, bool *writable); > +kvm_pfn_t hva_to_pfn(unsigned long addr, kvm_gtp_flag_t gtp_flags, bool *async, > + bool *writable); > > #ifdef CONFIG_HAVE_KVM_PFNCACHE > void gfn_to_pfn_cache_invalidate_start(struct kvm *kvm, > diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c > index dd84676615f1..0f9f6b5d2fbb 100644 > --- a/virt/kvm/pfncache.c > +++ b/virt/kvm/pfncache.c > @@ -123,7 +123,7 @@ static kvm_pfn_t hva_to_pfn_retry(struct kvm *kvm, unsigned long uhva) > smp_rmb(); > > /* We always request a writeable mapping */ > - new_pfn = hva_to_pfn(uhva, false, NULL, true, NULL); > + new_pfn = hva_to_pfn(uhva, KVM_GTP_WRITE, NULL, NULL); > if (is_error_noslot_pfn(new_pfn)) > break; > ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 2/4] kvm: Merge "atomic" and "write" in __gfn_to_pfn_memslot() 2022-06-28 2:17 ` John Hubbard @ 2022-06-28 19:46 ` Peter Xu 2022-06-28 21:52 ` John Hubbard 0 siblings, 1 reply; 39+ messages in thread From: Peter Xu @ 2022-06-28 19:46 UTC (permalink / raw) To: John Hubbard Cc: kvm, linux-kernel, Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List, Sean Christopherson On Mon, Jun 27, 2022 at 07:17:09PM -0700, John Hubbard wrote: > On 6/22/22 14:36, Peter Xu wrote: > > Merge two boolean parameters into a bitmask flag called kvm_gtp_flag_t for > > __gfn_to_pfn_memslot(). This cleans the parameter lists, and also prepare > > for new boolean to be added to __gfn_to_pfn_memslot(). > > > > Signed-off-by: Peter Xu <peterx@redhat.com> > > --- > > arch/arm64/kvm/mmu.c | 5 ++-- > > arch/powerpc/kvm/book3s_64_mmu_hv.c | 5 ++-- > > arch/powerpc/kvm/book3s_64_mmu_radix.c | 5 ++-- > > arch/x86/kvm/mmu/mmu.c | 10 +++---- > > include/linux/kvm_host.h | 9 ++++++- > > virt/kvm/kvm_main.c | 37 +++++++++++++++----------- > > virt/kvm/kvm_mm.h | 6 +++-- > > virt/kvm/pfncache.c | 2 +- > > 8 files changed, 49 insertions(+), 30 deletions(-) > > > > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c > > index f5651a05b6a8..ce1edb512b4e 100644 > > --- a/arch/arm64/kvm/mmu.c > > +++ b/arch/arm64/kvm/mmu.c > > @@ -1204,8 +1204,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, > > */ > > smp_rmb(); > > - pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL, > > - write_fault, &writable, NULL); > > + pfn = __gfn_to_pfn_memslot(memslot, gfn, > > + write_fault ? 
KVM_GTP_WRITE : 0, > > + NULL, &writable, NULL); > > if (pfn == KVM_PFN_ERR_HWPOISON) { > > kvm_send_hwpoison_signal(hva, vma_shift); > > return 0; > > diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c > > index 514fd45c1994..e2769d58dd87 100644 > > --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c > > +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c > > @@ -598,8 +598,9 @@ int kvmppc_book3s_hv_page_fault(struct kvm_vcpu *vcpu, > > write_ok = true; > > } else { > > /* Call KVM generic code to do the slow-path check */ > > - pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL, > > - writing, &write_ok, NULL); > > + pfn = __gfn_to_pfn_memslot(memslot, gfn, > > + writing ? KVM_GTP_WRITE : 0, > > + NULL, &write_ok, NULL); > > if (is_error_noslot_pfn(pfn)) > > return -EFAULT; > > page = NULL; > > diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c > > index 42851c32ff3b..232b17c75b83 100644 > > --- a/arch/powerpc/kvm/book3s_64_mmu_radix.c > > +++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c > > @@ -845,8 +845,9 @@ int kvmppc_book3s_instantiate_page(struct kvm_vcpu *vcpu, > > unsigned long pfn; > > /* Call KVM generic code to do the slow-path check */ > > - pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL, > > - writing, upgrade_p, NULL); > > + pfn = __gfn_to_pfn_memslot(memslot, gfn, > > + writing ? KVM_GTP_WRITE : 0, > > + NULL, upgrade_p, NULL); > > if (is_error_noslot_pfn(pfn)) > > return -EFAULT; > > page = NULL; > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c > > index f4653688fa6d..e92f1ab63d6a 100644 > > --- a/arch/x86/kvm/mmu/mmu.c > > +++ b/arch/x86/kvm/mmu/mmu.c > > @@ -3968,6 +3968,7 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work) > > static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) > > { > > + kvm_gtp_flag_t flags = fault->write ? 
KVM_GTP_WRITE : 0; > > struct kvm_memory_slot *slot = fault->slot; > > bool async; > > @@ -3999,8 +4000,8 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) > > } > > async = false; > > - fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, &async, > > - fault->write, &fault->map_writable, > > + fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, flags, > > + &async, &fault->map_writable, > > &fault->hva); > > if (!async) > > return RET_PF_CONTINUE; /* *pfn has correct page already */ > > @@ -4016,9 +4017,8 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) > > } > > } > > - fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, NULL, > > - fault->write, &fault->map_writable, > > - &fault->hva); > > + fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, flags, NULL, > > + &fault->map_writable, &fault->hva); > > The flags arg does improve the situation, yes. Thanks for supporting a flag's existance. :) I'd say ultimately it could be a personal preference thing when the struct comes. > > > return RET_PF_CONTINUE; > > } > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h > > index c20f2d55840c..b646b6fcaec6 100644 > > --- a/include/linux/kvm_host.h > > +++ b/include/linux/kvm_host.h > > @@ -1146,8 +1146,15 @@ kvm_pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault, > > bool *writable); > > kvm_pfn_t gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn); > > kvm_pfn_t gfn_to_pfn_memslot_atomic(const struct kvm_memory_slot *slot, gfn_t gfn); > > + > > +/* gfn_to_pfn (gtp) flags */ > > +typedef unsigned int __bitwise kvm_gtp_flag_t; > > A minor naming problem: GTP and especially gtp_flags is way too close > to gfp_flags. It will make people either wonder if it's a typo, or > worse, *assume* that it's a typo. :) I'd try to argu with "I prefixed it with kvm_", but oh well.. 
yes they're a bit close :) > > Yes, "read the code", but if you can come up with a better TLA than GTP > here, let's consider using it. Could I ask what's TLA? Any suggestions on the abbrev, btw? > > Overall, the change looks like an improvement, even though > > write_fault ? KVM_GTP_WRITE : 0 > > is not wonderful. But improving *that* leads to a big pile of diffs > that are rather beyond the scope here. Thanks, -- Peter Xu ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 2/4] kvm: Merge "atomic" and "write" in __gfn_to_pfn_memslot() 2022-06-28 19:46 ` Peter Xu @ 2022-06-28 21:52 ` John Hubbard 2022-06-28 22:50 ` Peter Xu 0 siblings, 1 reply; 39+ messages in thread From: John Hubbard @ 2022-06-28 21:52 UTC (permalink / raw) To: Peter Xu Cc: kvm, linux-kernel, Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List, Sean Christopherson On 6/28/22 12:46, Peter Xu wrote: > I'd try to argu with "I prefixed it with kvm_", but oh well.. yes they're a > bit close :) > >> >> Yes, "read the code", but if you can come up with a better TLA than GTP >> here, let's consider using it. > > Could I ask what's TLA? Any suggestions on the abbrev, btw? "Three-Letter Acronym". I love "TLA" because the very fact that one has to ask what it means, shows why using them makes it harder to communicate. :) As for alternatives, here I'll demonstrate that "GTP" actually is probably better than anything else I can come up with, heh. Brainstorming: * GPN ("Guest pfn to pfn") * GTPN (similar, but with a "T" for "to") * GFNC ("guest frame number conversion") ugggh. :) thanks, -- John Hubbard NVIDIA ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 2/4] kvm: Merge "atomic" and "write" in __gfn_to_pfn_memslot() 2022-06-28 21:52 ` John Hubbard @ 2022-06-28 22:50 ` Peter Xu 2022-06-28 22:55 ` John Hubbard 0 siblings, 1 reply; 39+ messages in thread From: Peter Xu @ 2022-06-28 22:50 UTC (permalink / raw) To: John Hubbard Cc: kvm, linux-kernel, Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List, Sean Christopherson On Tue, Jun 28, 2022 at 02:52:04PM -0700, John Hubbard wrote: > On 6/28/22 12:46, Peter Xu wrote: > > I'd try to argue with "I prefixed it with kvm_", but oh well.. yes they're a > > bit close :) > > > > > > > > Yes, "read the code", but if you can come up with a better TLA than GTP > > > here, let's consider using it. > > > > Could I ask what's TLA? Any suggestions on the abbrev, btw? > > "Three-Letter Acronym". I love "TLA" because the very fact that one has > to ask what it means, shows why using them makes it harder to communicate. :) Ha! > > As for alternatives, here I'll demonstrate that "GTP" actually is probably > better than anything else I can come up with, heh. Brainstorming: > > * GPN ("Guest pfn to pfn") > * GTPN (similar, but with a "T" for "to") > * GFNC ("guest frame number conversion") Always a challenge on the naming kung fu. :-D One good thing about using a TLA in macros, flags, and code (rather than simply mentioning some three letters): we can easily jump to the label of any of the flags when we want to figure out what it means, and logically there'll (and should) be explanations of the abbrev in the headers if it's a good header. Example: it's not even easy to figure out what GFP is in the GFP_KERNEL flag for someone not familiar with mm (when I wrote this line, I got lost!), but when that happens we can jump to the label and at the entry of gfp.h we'll see: * ... The GFP acronym stands for get_free_pages(), And if you see the patch I actually did something similar (in kvm_host.h): /* gfn_to_pfn (gtp) flags */ ...
I'd still go for GTP, but let me know if you think any of the above is better; I can switch. Thanks, -- Peter Xu ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 2/4] kvm: Merge "atomic" and "write" in __gfn_to_pfn_memslot() 2022-06-28 22:50 ` Peter Xu @ 2022-06-28 22:55 ` John Hubbard 2022-06-28 23:02 ` Peter Xu 0 siblings, 1 reply; 39+ messages in thread From: John Hubbard @ 2022-06-28 22:55 UTC (permalink / raw) To: Peter Xu Cc: kvm, linux-kernel, Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List, Sean Christopherson On 6/28/22 15:50, Peter Xu wrote: > And if you see the patch I actually did something similar (in kvm_host.h): > > /* gfn_to_pfn (gtp) flags */ > ... > > I'd still go for GTP, but let me know if you think any of the above is > better, I can switch. > Yeah, I'll leave that call up to you. I don't have anything better. :) thanks, -- John Hubbard NVIDIA ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 2/4] kvm: Merge "atomic" and "write" in __gfn_to_pfn_memslot() 2022-06-28 22:55 ` John Hubbard @ 2022-06-28 23:02 ` Peter Xu 0 siblings, 0 replies; 39+ messages in thread From: Peter Xu @ 2022-06-28 23:02 UTC (permalink / raw) To: John Hubbard Cc: kvm, linux-kernel, Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List, Sean Christopherson On Tue, Jun 28, 2022 at 03:55:22PM -0700, John Hubbard wrote: > On 6/28/22 15:50, Peter Xu wrote: > > And if you see the patch I actually did something similar (in kvm_host.h): > > > > /* gfn_to_pfn (gtp) flags */ > > ... > > > > I'd still go for GTP, but let me know if you think any of the above is > > better, I can switch. > > > > Yeah, I'll leave that call up to you. I don't have anything better. :) Thanks. I'll also give Sean's suggestion of a struct a shot when I repost (I doubt whether I can bear with 4 bools after all..). If that goes well we won't need to worry about this either, since there'll be no flag introduced. -- Peter Xu ^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH 3/4] kvm: Add new pfn error KVM_PFN_ERR_INTR 2022-06-22 21:36 [PATCH 0/4] kvm/mm: Allow GUP to respond to non fatal signals Peter Xu 2022-06-22 21:36 ` [PATCH 1/4] mm/gup: Add FOLL_INTERRUPTIBLE Peter Xu 2022-06-22 21:36 ` [PATCH 2/4] kvm: Merge "atomic" and "write" in __gfn_to_pfn_memslot() Peter Xu @ 2022-06-22 21:36 ` Peter Xu 2022-06-23 14:31 ` Sean Christopherson 2022-06-22 21:36 ` [PATCH 4/4] kvm/x86: Allow to respond to generic signals during slow page faults Peter Xu 3 siblings, 1 reply; 39+ messages in thread From: Peter Xu @ 2022-06-22 21:36 UTC (permalink / raw) To: kvm, linux-kernel Cc: peterx, Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List, Sean Christopherson Add one new PFN error type to show when we cannot finish fetching the PFN due to interruptions. For example, by receiving a generic signal. This prepares KVM to be able to respond to SIGUSR1 (for QEMU that's the SIGIPI) even during e.g. handling an userfaultfd page fault. Signed-off-by: Peter Xu <peterx@redhat.com> --- include/linux/kvm_host.h | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index b646b6fcaec6..4f84a442f67f 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -96,6 +96,7 @@ #define KVM_PFN_ERR_FAULT (KVM_PFN_ERR_MASK) #define KVM_PFN_ERR_HWPOISON (KVM_PFN_ERR_MASK + 1) #define KVM_PFN_ERR_RO_FAULT (KVM_PFN_ERR_MASK + 2) +#define KVM_PFN_ERR_INTR (KVM_PFN_ERR_MASK + 3) /* * error pfns indicate that the gfn is in slot but faild to @@ -106,6 +107,16 @@ static inline bool is_error_pfn(kvm_pfn_t pfn) return !!(pfn & KVM_PFN_ERR_MASK); } +/* + * When KVM_PFN_ERR_INTR is returned, it means we're interrupted during + * fetching the PFN (e.g. a signal might have arrived), so we may want to + * retry at some later point and kick the userspace to handle the signal. 
+ */ +static inline bool is_intr_pfn(kvm_pfn_t pfn) +{ + return pfn == KVM_PFN_ERR_INTR; +} + /* * error_noslot pfns indicate that the gfn can not be * translated to pfn - it is not in slot or failed to -- 2.32.0 ^ permalink raw reply related [flat|nested] 39+ messages in thread
* Re: [PATCH 3/4] kvm: Add new pfn error KVM_PFN_ERR_INTR 2022-06-22 21:36 ` [PATCH 3/4] kvm: Add new pfn error KVM_PFN_ERR_INTR Peter Xu @ 2022-06-23 14:31 ` Sean Christopherson 2022-06-23 19:32 ` Peter Xu 0 siblings, 1 reply; 39+ messages in thread From: Sean Christopherson @ 2022-06-23 14:31 UTC (permalink / raw) To: Peter Xu Cc: kvm, linux-kernel, Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List On Wed, Jun 22, 2022, Peter Xu wrote: > Add one new PFN error type to show when we cannot finish fetching the PFN > due to interruptions. For example, by receiving a generic signal. > > This prepares KVM to be able to respond to SIGUSR1 (for QEMU that's the > SIGIPI) even during e.g. handling an userfaultfd page fault. > > Signed-off-by: Peter Xu <peterx@redhat.com> > --- > include/linux/kvm_host.h | 11 +++++++++++ > 1 file changed, 11 insertions(+) > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h > index b646b6fcaec6..4f84a442f67f 100644 > --- a/include/linux/kvm_host.h > +++ b/include/linux/kvm_host.h > @@ -96,6 +96,7 @@ > #define KVM_PFN_ERR_FAULT (KVM_PFN_ERR_MASK) > #define KVM_PFN_ERR_HWPOISON (KVM_PFN_ERR_MASK + 1) > #define KVM_PFN_ERR_RO_FAULT (KVM_PFN_ERR_MASK + 2) > +#define KVM_PFN_ERR_INTR (KVM_PFN_ERR_MASK + 3) > > /* > * error pfns indicate that the gfn is in slot but faild to > @@ -106,6 +107,16 @@ static inline bool is_error_pfn(kvm_pfn_t pfn) > return !!(pfn & KVM_PFN_ERR_MASK); > } > > +/* > + * When KVM_PFN_ERR_INTR is returned, it means we're interrupted during > + * fetching the PFN (e.g. a signal might have arrived), so we may want to > + * retry at some later point and kick the userspace to handle the signal. > + */ > +static inline bool is_intr_pfn(kvm_pfn_t pfn) > +{ > + return pfn == KVM_PFN_ERR_INTR; What about is_sigpending_pfn() and KVM_PFN_ERR_SIGPENDING? 
"intr" is too close to a real thing KVM will encounter, and I think knowing that KVM is effectively responding to a pending signal is the most important detail for KVM developers encountering this code for this first time. E.g. from KVM_PFN_ERR_INTR alone, one might think that any interrupt during GUP will trigger this. > +} > + > /* > * error_noslot pfns indicate that the gfn can not be > * translated to pfn - it is not in slot or failed to > -- > 2.32.0 > ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 3/4] kvm: Add new pfn error KVM_PFN_ERR_INTR 2022-06-23 14:31 ` Sean Christopherson @ 2022-06-23 19:32 ` Peter Xu 0 siblings, 0 replies; 39+ messages in thread From: Peter Xu @ 2022-06-23 19:32 UTC (permalink / raw) To: Sean Christopherson Cc: kvm, linux-kernel, Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List On Thu, Jun 23, 2022 at 02:31:47PM +0000, Sean Christopherson wrote: > On Wed, Jun 22, 2022, Peter Xu wrote: > > Add one new PFN error type to show when we cannot finish fetching the PFN > > due to interruptions. For example, by receiving a generic signal. > > > > This prepares KVM to be able to respond to SIGUSR1 (for QEMU that's the > > SIGIPI) even during e.g. handling an userfaultfd page fault. > > > > Signed-off-by: Peter Xu <peterx@redhat.com> > > --- > > include/linux/kvm_host.h | 11 +++++++++++ > > 1 file changed, 11 insertions(+) > > > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h > > index b646b6fcaec6..4f84a442f67f 100644 > > --- a/include/linux/kvm_host.h > > +++ b/include/linux/kvm_host.h > > @@ -96,6 +96,7 @@ > > #define KVM_PFN_ERR_FAULT (KVM_PFN_ERR_MASK) > > #define KVM_PFN_ERR_HWPOISON (KVM_PFN_ERR_MASK + 1) > > #define KVM_PFN_ERR_RO_FAULT (KVM_PFN_ERR_MASK + 2) > > +#define KVM_PFN_ERR_INTR (KVM_PFN_ERR_MASK + 3) > > > > /* > > * error pfns indicate that the gfn is in slot but faild to > > @@ -106,6 +107,16 @@ static inline bool is_error_pfn(kvm_pfn_t pfn) > > return !!(pfn & KVM_PFN_ERR_MASK); > > } > > > > +/* > > + * When KVM_PFN_ERR_INTR is returned, it means we're interrupted during > > + * fetching the PFN (e.g. a signal might have arrived), so we may want to > > + * retry at some later point and kick the userspace to handle the signal. > > + */ > > +static inline bool is_intr_pfn(kvm_pfn_t pfn) > > +{ > > + return pfn == KVM_PFN_ERR_INTR; > > What about is_sigpending_pfn() and KVM_PFN_ERR_SIGPENDING? 
"intr" is too close to > a real thing KVM will encounter, and I think knowing that KVM is effectively > responding to a pending signal is the most important detail for KVM developers > encountering this code for this first time. E.g. from KVM_PFN_ERR_INTR alone, one > might think that any interrupt during GUP will trigger this. Sounds good; INTR could be too general for KVM indeed. Thanks, -- Peter Xu ^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH 4/4] kvm/x86: Allow to respond to generic signals during slow page faults 2022-06-22 21:36 [PATCH 0/4] kvm/mm: Allow GUP to respond to non fatal signals Peter Xu ` (2 preceding siblings ...) 2022-06-22 21:36 ` [PATCH 3/4] kvm: Add new pfn error KVM_PFN_ERR_INTR Peter Xu @ 2022-06-22 21:36 ` Peter Xu 2022-06-23 14:46 ` Sean Christopherson 3 siblings, 1 reply; 39+ messages in thread From: Peter Xu @ 2022-06-22 21:36 UTC (permalink / raw) To: kvm, linux-kernel Cc: peterx, Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List, Sean Christopherson All the facilities should be ready for this; what we need to do is add a new KVM_GTP_INTERRUPTIBLE flag showing that we're willing to be interrupted by common signals during the __gfn_to_pfn_memslot() request, and wire it up with the FOLL_INTERRUPTIBLE flag that we've just introduced. Note that only the x86 slow page fault routine will set this new bit. The new bit is not used on non-x86 arches or on other gup paths even for x86. However, it could actually be used elsewhere too; that's just not yet covered. When we see that the PFN fetching was interrupted, do an early exit to userspace with a KVM_EXIT_INTR exit reason. 
Signed-off-by: Peter Xu <peterx@redhat.com> --- arch/x86/kvm/mmu/mmu.c | 9 +++++++++ include/linux/kvm_host.h | 1 + virt/kvm/kvm_main.c | 4 ++++ 3 files changed, 14 insertions(+) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index e92f1ab63d6a..b39acb7cb16d 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -3012,6 +3012,13 @@ static int kvm_handle_bad_page(struct kvm_vcpu *vcpu, gfn_t gfn, kvm_pfn_t pfn) static int handle_abnormal_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault, unsigned int access) { + /* NOTE: not all error pfn is fatal; handle intr before the other ones */ + if (unlikely(is_intr_pfn(fault->pfn))) { + vcpu->run->exit_reason = KVM_EXIT_INTR; + ++vcpu->stat.signal_exits; + return -EINTR; + } + /* The pfn is invalid, report the error! */ if (unlikely(is_error_pfn(fault->pfn))) return kvm_handle_bad_page(vcpu, fault->gfn, fault->pfn); @@ -4017,6 +4024,8 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) } } + /* Allow to respond to generic signals in slow page faults */ + flags |= KVM_GTP_INTERRUPTIBLE; fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, flags, NULL, &fault->map_writable, &fault->hva); return RET_PF_CONTINUE; diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 4f84a442f67f..c8d98e435537 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -1163,6 +1163,7 @@ typedef unsigned int __bitwise kvm_gtp_flag_t; #define KVM_GTP_WRITE ((__force kvm_gtp_flag_t) BIT(0)) #define KVM_GTP_ATOMIC ((__force kvm_gtp_flag_t) BIT(1)) +#define KVM_GTP_INTERRUPTIBLE ((__force kvm_gtp_flag_t) BIT(2)) kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn, kvm_gtp_flag_t gtp_flags, bool *async, diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 952400b42ee9..b3873cac5672 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -2462,6 +2462,8 @@ static int hva_to_pfn_slow(unsigned long addr, bool *async, flags 
|= FOLL_WRITE; if (async) flags |= FOLL_NOWAIT; + if (gtp_flags & KVM_GTP_INTERRUPTIBLE) + flags |= FOLL_INTERRUPTIBLE; npages = get_user_pages_unlocked(addr, 1, &page, flags); if (npages != 1) @@ -2599,6 +2601,8 @@ kvm_pfn_t hva_to_pfn(unsigned long addr, kvm_gtp_flag_t gtp_flags, bool *async, npages = hva_to_pfn_slow(addr, async, gtp_flags, writable, &pfn); if (npages == 1) return pfn; + if (npages == -EINTR) + return KVM_PFN_ERR_INTR; mmap_read_lock(current->mm); if (npages == -EHWPOISON || -- 2.32.0 ^ permalink raw reply related [flat|nested] 39+ messages in thread
* Re: [PATCH 4/4] kvm/x86: Allow to respond to generic signals during slow page faults 2022-06-22 21:36 ` [PATCH 4/4] kvm/x86: Allow to respond to generic signals during slow page faults Peter Xu @ 2022-06-23 14:46 ` Sean Christopherson 2022-06-23 19:31 ` Peter Xu 0 siblings, 1 reply; 39+ messages in thread From: Sean Christopherson @ 2022-06-23 14:46 UTC (permalink / raw) To: Peter Xu Cc: kvm, linux-kernel, Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List On Wed, Jun 22, 2022, Peter Xu wrote: > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c > index e92f1ab63d6a..b39acb7cb16d 100644 > --- a/arch/x86/kvm/mmu/mmu.c > +++ b/arch/x86/kvm/mmu/mmu.c > @@ -3012,6 +3012,13 @@ static int kvm_handle_bad_page(struct kvm_vcpu *vcpu, gfn_t gfn, kvm_pfn_t pfn) > static int handle_abnormal_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault, > unsigned int access) > { > + /* NOTE: not all error pfn is fatal; handle intr before the other ones */ > + if (unlikely(is_intr_pfn(fault->pfn))) { > + vcpu->run->exit_reason = KVM_EXIT_INTR; > + ++vcpu->stat.signal_exits; > + return -EINTR; > + } > + > /* The pfn is invalid, report the error! */ > if (unlikely(is_error_pfn(fault->pfn))) > return kvm_handle_bad_page(vcpu, fault->gfn, fault->pfn); > @@ -4017,6 +4024,8 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) > } > } > > + /* Allow to respond to generic signals in slow page faults */ "slow" is being overloaded here. The previous call __gfn_to_pfn_memslot() will end up in hva_to_pfn_slow(), but because of passing a non-null async it won't wait. This code really should have a more extensive comment irrespective of the interruptible stuff, now would be a good time to add that. Comments aside, isn't this series incomplete from the perspective that there are still many flows where KVM will hang if gfn_to_pfn() gets stuck in gup? E.g. 
if KVM is retrieving a page pointed at by vmcs12. > + flags |= KVM_GTP_INTERRUPTIBLE; > fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, flags, NULL, > &fault->map_writable, &fault->hva); > return RET_PF_CONTINUE; > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h > index 4f84a442f67f..c8d98e435537 100644 > --- a/include/linux/kvm_host.h > +++ b/include/linux/kvm_host.h > @@ -1163,6 +1163,7 @@ typedef unsigned int __bitwise kvm_gtp_flag_t; > > #define KVM_GTP_WRITE ((__force kvm_gtp_flag_t) BIT(0)) > #define KVM_GTP_ATOMIC ((__force kvm_gtp_flag_t) BIT(1)) > +#define KVM_GTP_INTERRUPTIBLE ((__force kvm_gtp_flag_t) BIT(2)) > > kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn, > kvm_gtp_flag_t gtp_flags, bool *async, > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c > index 952400b42ee9..b3873cac5672 100644 > --- a/virt/kvm/kvm_main.c > +++ b/virt/kvm/kvm_main.c > @@ -2462,6 +2462,8 @@ static int hva_to_pfn_slow(unsigned long addr, bool *async, > flags |= FOLL_WRITE; > if (async) > flags |= FOLL_NOWAIT; > + if (gtp_flags & KVM_GTP_INTERRUPTIBLE) > + flags |= FOLL_INTERRUPTIBLE; > > npages = get_user_pages_unlocked(addr, 1, &page, flags); > if (npages != 1) > @@ -2599,6 +2601,8 @@ kvm_pfn_t hva_to_pfn(unsigned long addr, kvm_gtp_flag_t gtp_flags, bool *async, > npages = hva_to_pfn_slow(addr, async, gtp_flags, writable, &pfn); > if (npages == 1) > return pfn; > + if (npages == -EINTR) > + return KVM_PFN_ERR_INTR; > > mmap_read_lock(current->mm); > if (npages == -EHWPOISON || > -- > 2.32.0 > ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH 4/4] kvm/x86: Allow to respond to generic signals during slow page faults 2022-06-23 14:46 ` Sean Christopherson @ 2022-06-23 19:31 ` Peter Xu 2022-06-23 20:07 ` Sean Christopherson 0 siblings, 1 reply; 39+ messages in thread From: Peter Xu @ 2022-06-23 19:31 UTC (permalink / raw) To: Sean Christopherson Cc: kvm, linux-kernel, Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List Hi, Sean, On Thu, Jun 23, 2022 at 02:46:08PM +0000, Sean Christopherson wrote: > On Wed, Jun 22, 2022, Peter Xu wrote: > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c > > index e92f1ab63d6a..b39acb7cb16d 100644 > > --- a/arch/x86/kvm/mmu/mmu.c > > +++ b/arch/x86/kvm/mmu/mmu.c > > @@ -3012,6 +3012,13 @@ static int kvm_handle_bad_page(struct kvm_vcpu *vcpu, gfn_t gfn, kvm_pfn_t pfn) > > static int handle_abnormal_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault, > > unsigned int access) > > { > > + /* NOTE: not all error pfn is fatal; handle intr before the other ones */ > > + if (unlikely(is_intr_pfn(fault->pfn))) { > > + vcpu->run->exit_reason = KVM_EXIT_INTR; > > + ++vcpu->stat.signal_exits; > > + return -EINTR; > > + } > > + > > /* The pfn is invalid, report the error! */ > > if (unlikely(is_error_pfn(fault->pfn))) > > return kvm_handle_bad_page(vcpu, fault->gfn, fault->pfn); > > @@ -4017,6 +4024,8 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) > > } > > } > > > > + /* Allow to respond to generic signals in slow page faults */ > > "slow" is being overloaded here. The previous call __gfn_to_pfn_memslot() will > end up in hva_to_pfn_slow(), but because of passing a non-null async it won't wait. > This code really should have a more extensive comment irrespective of the interruptible > stuff, now would be a good time to add that. Yes I agree, especially the "async" parameter along with "atomic" makes it even more confusing as you said. 
But doesn't that also mean the "slow" here is spot-on? I mean, imho it's the "elsewhere" that needs cleanup, not this comment itself, since it's really stating the fact that this is the real slow path? Any other suggestions on how I should improve this comment are greatly welcomed. > > Comments aside, isn't this series incomplete from the perspective that there are > still many flows where KVM will hang if gfn_to_pfn() gets stuck in gup? E.g. if > KVM is retrieving a page pointed at by vmcs12. Right. The thing is I'm not confident I can make it complete easily in one shot.. I mentioned some of that in the cover letter and the commit message of patch 1: I don't think all the gup call sites are okay with being interrupted by a non-fatal signal. So what I want to do is do it step by step, at least by introducing FOLL_INTERRUPTIBLE and having one valid user of it that covers a very valid use case. I'm also pretty sure the page fault path really covers most of the cases that will happen with GUP, so it already helps in many ways for me when running with a patched kernel. So when the complete picture is non-trivial to achieve in one shot, I think this could be one option to go for. 
* Re: [PATCH 4/4] kvm/x86: Allow to respond to generic signals during slow page faults 2022-06-23 19:31 ` Peter Xu @ 2022-06-23 20:07 ` Sean Christopherson 2022-06-23 20:18 ` Peter Xu 0 siblings, 1 reply; 39+ messages in thread From: Sean Christopherson @ 2022-06-23 20:07 UTC (permalink / raw) To: Peter Xu Cc: kvm, linux-kernel, Paolo Bonzini, Andrew Morton, David Hildenbrand, Dr . David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List On Thu, Jun 23, 2022, Peter Xu wrote: > Hi, Sean, > > On Thu, Jun 23, 2022 at 02:46:08PM +0000, Sean Christopherson wrote: > > On Wed, Jun 22, 2022, Peter Xu wrote: > > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c > > > index e92f1ab63d6a..b39acb7cb16d 100644 > > > --- a/arch/x86/kvm/mmu/mmu.c > > > +++ b/arch/x86/kvm/mmu/mmu.c > > > @@ -3012,6 +3012,13 @@ static int kvm_handle_bad_page(struct kvm_vcpu *vcpu, gfn_t gfn, kvm_pfn_t pfn) > > > static int handle_abnormal_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault, > > > unsigned int access) > > > { > > > + /* NOTE: not all error pfn is fatal; handle intr before the other ones */ > > > + if (unlikely(is_intr_pfn(fault->pfn))) { > > > + vcpu->run->exit_reason = KVM_EXIT_INTR; > > > + ++vcpu->stat.signal_exits; > > > + return -EINTR; > > > + } > > > + > > > /* The pfn is invalid, report the error! */ > > > if (unlikely(is_error_pfn(fault->pfn))) > > > return kvm_handle_bad_page(vcpu, fault->gfn, fault->pfn); > > > @@ -4017,6 +4024,8 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) > > > } > > > } > > > > > > + /* Allow to respond to generic signals in slow page faults */ > > > > "slow" is being overloaded here. The previous call __gfn_to_pfn_memslot() will > > end up in hva_to_pfn_slow(), but because of passing a non-null async it won't wait. > > This code really should have a more extensive comment irrespective of the interruptible > > stuff, now would be a good time to add that. 
>
> Yes I agree, especially the "async" parameter along with "atomic" makes it
> even more confusing, as you said.  But doesn't that also mean the "slow"
> here is spot-on?  I mean, imho it's the "elsewhere" that needs cleanup,
> not this comment itself, since it's really stating the fact that this is
> the real slow path?

No, because atomic=false here, i.e. KVM will try hva_to_pfn_slow() if
hva_to_pfn_fast() fails.  So even if we agree that the "wait for IO" path
is the true slow path, when reading KVM code the vast, vast majority of
developers will associate "slow" with hva_to_pfn_slow().

> Any other suggestions on how I should improve this comment are greatly
> welcomed.

Something along these lines?

	/*
	 * Allow gup to bail on pending non-fatal signals when it's also allowed
	 * to wait for IO.  Note, gup always bails if it is unable to quickly
	 * get a page and a fatal signal, i.e. SIGKILL, is pending.
	 */

>
> > Comments aside, isn't this series incomplete from the perspective that there are
> > still many flows where KVM will hang if gfn_to_pfn() gets stuck in gup?  E.g. if
> > KVM is retrieving a page pointed at by vmcs12.
>
> Right.  The thing is I'm not confident I can make it complete easily in
> one shot.
>
> I mentioned some of that in the cover letter and the commit message of
> patch 1, in that I don't think all the gup call sites are okay with being
> interrupted by a non-fatal signal.
>
> So what I want to do is do it step by step: at least start by introducing
> FOLL_INTERRUPTIBLE and having one user of it that covers a very real use
> case.  I'm also pretty sure the page fault path covers the vast majority
> of what happens with GUP, so it already helps in many ways for me when
> running with a patched kernel.
>
> So when the complete picture is non-trivial to achieve in one shot, I
> think this could be one option to go for.
> With the facility (and example code for x86 slow page faults) ready,
> hopefully we can start to convert many other call sites to be
> signal-aware, outside page faults, or even outside x86, because it's
> really a more generic problem, as I fully agree.
>
> Does it sound reasonable to you?

Yep.  In fact, I'd be totally ok keeping this to just the page fault path.
I missed one crucial detail on my first read through: gup already bails on
SIGKILL; it's only these technically-not-fatal signals that gup ignores by
default.  In other words, using FOLL_INTERRUPTIBLE is purely opportunistic,
as userspace can always resort to SIGKILL if the VM really needs to die.

It would be very helpful to explicitly call that out in the changelog, so
that it's (hopefully) clear that KVM uses FOLL_INTERRUPTIBLE to be user
friendly when it's easy to do so, and that it's not required for
correctness/robustness.
* Re: [PATCH 4/4] kvm/x86: Allow to respond to generic signals during slow page faults
  2022-06-23 20:07 ` Sean Christopherson
@ 2022-06-23 20:18 ` Peter Xu
  0 siblings, 0 replies; 39+ messages in thread
From: Peter Xu @ 2022-06-23 20:18 UTC (permalink / raw)
To: Sean Christopherson
Cc: kvm, linux-kernel, Paolo Bonzini, Andrew Morton, David Hildenbrand,
	Dr. David Alan Gilbert, Andrea Arcangeli, Linux MM Mailing List

On Thu, Jun 23, 2022 at 08:07:00PM +0000, Sean Christopherson wrote:
> On Thu, Jun 23, 2022, Peter Xu wrote:
> > Hi, Sean,
> >
> > On Thu, Jun 23, 2022 at 02:46:08PM +0000, Sean Christopherson wrote:
> > > On Wed, Jun 22, 2022, Peter Xu wrote:
> > > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > > > index e92f1ab63d6a..b39acb7cb16d 100644
> > > > --- a/arch/x86/kvm/mmu/mmu.c
> > > > +++ b/arch/x86/kvm/mmu/mmu.c
> > > > @@ -3012,6 +3012,13 @@ static int kvm_handle_bad_page(struct kvm_vcpu *vcpu, gfn_t gfn, kvm_pfn_t pfn)
> > > >  static int handle_abnormal_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
> > > > 			       unsigned int access)
> > > >  {
> > > > +	/* NOTE: not all error pfn is fatal; handle intr before the other ones */
> > > > +	if (unlikely(is_intr_pfn(fault->pfn))) {
> > > > +		vcpu->run->exit_reason = KVM_EXIT_INTR;
> > > > +		++vcpu->stat.signal_exits;
> > > > +		return -EINTR;
> > > > +	}
> > > > +
> > > >  	/* The pfn is invalid, report the error! */
> > > >  	if (unlikely(is_error_pfn(fault->pfn)))
> > > >  		return kvm_handle_bad_page(vcpu, fault->gfn, fault->pfn);
> > > > @@ -4017,6 +4024,8 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
> > > >  		}
> > > >  	}
> > > >
> > > > +	/* Allow to respond to generic signals in slow page faults */
> > >
> > > "slow" is being overloaded here.  The previous call __gfn_to_pfn_memslot() will
> > > end up in hva_to_pfn_slow(), but because of passing a non-null async it won't wait.
> > > This code really should have a more extensive comment irrespective of the interruptible
> > > stuff, now would be a good time to add that.
> >
> > Yes I agree, especially the "async" parameter along with "atomic" makes it
> > even more confusing, as you said.  But doesn't that also mean the "slow"
> > here is spot-on?  I mean, imho it's the "elsewhere" that needs cleanup,
> > not this comment itself, since it's really stating the fact that this is
> > the real slow path?
>
> No, because atomic=false here, i.e. KVM will try hva_to_pfn_slow() if
> hva_to_pfn_fast() fails.  So even if we agree that the "wait for IO" path
> is the true slow path, when reading KVM code the vast, vast majority of
> developers will associate "slow" with hva_to_pfn_slow().

Okay.  I think how we define "slow" matters; here my take is "when a major
fault happens" (as defined in mm terms), but probably that definition is a
bit far away from KVM at the hypervisor level indeed.

> > Any other suggestions on how I should improve this comment are greatly
> > welcomed.
>
> Something along these lines?
>
> 	/*
> 	 * Allow gup to bail on pending non-fatal signals when it's also allowed
> 	 * to wait for IO.  Note, gup always bails if it is unable to quickly
> 	 * get a page and a fatal signal, i.e. SIGKILL, is pending.
> 	 */

Taken.

> >
> > > Comments aside, isn't this series incomplete from the perspective that there are
> > > still many flows where KVM will hang if gfn_to_pfn() gets stuck in gup?  E.g. if
> > > KVM is retrieving a page pointed at by vmcs12.
> >
> > Right.  The thing is I'm not confident I can make it complete easily in
> > one shot.
> >
> > I mentioned some of that in the cover letter and the commit message of
> > patch 1, in that I don't think all the gup call sites are okay with being
> > interrupted by a non-fatal signal.
> >
> > So what I want to do is do it step by step: at least start by introducing
> > FOLL_INTERRUPTIBLE and having one user of it that covers a very real use
> > case.
> > I'm also pretty sure the page fault path covers the vast majority
> > of what happens with GUP, so it already helps in many ways for me when
> > running with a patched kernel.
> >
> > So when the complete picture is non-trivial to achieve in one shot, I
> > think this could be one option to go for.  With the facility (and example
> > code for x86 slow page faults) ready, hopefully we can start to convert
> > many other call sites to be signal-aware, outside page faults, or even
> > outside x86, because it's really a more generic problem, as I fully agree.
> >
> > Does it sound reasonable to you?
>
> Yep.  In fact, I'd be totally ok keeping this to just the page fault path.
> I missed one crucial detail on my first read through: gup already bails on
> SIGKILL; it's only these technically-not-fatal signals that gup ignores by
> default.  In other words, using FOLL_INTERRUPTIBLE is purely opportunistic,
> as userspace can always resort to SIGKILL if the VM really needs to die.
>
> It would be very helpful to explicitly call that out in the changelog, so
> that it's (hopefully) clear that KVM uses FOLL_INTERRUPTIBLE to be user
> friendly when it's easy to do so, and that it's not required for
> correctness/robustness.

Yes, that's the case; SIGKILL is special.  I can mention that somewhere in
the cover letter too, besides the comment you suggested above.

Thanks,

-- 
Peter Xu
end of thread, other threads: [~2022-07-07 15:07 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-22 21:36 [PATCH 0/4] kvm/mm: Allow GUP to respond to non fatal signals Peter Xu
2022-06-22 21:36 ` [PATCH 1/4] mm/gup: Add FOLL_INTERRUPTIBLE Peter Xu
2022-06-25  0:35   ` Jason Gunthorpe
2022-06-25  1:23     ` Peter Xu
2022-06-25 23:59       ` Jason Gunthorpe
2022-06-27 15:29         ` Peter Xu
2022-06-28  2:07           ` John Hubbard
2022-06-28 19:31             ` Peter Xu
2022-06-28 21:40               ` John Hubbard
2022-06-28 22:33                 ` Peter Xu
2022-06-29  0:31                   ` John Hubbard
2022-06-29 15:47                     ` Peter Xu
2022-06-30  1:53                       ` John Hubbard
2022-06-30 13:49                         ` Peter Xu
2022-06-30 19:01                           ` John Hubbard
2022-06-30 21:27                             ` Peter Xu
2022-07-04 22:48   ` Matthew Wilcox
2022-07-07 15:06     ` Peter Xu
2022-06-22 21:36 ` [PATCH 2/4] kvm: Merge "atomic" and "write" in __gfn_to_pfn_memslot() Peter Xu
2022-06-23 14:49   ` Sean Christopherson
2022-06-23 19:46     ` Peter Xu
2022-06-23 20:29       ` Sean Christopherson
2022-06-23 21:29         ` Peter Xu
2022-06-23 21:52           ` Sean Christopherson
2022-06-27 19:12             ` John Hubbard
2022-06-28  2:17   ` John Hubbard
2022-06-28 19:46     ` Peter Xu
2022-06-28 21:52       ` John Hubbard
2022-06-28 22:50         ` Peter Xu
2022-06-28 22:55           ` John Hubbard
2022-06-28 23:02             ` Peter Xu
2022-06-22 21:36 ` [PATCH 3/4] kvm: Add new pfn error KVM_PFN_ERR_INTR Peter Xu
2022-06-23 14:31   ` Sean Christopherson
2022-06-23 19:32     ` Peter Xu
2022-06-22 21:36 ` [PATCH 4/4] kvm/x86: Allow to respond to generic signals during slow page faults Peter Xu
2022-06-23 14:46   ` Sean Christopherson
2022-06-23 19:31     ` Peter Xu
2022-06-23 20:07       ` Sean Christopherson
2022-06-23 20:18         ` Peter Xu