From: Leon Romanovsky <leonro@nvidia.com>
To: Peter Xu <peterx@redhat.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>, Linux-MM <linux-mm@kvack.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	"Maya B . Gokhale" <gokhale2@llnl.gov>,
	Yang Shi <yang.shi@linux.alibaba.com>,
	Marty Mcfadden <mcfadden8@llnl.gov>,
	Kirill Shutemov <kirill@shutemov.name>,
	Oleg Nesterov <oleg@redhat.com>, Jann Horn <jannh@google.com>,
	Jan Kara <jack@suse.cz>, Kirill Tkhai <ktkhai@virtuozzo.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Christoph Hellwig <hch@lst.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH 1/4] mm: Trial do_wp_page() simplification
Date: Tue, 15 Sep 2020 13:23:49 +0300
Message-ID: <20200915102349.GI35718@unreal>
In-Reply-To: <20200914211515.GA5901@xz-x1>

On Mon, Sep 14, 2020 at 05:15:15PM -0400, Peter Xu wrote:
> On Mon, Sep 14, 2020 at 02:34:36PM -0400, Peter Xu wrote:
> > On Mon, Sep 14, 2020 at 10:32:11AM -0700, Linus Torvalds wrote:
> > > On Mon, Sep 14, 2020 at 7:38 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
> > > >
> > > > I don't have a detailed explanation right now, but this patch appears
> > > > to be causing a regression where RDMA subsystem tests fail. Tests
> > > > return to normal when this patch is reverted.
> > > >
> > > > It kind of looks like the process is not seeing DMA'd data to a
> > > > pin_user_pages()?
> > >
> > > I'm a nincompoop. I actually _talked_ to Hugh Dickins about this when
> > > he raised concerns, and I dismissed his concerns with "but PAGE_PIN is
> > > special".
> > >
> > > As usual, Hugh was right. Page pinning certainly _is_ special, but
> > > it's not that different from the regular GUP code.
> > >
> > > But in the meantime, I have a lovely confirmation from the kernel test
> > > robot, saying that commit 09854ba94c results in a
> > > "vm-scalability.throughput 31.4% improvement", which was what I was
> > > hoping for - the complexity wasn't just complexity, it was active
> > > badness due to the page locking horrors.
> > >
> > > I think what we want to do is basically do the "early COW", but only
> > > do it for FOLL_PIN (and not turn them into writes for anything but the
> > > COW code). So basically redo the "enforced COW mechanism", but rather
> > > than do it for everything, now do it only for FOLL_PIN, and only in
> > > that COW path.
> > >
> > > Peter - any chance you can look at this? I'm still looking at the page
> > > lock fairness performance regression, although I now think I have a
> > > test patch for Phoronix to test out.
> >
> > Sure, I'll try to prepare something like that and share it shortly.
>
> Jason, would you please try the attached patch to see whether it unbreaks the
> rdma test?  Thanks!
>
> --
> Peter Xu

> From 93c534866d2c548cf193a5c17f7058a1f770df5a Mon Sep 17 00:00:00 2001
> From: Peter Xu <peterx@redhat.com>
> Date: Mon, 14 Sep 2020 15:34:41 -0400
> Subject: [PATCH] mm/gup: Allow enforced COW for FOLL_PIN
>
> FOLL_PIN may need the enforced COW mechanism, as reported by Jason and analyzed
> by Linus [1].  This continues the work in the previous patch [2], with a few
> minor differences.
>
> Instead of applying enforced COW everywhere, we only apply it for FOLL_PIN to
> make sure the pages that were pinned will not be COWed again later on.  In
> other words, we do the COW early for pinned pages, as part of the gup
> procedure.  Since only FOLL_PIN is affected, we don't need to introduce a
> new gup flag such as FOLL_BREAK_COW.  However, we still need a new fault flag,
> FAULT_FLAG_BREAK_COW, inside the page fault handler.
>
> Fast gup is not affected by this because it is never used with FOLL_PIN.
>
> userfaultfd-wp now needs to be prepared for COW, since a read gup with
> FOLL_PIN can now trigger COW (which never happened previously).  So when COW
> happens we need to carry over the uffd-wp bit too, if it is set.
>
> Meanwhile, both userfaultfd_pte_wp() and userfaultfd_huge_pmd_wp() need to be
> smarter than before: they must return true only for a "real" write fault.
> With that extra check, we can distinguish a real write from an enforced COW
> triggered by a FOLL_PIN gup.
>
> Note: hugetlbfs is not covered by this patch, because it is still missing
> some required bits (like properly setting FOLL_COW when a page fault
> retries).  Since we may want to unbreak the RDMA tests even during the rcs,
> this patch only fixes the non-hugetlbfs cases.  THPs should still be covered.
>
> [1] https://lore.kernel.org/lkml/20200914143829.GA1424636@nvidia.com
> [2] https://lore.kernel.org/lkml/20200811183950.10603-1-peterx@redhat.com
>
> Reported-by: Jason Gunthorpe <jgg@nvidia.com>
> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  include/linux/mm.h            |  2 ++
>  include/linux/userfaultfd_k.h | 12 ++++++------
>  mm/gup.c                      | 17 ++++++++++++-----
>  mm/huge_memory.c              | 17 ++++++++++++-----
>  mm/memory.c                   | 16 +++++++++-------
>  5 files changed, 41 insertions(+), 23 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index ca6e6a81576b..741574bfd343 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -416,6 +416,7 @@ extern pgprot_t protection_map[16];
>   * @FAULT_FLAG_REMOTE: The fault is not for current task/mm.
>   * @FAULT_FLAG_INSTRUCTION: The fault was during an instruction fetch.
>   * @FAULT_FLAG_INTERRUPTIBLE: The fault can be interrupted by non-fatal signals.
> + * @FAULT_FLAG_BREAK_COW: Do COW explicitly for the fault (even for read).
>   *
>   * About @FAULT_FLAG_ALLOW_RETRY and @FAULT_FLAG_TRIED: we can specify
>   * whether we would allow page faults to retry by specifying these two
> @@ -446,6 +447,7 @@ extern pgprot_t protection_map[16];
>  #define FAULT_FLAG_REMOTE			0x80
>  #define FAULT_FLAG_INSTRUCTION  		0x100
>  #define FAULT_FLAG_INTERRUPTIBLE		0x200
> +#define FAULT_FLAG_BREAK_COW			0x400
>
>  /*
>   * The default fault flags that should be used by most of the
> diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
> index a8e5f3ea9bb2..fbcb75daf870 100644
> --- a/include/linux/userfaultfd_k.h
> +++ b/include/linux/userfaultfd_k.h
> @@ -62,16 +62,16 @@ static inline bool userfaultfd_wp(struct vm_area_struct *vma)
>  	return vma->vm_flags & VM_UFFD_WP;
>  }
>
> -static inline bool userfaultfd_pte_wp(struct vm_area_struct *vma,
> -				      pte_t pte)
> +static inline bool userfaultfd_pte_wp(struct vm_fault *vmf, pte_t pte)
>  {
> -	return userfaultfd_wp(vma) && pte_uffd_wp(pte);
> +	return (vmf->flags & FAULT_FLAG_WRITE) &&
> +	    userfaultfd_wp(vmf->vma) && pte_uffd_wp(pte);
>  }
>
> -static inline bool userfaultfd_huge_pmd_wp(struct vm_area_struct *vma,
> -					   pmd_t pmd)
> +static inline bool userfaultfd_huge_pmd_wp(struct vm_fault *vmf, pmd_t pmd)
>  {
> -	return userfaultfd_wp(vma) && pmd_uffd_wp(pmd);
> +	return (vmf->flags & FAULT_FLAG_WRITE) &&
> +	    userfaultfd_wp(vmf->vma) && pmd_uffd_wp(pmd);
>  }

Don't forget to change !CONFIG_USERFAULTFD declarations too.
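
For illustration, a minimal and untested sketch of what that might look like,
assuming the !CONFIG_USERFAULTFD stubs keep returning false and only their
signatures change to match the hunk above. The typedefs and structs are
stand-ins so the fragment compiles outside the kernel tree; they are not the
real kernel definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in types (assumptions, NOT the kernel's real definitions). */
typedef struct { unsigned long pte; } pte_t;
typedef struct { unsigned long pmd; } pmd_t;
struct vm_area_struct { unsigned long vm_flags; };
struct vm_fault {
	struct vm_area_struct *vma;
	unsigned int flags;
};

/*
 * !CONFIG_USERFAULTFD variants: same signatures as the CONFIG_USERFAULTFD=y
 * versions in the patch (struct vm_fault * instead of struct vm_area_struct *),
 * but with no userfaultfd support they can never report a uffd-wp fault.
 */
static inline bool userfaultfd_pte_wp(struct vm_fault *vmf, pte_t pte)
{
	(void)vmf;
	(void)pte;
	return false;
}

static inline bool userfaultfd_huge_pmd_wp(struct vm_fault *vmf, pmd_t pmd)
{
	(void)vmf;
	(void)pmd;
	return false;
}
```

Without this, callers that now pass vmf would presumably fail to build on
configs with userfaultfd disabled.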

Thanks