All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: Mike Kravetz <mike.kravetz@oracle.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Mike Rapoport <rppt@linux.vnet.ibm.com>,
	Nadav Amit <nadav.amit@gmail.com>,
	Jerome Glisse <jglisse@redhat.com>,
	Hugh Dickins <hughd@google.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	"Kirill A . Shutemov" <kirill@shutemov.name>,
	Axel Rasmussen <axelrasmussen@google.com>,
	Matthew Wilcox <willy@infradead.org>
Subject: Re: [PATCH 19/23] hugetlb/userfaultfd: Handle uffd-wp special pte in hugetlb pf handler
Date: Sun, 25 Apr 2021 22:08:42 -0400	[thread overview]
Message-ID: <20210426020842.GB85002@xz-x1> (raw)
In-Reply-To: <3178f1ff-f8da-7fdd-68ef-8c35972ca2e1@oracle.com>

On Thu, Apr 22, 2021 at 03:45:39PM -0700, Mike Kravetz wrote:
> On 3/22/21 5:50 PM, Peter Xu wrote:
> > Teach the hugetlb page fault code to understand uffd-wp special pte.  For
> > example, when seeing such a pte we need to convert any write fault into a read
> > one (which is fake - we'll retry the write later if so).  Meanwhile, for
> > handle_userfault() we'll need to make sure we must wait for the special swap
> > pte too just like a none pte.
> > 
> > Note that we also need to teach UFFDIO_COPY about this special pte across the
> > code path so that we can safely install a new page at this special pte as long
> > as we know it's a stall entry.
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  fs/userfaultfd.c |  5 ++++-
> >  mm/hugetlb.c     | 34 +++++++++++++++++++++++++++-------
> >  mm/userfaultfd.c |  5 ++++-
> >  3 files changed, 35 insertions(+), 9 deletions(-)
> > 
> > diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> > index 72956f9cc892..f6fa34f58c37 100644
> > --- a/fs/userfaultfd.c
> > +++ b/fs/userfaultfd.c
> > @@ -245,8 +245,11 @@ static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx,
> >  	/*
> >  	 * Lockless access: we're in a wait_event so it's ok if it
> >  	 * changes under us.
> > +	 *
> > +	 * Regarding uffd-wp special case, please refer to comments in
> > +	 * userfaultfd_must_wait().
> >  	 */
> > -	if (huge_pte_none(pte))
> > +	if (huge_pte_none(pte) || pte_swp_uffd_wp_special(pte))
> >  		ret = true;
> >  	if (!huge_pte_write(pte) && (reason & VM_UFFD_WP))
> >  		ret = true;
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 64e424b03774..448ef745d5ee 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -4369,7 +4369,8 @@ static inline vm_fault_t hugetlb_handle_userfault(struct vm_area_struct *vma,
> >  static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
> >  			struct vm_area_struct *vma,
> >  			struct address_space *mapping, pgoff_t idx,
> > -			unsigned long address, pte_t *ptep, unsigned int flags)
> > +			unsigned long address, pte_t *ptep,
> > +			pte_t old_pte, unsigned int flags)
> >  {
> >  	struct hstate *h = hstate_vma(vma);
> >  	vm_fault_t ret = VM_FAULT_SIGBUS;
> > @@ -4493,7 +4494,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
> >  
> >  	ptl = huge_pte_lock(h, mm, ptep);
> >  	ret = 0;
> > -	if (!huge_pte_none(huge_ptep_get(ptep)))
> > +	if (!pte_same(huge_ptep_get(ptep), old_pte))
> >  		goto backout;
> >  
> >  	if (anon_rmap) {
> > @@ -4503,6 +4504,11 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
> >  		page_dup_rmap(page, true);
> >  	new_pte = make_huge_pte(vma, page, ((vma->vm_flags & VM_WRITE)
> >  				&& (vma->vm_flags & VM_SHARED)));
> > +	if (unlikely(flags & FAULT_FLAG_UFFD_WP)) {
> > +		WARN_ON_ONCE(flags & FAULT_FLAG_WRITE);
> > +		/* We should have the write bit cleared already, but be safe */
> > +		new_pte = huge_pte_wrprotect(huge_pte_mkuffd_wp(new_pte));
> > +	}
> >  	set_huge_pte_at(mm, haddr, ptep, new_pte);
> >  
> >  	hugetlb_count_add(pages_per_huge_page(h), mm);
> > @@ -4584,9 +4590,16 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
> >  		if (unlikely(is_hugetlb_entry_migration(entry))) {
> >  			migration_entry_wait_huge(vma, mm, ptep);
> >  			return 0;
> > -		} else if (unlikely(is_hugetlb_entry_hwpoisoned(entry)))
> > +		} else if (unlikely(is_hugetlb_entry_hwpoisoned(entry))) {
> >  			return VM_FAULT_HWPOISON_LARGE |
> >  				VM_FAULT_SET_HINDEX(hstate_index(h));
> > +		} else if (unlikely(is_swap_special_pte(entry))) {
> > +			/* Must be a uffd-wp special swap pte */
> > +			WARN_ON_ONCE(!pte_swp_uffd_wp_special(entry));
> > +			flags |= FAULT_FLAG_UFFD_WP;
> > +			/* Emulate a read fault */
> > +			flags &= ~FAULT_FLAG_WRITE;
> > +		}
> 
> The comment above this if/else block points out that we hold no locks
> and are only checking conditions that would cause a quick return.  Yet,
> this new code is potentially modifying flags.  Pretty sure we can race
> and have the entry change.
> 
> Not sure of all the side effects of emulating a read if changed entry is
> not a uffd-wp special swap pte and we emulate read when we should not.
> 
> Perhaps we should just put this check and modification of flags after
> taking the fault mutex and before the change below?

That's a great point.  Even the WARN_ON_ONCE could trigger if the pte got
modified in parallel, so definitely broken.

Yes I'd better do that with the pgtable lock, mostly hugetlb_no_page() should
be the only function to handle this special case. So maybe I don't need to
emulate the READ fault at all, but just check pte_swp_uffd_wp_special() with
the lock, then do wrprotect properly should suffice.  Maybe that's even true
for shmem, I'll think more about it.

Thanks!

-- 
Peter Xu


  reply	other threads:[~2021-04-26  2:08 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-23  0:48 [PATCH 00/23] userfaultfd-wp: Support shmem and hugetlbfs Peter Xu
2021-03-23  0:48 ` [PATCH 01/23] shmem/userfaultfd: Take care of UFFDIO_COPY_MODE_WP Peter Xu
2021-03-23  0:48 ` [PATCH 02/23] mm: Clear vmf->pte after pte_unmap_same() returns Peter Xu
2021-03-23  2:34   ` Miaohe Lin
2021-03-23 15:40     ` Peter Xu
2021-03-23  0:48 ` [PATCH 03/23] mm/userfaultfd: Introduce special pte for unmapped file-backed mem Peter Xu
2021-03-23  0:48 ` [PATCH 04/23] mm/swap: Introduce the idea of special swap ptes Peter Xu
2021-03-23  0:48 ` [PATCH 05/23] shmem/userfaultfd: Handle uffd-wp special pte in page fault handler Peter Xu
2021-03-23  0:48 ` [PATCH 06/23] mm: Drop first_index/last_index in zap_details Peter Xu
2021-03-23  0:48 ` [PATCH 07/23] mm: Introduce zap_details.zap_flags Peter Xu
2021-03-23  2:11   ` Matthew Wilcox
2021-03-23 15:43     ` Peter Xu
2021-03-23  0:48 ` [PATCH 08/23] mm: Introduce ZAP_FLAG_SKIP_SWAP Peter Xu
2021-03-23  0:48 ` [PATCH 09/23] mm: Pass zap_flags into unmap_mapping_pages() Peter Xu
2021-03-23  0:48 ` [PATCH 10/23] shmem/userfaultfd: Persist uffd-wp bit across zapping for file-backed Peter Xu
2021-03-23  0:49 ` [PATCH 11/23] shmem/userfaultfd: Allow wr-protect none pte for file-backed mem Peter Xu
2021-03-23  0:49 ` [PATCH 12/23] shmem/userfaultfd: Allows file-back mem to be uffd wr-protected on thps Peter Xu
2021-03-23  0:49 ` [PATCH 13/23] shmem/userfaultfd: Handle the left-overed special swap ptes Peter Xu
2021-03-23  0:49 ` [PATCH 14/23] shmem/userfaultfd: Pass over uffd-wp special swap pte when fork() Peter Xu
2021-03-23  0:49 ` [PATCH 15/23] hugetlb/userfaultfd: Hook page faults for uffd write protection Peter Xu
2021-04-21 22:02   ` Mike Kravetz
2021-03-23  0:49 ` [PATCH 16/23] hugetlb/userfaultfd: Take care of UFFDIO_COPY_MODE_WP Peter Xu
2021-04-21 23:06   ` Mike Kravetz
2021-04-22  1:14     ` Peter Xu
2021-03-23  0:49 ` [PATCH 17/23] hugetlb/userfaultfd: Handle UFFDIO_WRITEPROTECT Peter Xu
2021-04-22 18:22   ` Mike Kravetz
2021-03-23  0:49 ` [PATCH 18/23] mm/hugetlb: Introduce huge version of special swap pte helpers Peter Xu
2021-04-22 19:00   ` Mike Kravetz
2021-03-23  0:50 ` [PATCH 19/23] hugetlb/userfaultfd: Handle uffd-wp special pte in hugetlb pf handler Peter Xu
2021-04-22 22:45   ` Mike Kravetz
2021-04-26  2:08     ` Peter Xu [this message]
2021-03-23  0:50 ` [PATCH 20/23] hugetlb/userfaultfd: Allow wr-protect none ptes Peter Xu
2021-04-23  0:08   ` Mike Kravetz
2021-03-23  0:50 ` [PATCH 21/23] hugetlb/userfaultfd: Only drop uffd-wp special pte if required Peter Xu
2021-04-23 20:33   ` Mike Kravetz
2021-04-26 21:16     ` Peter Xu
2021-04-26 21:36       ` Mike Kravetz
2021-04-26 22:05         ` Peter Xu
2021-04-26 23:09           ` Mike Kravetz
2021-03-23  0:50 ` [PATCH 22/23] mm/userfaultfd: Enable write protection for shmem & hugetlbfs Peter Xu
2021-03-23  0:50 ` [PATCH 23/23] userfaultfd/selftests: Enable uffd-wp for shmem/hugetlbfs Peter Xu
2021-03-23  0:54 ` [PATCH 00/23] userfaultfd-wp: Support shmem and hugetlbfs Peter Xu
2021-04-21 16:03 ` Peter Xu
2021-04-21 21:39   ` Mike Kravetz
2021-04-22  1:16     ` Peter Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210426020842.GB85002@xz-x1 \
    --to=peterx@redhat.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=axelrasmussen@google.com \
    --cc=hughd@google.com \
    --cc=jglisse@redhat.com \
    --cc=kirill@shutemov.name \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mike.kravetz@oracle.com \
    --cc=nadav.amit@gmail.com \
    --cc=rppt@linux.vnet.ibm.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.