All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: Mike Kravetz <mike.kravetz@oracle.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	"Kirill A . Shutemov" <kirill@shutemov.name>,
	Jerome Glisse <jglisse@redhat.com>,
	Matthew Wilcox <willy@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Axel Rasmussen <axelrasmussen@google.com>,
	Hugh Dickins <hughd@google.com>,
	Nadav Amit <nadav.amit@gmail.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Mike Rapoport <rppt@linux.vnet.ibm.com>
Subject: Re: [PATCH 16/23] hugetlb/userfaultfd: Take care of UFFDIO_COPY_MODE_WP
Date: Wed, 21 Apr 2021 21:14:19 -0400	[thread overview]
Message-ID: <20210422011419.GB6404@xz-x1> (raw)
In-Reply-To: <3cde4e9c-6887-6ded-2f34-2d031569badf@oracle.com>

On Wed, Apr 21, 2021 at 04:06:39PM -0700, Mike Kravetz wrote:
> On 3/22/21 5:49 PM, Peter Xu wrote:
> > Firstly, pass the wp_copy variable into hugetlb_mcopy_atomic_pte() thoughout
> > the stack.  Then, apply the UFFD_WP bit if UFFDIO_COPY_MODE_WP is with
> > UFFDIO_COPY.  Introduce huge_pte_mkuffd_wp() for it.
> > 
> > Note that similar to how we've handled shmem, we'd better keep setting the
> > dirty bit even if UFFDIO_COPY_MODE_WP is provided, so that the core mm will
> > know this page contains valid data and never drop it.
> 
> There is nothing wrong with setting the dirty bit in this manner to be
> consistent.  But, since hugetlb pages are only managed by hugetlbfs, the
> core mm will not drop them.

Right, for this paragraph, how about I rephrase it as below?

  Hugetlb pages are only managed by hugetlbfs, so we're safe even without
  setting dirty bit in the huge pte if the page is installed as read-only.
  However we'd better still keep the dirty bit set for a read-only UFFDIO_COPY
  pte (when UFFDIO_COPY_MODE_WP bit is set), not only to match what we do with
  shmem, but also because the page does contain dirty data that the kernel just
  copied from the userspace.

> 
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  include/asm-generic/hugetlb.h |  5 +++++
> >  include/linux/hugetlb.h       |  6 ++++--
> >  mm/hugetlb.c                  | 22 +++++++++++++++++-----
> >  mm/userfaultfd.c              | 12 ++++++++----
> >  4 files changed, 34 insertions(+), 11 deletions(-)
> > 
> > diff --git a/include/asm-generic/hugetlb.h b/include/asm-generic/hugetlb.h
> > index 8e1e6244a89d..548212eccbd6 100644
> > --- a/include/asm-generic/hugetlb.h
> > +++ b/include/asm-generic/hugetlb.h
> > @@ -27,6 +27,11 @@ static inline pte_t huge_pte_mkdirty(pte_t pte)
> >  	return pte_mkdirty(pte);
> >  }
> >  
> > +static inline pte_t huge_pte_mkuffd_wp(pte_t pte)
> > +{
> > +	return pte_mkuffd_wp(pte);
> > +}
> > +
> 
> Just want to verify that userfaultfd wp support is only enabled for
> x86_64 now?  I only ask because there are arch specific hugetlb pte
> manipulation routines for some architectures. 

Yes it's x86_64 only, as we have:

	select HAVE_ARCH_USERFAULTFD_WP         if X86_64 && USERFAULTFD

Here the helper defines the huge pte uffd-wp bit to be the same as the pte
level bit, across all archs.  However should be fine since for any arch that
does not support uffd-wp, it'll be no-op:

static __always_inline pte_t pte_mkuffd_wp(pte_t pte)
{
	return pte;
}

Meanwhile it shouldn't be called anywhere as well.

Here I can also define __HAVE_ARCH_HUGE_PTE_MKUFFD_WP, however I didn't do so
as I thought above should be enough.  Then we can define it when really
necessary.

> 
> >  static inline pte_t huge_pte_modify(pte_t pte, pgprot_t newprot)
> >  {
> >  	return pte_modify(pte, newprot);
> > diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> > index a7f7d5f328dc..ef8d2b8427b1 100644
> > --- a/include/linux/hugetlb.h
> > +++ b/include/linux/hugetlb.h
> > @@ -141,7 +141,8 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, pte_t *dst_pte,
> >  				unsigned long dst_addr,
> >  				unsigned long src_addr,
> >  				enum mcopy_atomic_mode mode,
> > -				struct page **pagep);
> > +				struct page **pagep,
> > +				bool wp_copy);
> >  #endif /* CONFIG_USERFAULTFD */
> >  bool hugetlb_reserve_pages(struct inode *inode, long from, long to,
> >  						struct vm_area_struct *vma,
> > @@ -321,7 +322,8 @@ static inline int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
> >  						unsigned long dst_addr,
> >  						unsigned long src_addr,
> >  						enum mcopy_atomic_mode mode,
> > -						struct page **pagep)
> > +						struct page **pagep,
> > +						bool wp_copy)
> >  {
> >  	BUG();
> >  	return 0;
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index def2c7ddf3ae..f0e55b341ebd 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -4725,7 +4725,8 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
> >  			    unsigned long dst_addr,
> >  			    unsigned long src_addr,
> >  			    enum mcopy_atomic_mode mode,
> > -			    struct page **pagep)
> > +			    struct page **pagep,
> > +			    bool wp_copy)
> >  {
> >  	bool is_continue = (mode == MCOPY_ATOMIC_CONTINUE);
> >  	struct address_space *mapping;
> > @@ -4822,17 +4823,28 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
> >  		hugepage_add_new_anon_rmap(page, dst_vma, dst_addr);
> >  	}
> >  
> > -	/* For CONTINUE on a non-shared VMA, don't set VM_WRITE for CoW. */
> > -	if (is_continue && !vm_shared)
> > +	/*
> > +	 * For either: (1) CONTINUE on a non-shared VMA, or (2) UFFDIO_COPY
> > +	 * with wp flag set, don't set pte write bit.
> > +	 */
> > +	if (wp_copy || (is_continue && !vm_shared))
> >  		writable = 0;
> >  	else
> >  		writable = dst_vma->vm_flags & VM_WRITE;
> >  
> >  	_dst_pte = make_huge_pte(dst_vma, page, writable);
> > -	if (writable)
> > -		_dst_pte = huge_pte_mkdirty(_dst_pte);
> > +	/*
> > +	 * Always mark UFFDIO_COPY page dirty; note that this may not be
> > +	 * extremely important for hugetlbfs for now since swapping is not
> > +	 * supported, but we should still be clear in that this page cannot be
> > +	 * thrown away at will, even if write bit not set.
> 
> As mentioned earlier there should not be any issue with hugetlb pages
> being thrown away without dirty set.  Perhaps, the comment should
> reflect that this is mostly for consistency.

Yes, functional-wise it seems not necessary, however I also think it's a bit
more than being consistency with what we do with shmem (as also rephrased in
above commit message): UFFDIO_COPY page should be marked as dirty by the
definition of "dirty bit", imho, since the data is indeed dirty (kernel copied
that data from userspace)?

> 
> Note to self: this may help when I get back to hugetlb soft dirty
> support.
> 
> Other than that, patch looks good.

Thanks!

-- 
Peter Xu


  reply	other threads:[~2021-04-22  1:14 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-23  0:48 [PATCH 00/23] userfaultfd-wp: Support shmem and hugetlbfs Peter Xu
2021-03-23  0:48 ` [PATCH 01/23] shmem/userfaultfd: Take care of UFFDIO_COPY_MODE_WP Peter Xu
2021-03-23  0:48 ` [PATCH 02/23] mm: Clear vmf->pte after pte_unmap_same() returns Peter Xu
2021-03-23  2:34   ` Miaohe Lin
2021-03-23 15:40     ` Peter Xu
2021-03-23  0:48 ` [PATCH 03/23] mm/userfaultfd: Introduce special pte for unmapped file-backed mem Peter Xu
2021-03-23  0:48 ` [PATCH 04/23] mm/swap: Introduce the idea of special swap ptes Peter Xu
2021-03-23  0:48 ` [PATCH 05/23] shmem/userfaultfd: Handle uffd-wp special pte in page fault handler Peter Xu
2021-03-23  0:48 ` [PATCH 06/23] mm: Drop first_index/last_index in zap_details Peter Xu
2021-03-23  0:48 ` [PATCH 07/23] mm: Introduce zap_details.zap_flags Peter Xu
2021-03-23  2:11   ` Matthew Wilcox
2021-03-23 15:43     ` Peter Xu
2021-03-23  0:48 ` [PATCH 08/23] mm: Introduce ZAP_FLAG_SKIP_SWAP Peter Xu
2021-03-23  0:48 ` [PATCH 09/23] mm: Pass zap_flags into unmap_mapping_pages() Peter Xu
2021-03-23  0:48 ` [PATCH 10/23] shmem/userfaultfd: Persist uffd-wp bit across zapping for file-backed Peter Xu
2021-03-23  0:49 ` [PATCH 11/23] shmem/userfaultfd: Allow wr-protect none pte for file-backed mem Peter Xu
2021-03-23  0:49 ` [PATCH 12/23] shmem/userfaultfd: Allows file-back mem to be uffd wr-protected on thps Peter Xu
2021-03-23  0:49 ` [PATCH 13/23] shmem/userfaultfd: Handle the left-overed special swap ptes Peter Xu
2021-03-23  0:49 ` [PATCH 14/23] shmem/userfaultfd: Pass over uffd-wp special swap pte when fork() Peter Xu
2021-03-23  0:49 ` [PATCH 15/23] hugetlb/userfaultfd: Hook page faults for uffd write protection Peter Xu
2021-04-21 22:02   ` Mike Kravetz
2021-03-23  0:49 ` [PATCH 16/23] hugetlb/userfaultfd: Take care of UFFDIO_COPY_MODE_WP Peter Xu
2021-04-21 23:06   ` Mike Kravetz
2021-04-22  1:14     ` Peter Xu [this message]
2021-03-23  0:49 ` [PATCH 17/23] hugetlb/userfaultfd: Handle UFFDIO_WRITEPROTECT Peter Xu
2021-04-22 18:22   ` Mike Kravetz
2021-03-23  0:49 ` [PATCH 18/23] mm/hugetlb: Introduce huge version of special swap pte helpers Peter Xu
2021-04-22 19:00   ` Mike Kravetz
2021-03-23  0:50 ` [PATCH 19/23] hugetlb/userfaultfd: Handle uffd-wp special pte in hugetlb pf handler Peter Xu
2021-04-22 22:45   ` Mike Kravetz
2021-04-26  2:08     ` Peter Xu
2021-03-23  0:50 ` [PATCH 20/23] hugetlb/userfaultfd: Allow wr-protect none ptes Peter Xu
2021-04-23  0:08   ` Mike Kravetz
2021-03-23  0:50 ` [PATCH 21/23] hugetlb/userfaultfd: Only drop uffd-wp special pte if required Peter Xu
2021-04-23 20:33   ` Mike Kravetz
2021-04-26 21:16     ` Peter Xu
2021-04-26 21:36       ` Mike Kravetz
2021-04-26 22:05         ` Peter Xu
2021-04-26 23:09           ` Mike Kravetz
2021-03-23  0:50 ` [PATCH 22/23] mm/userfaultfd: Enable write protection for shmem & hugetlbfs Peter Xu
2021-03-23  0:50 ` [PATCH 23/23] userfaultfd/selftests: Enable uffd-wp for shmem/hugetlbfs Peter Xu
2021-03-23  0:54 ` [PATCH 00/23] userfaultfd-wp: Support shmem and hugetlbfs Peter Xu
2021-04-21 16:03 ` Peter Xu
2021-04-21 21:39   ` Mike Kravetz
2021-04-22  1:16     ` Peter Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210422011419.GB6404@xz-x1 \
    --to=peterx@redhat.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=axelrasmussen@google.com \
    --cc=hughd@google.com \
    --cc=jglisse@redhat.com \
    --cc=kirill@shutemov.name \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mike.kravetz@oracle.com \
    --cc=nadav.amit@gmail.com \
    --cc=rppt@linux.vnet.ibm.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.