From: Alistair Popple <apopple@nvidia.com>
To: Peter Xu <peterx@redhat.com>
Cc: <linux-kernel@vger.kernel.org>, <linux-mm@kvack.org>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	"Kirill A . Shutemov" <kirill@shutemov.name>,
	Jason Gunthorpe <jgg@ziepe.ca>, Hugh Dickins <hughd@google.com>,
	Matthew Wilcox <willy@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Miaohe Lin <linmiaohe@huawei.com>,
	Jerome Glisse <jglisse@redhat.com>,
	Nadav Amit <nadav.amit@gmail.com>,
	Axel Rasmussen <axelrasmussen@google.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Mike Rapoport <rppt@linux.vnet.ibm.com>
Subject: Re: [PATCH v3 11/27] shmem/userfaultfd: Persist uffd-wp bit across zapping for file-backed
Date: Thu, 8 Jul 2021 12:49:23 +1000	[thread overview]
Message-ID: <2500158.4Z4izgLEvx@nvdebian> (raw)
In-Reply-To: <YOR4NmRmk54ULkkp@t490s>

On Wednesday, 7 July 2021 1:35:18 AM AEST Peter Xu wrote:
> On Tue, Jul 06, 2021 at 03:40:42PM +1000, Alistair Popple wrote:
> > > > > > > > >  struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
> > > > > > > > >  			     pte_t pte);
> > > > > > > > >  struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,
> > > > > > > > > diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
> > > > > > > > > index 355ea1ee32bd..c29a6ef3a642 100644
> > > > > > > > > --- a/include/linux/mm_inline.h
> > > > > > > > > +++ b/include/linux/mm_inline.h
> > > > > > > > > @@ -4,6 +4,8 @@
> > > > > > > > >  
> > > > > > > > >  #include <linux/huge_mm.h>
> > > > > > > > >  #include <linux/swap.h>
> > > > > > > > > +#include <linux/userfaultfd_k.h>
> > > > > > > > > +#include <linux/swapops.h>
> > > > > > > > >  
> > > > > > > > >  /**
> > > > > > > > >   * page_is_file_lru - should the page be on a file LRU or anon LRU?
> > > > > > > > > @@ -104,4 +106,45 @@ static __always_inline void del_page_from_lru_list(struct page *page,
> > > > > > > > >  	update_lru_size(lruvec, page_lru(page), page_zonenum(page),
> > > > > > > > >  			-thp_nr_pages(page));
> > > > > > > > >  }
> > > > > > > > > +
> > > > > > > > > +/*
> > > > > > > > > + * If this pte is wr-protected by uffd-wp in any form, arm the special pte to
> > > > > > > > > + * replace a none pte.  NOTE!  This should only be called when *pte is already
> > > > > > > > > + * cleared so we will never accidentally replace something valuable.  Meanwhile
> > > > > > > > > + * none pte also means we are not demoting the pte so if tlb flushed then we
> > > > > > > > > + * don't need to do it again; otherwise if tlb flush is postponed then it's
> > > > > > > > > + * even better.
> > > > > > > > > + *
> > > > > > > > > + * Must be called with pgtable lock held.
> > > > > > > > > + */
> > > > > > > > > +static inline void
> > > > > > > > > +pte_install_uffd_wp_if_needed(struct vm_area_struct *vma, unsigned long addr,
> > > > > > > > > +			      pte_t *pte, pte_t pteval)
> > > > > > > > > +{
> > > > > > > > > +#ifdef CONFIG_USERFAULTFD
> > > > > > > > > +	bool arm_uffd_pte = false;
> > > > > > > > > +
> > > > > > > > > +	/* The current status of the pte should be "cleared" before calling */
> > > > > > > > > +	WARN_ON_ONCE(!pte_none(*pte));
> > > > > > > > > +
> > > > > > > > > +	if (vma_is_anonymous(vma))
> > > > > > > > > +		return;
> > > > > > > > > +
> > > > > > > > > +	/* A uffd-wp wr-protected normal pte */
> > > > > > > > > +	if (unlikely(pte_present(pteval) && pte_uffd_wp(pteval)))
> > > > > > > > > +		arm_uffd_pte = true;
> > > > > > > > > +
> > > > > > > > > +	/*
> > > > > > > > > +	 * A uffd-wp wr-protected swap pte.  Note: this should even work for
> > > > > > > > > +	 * pte_swp_uffd_wp_special() too.
> > > > > > > > > +	 */
> > > > > > > > 
> > > > > > > > I'm probably missing something but when can we actually have this case and why
> > > > > > > > would we want to leave a special pte behind? From what I can tell this is
> > > > > > > > called from try_to_unmap_one() where this won't be true or from zap_pte_range()
> > > > > > > > when not skipping swap pages.
> > > > > > > 
> > > > > > > Yes this is a good question..
> > > > > > > 
> > > > > > > Initially I made this function make sure I cover all forms of uffd-wp bit, that
> > > > > > > contains both swap and present ptes; imho that's pretty safe.  However for
> > > > > > > !anonymous cases we don't keep swap entry inside pte even if swapped out, as
> > > > > > > they should reside in shmem page cache indeed.  The only missing piece seems to
> > > > > > > be the device private entries as you also spotted below.
> > > > > > 
> > > > > > Yes, I think it's *probably* safe although I don't yet have a strong opinion
> > > > > > here ...
> > > > > > 
> > > > > > > > > +	if (unlikely(is_swap_pte(pteval) && pte_swp_uffd_wp(pteval)))
> > > > > > 
> > > > > > ... however if this can never happen would a WARN_ON() be better? It would also
> > > > > > mean you could remove arm_uffd_pte.
> > > > > 
> > > > > Hmm, after a second thought I think we can't make it a WARN_ON_ONCE().. this
> > > > > can still be useful for private mapping of shmem files: in that case we'll have
> > > > > swap entry stored in pte not page cache, so after page reclaim it will contain
> > > > > a valid swap entry, while it's still "!anonymous".
> 
> [1]
> 
> > > > 
> > > > There's something (probably obvious) I must still be missing here. During
> > > > reclaim won't a private shmem mapping still have a present pteval here?
> > > > Therefore it won't trigger this case - the uffd wp bit is set when the swap
> > > > entry is established further down in try_to_unmap_one() right?
> > > 
> > > I agree if it's at the point when it gets reclaimed; however, what if we zap a
> > > pte of a page that already got reclaimed?  It should have the swap pte installed,
> > > imho, which will have "is_swap_pte(pteval) && pte_swp_uffd_wp(pteval)" == true.
> > 
> > Apologies for the delay getting back to this, I hope to find some more time
> > to look at this again this week.
> 
> No problem, please take your time on reviewing the series.
> 
> > 
> > I guess what I am missing is why we care about a swap pte for a reclaimed page
> > getting zapped. I thought that would imply the mapping was getting torn down,
> > although I suppose in that case you still want the uffd-wp to apply in case a
> > new mapping appears there?
> 
> For the torn down case it'll always have ZAP_FLAG_DROP_FILE_UFFD_WP set, so
> pte_install_uffd_wp_if_needed() won't be called, as zap_drop_file_uffd_wp()
> will return true:

Argh, thanks. I had forgotten that bit.

> static inline void
> zap_install_uffd_wp_if_needed(struct vm_area_struct *vma,
> 			      unsigned long addr, pte_t *pte,
> 			      struct zap_details *details, pte_t pteval)
> {
> 	if (zap_drop_file_uffd_wp(details))
> 		return;
> 
> 	pte_install_uffd_wp_if_needed(vma, addr, pte, pteval);
> }
> 
> As you can see, it's non-trivial to fully digest all of its caller stacks. What I
> wanted to do with pte_install_uffd_wp_if_needed() is simply to provide a helper
> that can convert any form of uffd-wp pte into a pte marker before being set as
> a none pte.  Since uffd-wp can exist in two forms (either present or swap),
> covering both forms (and, for the swap form, also covering the uffd-wp special
> pte itself) is a very clear idea and easy to understand to me.  I don't even
> need to worry about who is calling it, or which cases can be a swap pte and
> which must not: we just call it when we want to persist the uffd-wp bit (after
> a pte got cleared).  That's why in all cases I still prefer to keep it as is,
> as it just makes things straightforward to me.

Ok, that makes sense. I don't think there is an actual problem here; it was
just a little surprising to me, so I was trying to get a better understanding
of the caller stacks and when this might actually be required. As you say,
though, that is non-trivial, and in any case it's still ok to install these
bits, and a single function is simpler.
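
For readers following the thread, here is a minimal userspace sketch of the
behaviour discussed above. It is not kernel code: every toy_* type and
function is a hypothetical stand-in, and the "marker" kind only loosely models
the uffd-wp special pte. It shows the two points agreed on: the helper treats
present, swap and already-special ptes uniformly, and the teardown path never
reaches it because the drop flag short-circuits first.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for kernel state; none of these are real kernel definitions. */
enum toy_pte_kind {
	TOY_PTE_NONE,		/* cleared pte */
	TOY_PTE_PRESENT,	/* mapped page */
	TOY_PTE_SWAP,		/* swap entry, e.g. reclaimed private shmem page */
	TOY_PTE_UFFD_WP_MARKER,	/* models the uffd-wp special pte */
};

struct toy_pte {
	enum toy_pte_kind kind;
	bool uffd_wp;
};

struct toy_vma {
	bool anonymous;
};

/* True if the old pte value carried uffd-wp in any form. */
static bool toy_pte_has_uffd_wp(struct toy_pte pteval)
{
	switch (pteval.kind) {
	case TOY_PTE_PRESENT:
	case TOY_PTE_SWAP:
		return pteval.uffd_wp;
	case TOY_PTE_UFFD_WP_MARKER:
		return true;
	default:
		return false;
	}
}

/* Models pte_install_uffd_wp_if_needed(): *pte must already be cleared. */
static void toy_install_uffd_wp_if_needed(const struct toy_vma *vma,
					  struct toy_pte *pte,
					  struct toy_pte pteval)
{
	assert(pte->kind == TOY_PTE_NONE);	/* stands in for WARN_ON_ONCE() */

	if (vma->anonymous)
		return;

	if (toy_pte_has_uffd_wp(pteval)) {
		pte->kind = TOY_PTE_UFFD_WP_MARKER;
		pte->uffd_wp = true;
	}
}

/* Models zap_install_uffd_wp_if_needed(): teardown drops the bit instead. */
static void toy_zap_install_uffd_wp_if_needed(const struct toy_vma *vma,
					      struct toy_pte *pte,
					      bool drop_file_uffd_wp,
					      struct toy_pte pteval)
{
	if (drop_file_uffd_wp)
		return;

	toy_install_uffd_wp_if_needed(vma, pte, pteval);
}
```

In this model the caller does not need to know which pte form it had; it
passes the old value and the helper decides, which is the simplification
argued for above.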

 - Alistair
 
> Thanks,
> 
> 