All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: Mike Kravetz <mike.kravetz@oracle.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Mike Rapoport <rppt@linux.vnet.ibm.com>,
	Nadav Amit <nadav.amit@gmail.com>,
	Jerome Glisse <jglisse@redhat.com>,
	Hugh Dickins <hughd@google.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	"Kirill A . Shutemov" <kirill@shutemov.name>,
	Axel Rasmussen <axelrasmussen@google.com>,
	Matthew Wilcox <willy@infradead.org>
Subject: Re: [PATCH 21/23] hugetlb/userfaultfd: Only drop uffd-wp special pte if required
Date: Mon, 26 Apr 2021 17:16:53 -0400	[thread overview]
Message-ID: <20210426211653.GH85002@xz-x1> (raw)
In-Reply-To: <02712955-0552-f82a-0ab8-460d63af3519@oracle.com>

On Fri, Apr 23, 2021 at 01:33:08PM -0700, Mike Kravetz wrote:
> On 3/22/21 5:50 PM, Peter Xu wrote:
> > Just like what we've done with shmem uffd-wp special ptes, we shouldn't drop
> > uffd-wp special swap pte for hugetlb too, only if we're going to unmap the
> > whole vma, or we're punching a hole with safe locks held.
> > 
> > For example, remove_inode_hugepages() is safe to drop uffd-wp ptes, because it
> > has taken hugetlb fault mutex so that no concurrent page fault would trigger.
> > While the call to hugetlb_vmdelete_list() in hugetlbfs_punch_hole() is not
> > safe.  That's why the previous call will be with ZAP_FLAG_DROP_FILE_UFFD_WP,
> > while the latter one won't be able to.
> 
> This commit message is a bit confusing, but the hugetlb hole punch code
> path is a bit confusing. :)   How about something like this?
> 
> As with  shmem uffd-wp special ptes, only drop the uffd-wp special swap
> pte if unmapping an entire vma or synchronized such that faults can not
> race with the unmap operation.  This requires passing zap_flags all the
> way to the lowest level hugetlb unmap routine: __unmap_hugepage_range.
> 
> In general, unmap calls originated in hugetlbfs code will pass the
> ZAP_FLAG_DROP_FILE_UFFD_WP flag as synchronization is in place to prevent
> faults.  The exception is hole punch which will first unmap without any
> synchronization.  Later when hole punch actually removes the page from
> the file, it will check to see if there was a subsequent fault and if
> so take the hugetlb fault mutex while unmapping again.  This second
> unmap will pass in ZAP_FLAG_DROP_FILE_UFFD_WP.

Sure, I can replace mine.

Maybe it's because I didn't explain enough on the reasoning so it's confusing.
The core justification of "whether to apply ZAP_FLAG_DROP_FILE_UFFD_WP flag
when unmap a hugetlb range" is (IMHO): we should never reach a state when a
page fault could errornously fault in a page-cache page that was wr-protected
to be writable, even in an extremely short period.  That could happen if
e.g. we pass ZAP_FLAG_DROP_FILE_UFFD_WP in hugetlbfs_punch_hole() when calling
hugetlb_vmdelete_list(), because if a page fault triggers after that call and
before the remove_inode_hugepages() right after it, the page cache can be
mapped writable again in the small window, which can cause data corruption.

> 
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  fs/hugetlbfs/inode.c    | 15 +++++++++------
> >  include/linux/hugetlb.h | 13 ++++++++-----
> >  mm/hugetlb.c            | 27 +++++++++++++++++++++------
> >  mm/memory.c             |  5 ++++-
> >  4 files changed, 42 insertions(+), 18 deletions(-)
> > 
> > diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> > index d81f52b87bd7..5fe19e801a2b 100644
> > --- a/fs/hugetlbfs/inode.c
> > +++ b/fs/hugetlbfs/inode.c
> > @@ -399,7 +399,8 @@ static void remove_huge_page(struct page *page)
> >  }
> >  
> >  static void
> > -hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end)
> > +hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end,
> > +		      unsigned long zap_flags)
> >  {
> >  	struct vm_area_struct *vma;
> >  
> > @@ -432,7 +433,7 @@ hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end)
> >  		}
> >  
> >  		unmap_hugepage_range(vma, vma->vm_start + v_offset, v_end,
> > -									NULL);
> > +				     NULL, zap_flags);
> >  	}
> >  }
> >  
> > @@ -513,7 +514,8 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
> >  				mutex_lock(&hugetlb_fault_mutex_table[hash]);
> >  				hugetlb_vmdelete_list(&mapping->i_mmap,
> >  					index * pages_per_huge_page(h),
> > -					(index + 1) * pages_per_huge_page(h));
> > +					(index + 1) * pages_per_huge_page(h),
> > +					ZAP_FLAG_DROP_FILE_UFFD_WP);
> >  				i_mmap_unlock_write(mapping);
> >  			}
> >  
> > @@ -579,7 +581,8 @@ static void hugetlb_vmtruncate(struct inode *inode, loff_t offset)
> >  	i_mmap_lock_write(mapping);
> >  	i_size_write(inode, offset);
> >  	if (!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root))
> > -		hugetlb_vmdelete_list(&mapping->i_mmap, pgoff, 0);
> > +		hugetlb_vmdelete_list(&mapping->i_mmap, pgoff, 0,
> > +				      ZAP_FLAG_DROP_FILE_UFFD_WP);
> >  	i_mmap_unlock_write(mapping);
> >  	remove_inode_hugepages(inode, offset, LLONG_MAX);
> >  }
> > @@ -612,8 +615,8 @@ static long hugetlbfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
> >  		i_mmap_lock_write(mapping);
> >  		if (!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root))
> >  			hugetlb_vmdelete_list(&mapping->i_mmap,
> > -						hole_start >> PAGE_SHIFT,
> > -						hole_end  >> PAGE_SHIFT);
> > +					      hole_start >> PAGE_SHIFT,
> > +					      hole_end >> PAGE_SHIFT, 0);
> >  		i_mmap_unlock_write(mapping);
> >  		remove_inode_hugepages(inode, hole_start, hole_end);
> >  		inode_unlock(inode);
> > diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> > index 92710600596e..4047fa042782 100644
> > --- a/include/linux/hugetlb.h
> > +++ b/include/linux/hugetlb.h
> > @@ -121,14 +121,15 @@ long follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *,
> >  			 unsigned long *, unsigned long *, long, unsigned int,
> >  			 int *);
> >  void unmap_hugepage_range(struct vm_area_struct *,
> > -			  unsigned long, unsigned long, struct page *);
> > +			  unsigned long, unsigned long, struct page *,
> > +			  unsigned long);
> >  void __unmap_hugepage_range_final(struct mmu_gather *tlb,
> >  			  struct vm_area_struct *vma,
> >  			  unsigned long start, unsigned long end,
> > -			  struct page *ref_page);
> > +			  struct page *ref_page, unsigned long zap_flags);
> >  void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
> >  				unsigned long start, unsigned long end,
> > -				struct page *ref_page);
> > +				struct page *ref_page, unsigned long zap_flags);
> 
> Nothing introduced with your patch, but it seems __unmap_hugepage_range_final
> does not need to be in the header and can be static in hugetlb.c.

It seems to be used in unmap_single_vma() of mm/memory.c?

> 
> Everything else looks good,
> 
> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>

Please let me know whether you want my extra paragraph in the commit message,
then I'll coordinate accordingly with the R-b.  Thanks!

-- 
Peter Xu


  reply	other threads:[~2021-04-26 21:17 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-23  0:48 [PATCH 00/23] userfaultfd-wp: Support shmem and hugetlbfs Peter Xu
2021-03-23  0:48 ` [PATCH 01/23] shmem/userfaultfd: Take care of UFFDIO_COPY_MODE_WP Peter Xu
2021-03-23  0:48 ` [PATCH 02/23] mm: Clear vmf->pte after pte_unmap_same() returns Peter Xu
2021-03-23  2:34   ` Miaohe Lin
2021-03-23 15:40     ` Peter Xu
2021-03-23  0:48 ` [PATCH 03/23] mm/userfaultfd: Introduce special pte for unmapped file-backed mem Peter Xu
2021-03-23  0:48 ` [PATCH 04/23] mm/swap: Introduce the idea of special swap ptes Peter Xu
2021-03-23  0:48 ` [PATCH 05/23] shmem/userfaultfd: Handle uffd-wp special pte in page fault handler Peter Xu
2021-03-23  0:48 ` [PATCH 06/23] mm: Drop first_index/last_index in zap_details Peter Xu
2021-03-23  0:48 ` [PATCH 07/23] mm: Introduce zap_details.zap_flags Peter Xu
2021-03-23  2:11   ` Matthew Wilcox
2021-03-23 15:43     ` Peter Xu
2021-03-23  0:48 ` [PATCH 08/23] mm: Introduce ZAP_FLAG_SKIP_SWAP Peter Xu
2021-03-23  0:48 ` [PATCH 09/23] mm: Pass zap_flags into unmap_mapping_pages() Peter Xu
2021-03-23  0:48 ` [PATCH 10/23] shmem/userfaultfd: Persist uffd-wp bit across zapping for file-backed Peter Xu
2021-03-23  0:49 ` [PATCH 11/23] shmem/userfaultfd: Allow wr-protect none pte for file-backed mem Peter Xu
2021-03-23  0:49 ` [PATCH 12/23] shmem/userfaultfd: Allows file-back mem to be uffd wr-protected on thps Peter Xu
2021-03-23  0:49 ` [PATCH 13/23] shmem/userfaultfd: Handle the left-overed special swap ptes Peter Xu
2021-03-23  0:49 ` [PATCH 14/23] shmem/userfaultfd: Pass over uffd-wp special swap pte when fork() Peter Xu
2021-03-23  0:49 ` [PATCH 15/23] hugetlb/userfaultfd: Hook page faults for uffd write protection Peter Xu
2021-04-21 22:02   ` Mike Kravetz
2021-03-23  0:49 ` [PATCH 16/23] hugetlb/userfaultfd: Take care of UFFDIO_COPY_MODE_WP Peter Xu
2021-04-21 23:06   ` Mike Kravetz
2021-04-22  1:14     ` Peter Xu
2021-03-23  0:49 ` [PATCH 17/23] hugetlb/userfaultfd: Handle UFFDIO_WRITEPROTECT Peter Xu
2021-04-22 18:22   ` Mike Kravetz
2021-03-23  0:49 ` [PATCH 18/23] mm/hugetlb: Introduce huge version of special swap pte helpers Peter Xu
2021-04-22 19:00   ` Mike Kravetz
2021-03-23  0:50 ` [PATCH 19/23] hugetlb/userfaultfd: Handle uffd-wp special pte in hugetlb pf handler Peter Xu
2021-04-22 22:45   ` Mike Kravetz
2021-04-26  2:08     ` Peter Xu
2021-03-23  0:50 ` [PATCH 20/23] hugetlb/userfaultfd: Allow wr-protect none ptes Peter Xu
2021-04-23  0:08   ` Mike Kravetz
2021-03-23  0:50 ` [PATCH 21/23] hugetlb/userfaultfd: Only drop uffd-wp special pte if required Peter Xu
2021-04-23 20:33   ` Mike Kravetz
2021-04-26 21:16     ` Peter Xu [this message]
2021-04-26 21:36       ` Mike Kravetz
2021-04-26 22:05         ` Peter Xu
2021-04-26 23:09           ` Mike Kravetz
2021-03-23  0:50 ` [PATCH 22/23] mm/userfaultfd: Enable write protection for shmem & hugetlbfs Peter Xu
2021-03-23  0:50 ` [PATCH 23/23] userfaultfd/selftests: Enable uffd-wp for shmem/hugetlbfs Peter Xu
2021-03-23  0:54 ` [PATCH 00/23] userfaultfd-wp: Support shmem and hugetlbfs Peter Xu
2021-04-21 16:03 ` Peter Xu
2021-04-21 21:39   ` Mike Kravetz
2021-04-22  1:16     ` Peter Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210426211653.GH85002@xz-x1 \
    --to=peterx@redhat.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=axelrasmussen@google.com \
    --cc=hughd@google.com \
    --cc=jglisse@redhat.com \
    --cc=kirill@shutemov.name \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mike.kravetz@oracle.com \
    --cc=nadav.amit@gmail.com \
    --cc=rppt@linux.vnet.ibm.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.