* [PATCH v4 00/26] userfaultfd-wp: Support shmem and hugetlbfs
@ 2021-07-14 22:20 Peter Xu
  2021-07-14 22:20 ` [PATCH v4 01/26] mm/shmem: Unconditionally set pte dirty in mfill_atomic_install_pte Peter Xu
                   ` (25 more replies)
  0 siblings, 26 replies; 34+ messages in thread
From: Peter Xu @ 2021-07-14 22:20 UTC
  To: linux-kernel, linux-mm
  Cc: Jason Gunthorpe, peterx, Matthew Wilcox, Andrew Morton,
	Axel Rasmussen, Nadav Amit, Jerome Glisse, Mike Rapoport,
	Miaohe Lin, Hugh Dickins, Alistair Popple, Andrea Arcangeli,
	Mike Kravetz, Kirill A . Shutemov, David Hildenbrand

This is v4 of uffd-wp shmem & hugetlbfs support, which completes uffd-wp as a
full feature.  It's based on v5.14-rc1.

The whole series can also be found online [1].  Nothing big has really changed
from the previous version.

One thing worth mentioning is that commit 22061a1ffabd (from Hugh) added a new
single_page parameter to zap_details, so a few zap-related patches needed some
rebase around it.  The same commit also introduced unmap_mapping_page(); in
this series that function will start to run with the zap flag
ZAP_FLAG_DROP_FILE_UFFD_WP, as it is used as the last phase to unmap shmem
mappings when a page is e.g. truncated.  The rebase actually even simplified
things a bit, as I can drop the patch "mm: Pass zap_flags into
unmap_mapping_pages()" now.

Another thing to mention is that I further modified (trivially) the test
program umap-apps [4] to allow the backend storage to be a shmem file (it used
to be e.g. an XFS file; however, I noticed that disk latency is a major
bottleneck for the umapsort program, especially on an HDD, and a shmem backend
is much faster).  This should stress the kernel a bit more than before.

Full changelog listed below.

v4:
- Rebased to v5.14-rc1
- Collected r-b from Alistair
- Patch "mm/userfaultfd: Introduce special pte for unmapped file-backed mem"
  - make pte_swp_uffd_wp_special() return false for !HAVE_ARCH_USERFAULTFD_WP
    [Alistair]
- Patch "mm: Introduce zap_details.zap_flags"
  - Rename zap_check_mapping_skip to zap_skip_check_mapping [Alistair]
- Patch "mm: Pass zap_flags into unmap_mapping_pages()"
  - Dropped the patch because after commit 22061a1ffabd it's not needed anymore.
- Patch "mm/userfaultfd: Enable write protection for shmem & hugetlbfs"
  - Drop UFFD_FEATURE_WP_HUGETLBFS_SHMEM too if !CONFIG_HAVE_ARCH_USERFAULTFD_WP
- Patch "shmem/userfaultfd: Persist uffd-wp bit across zapping for file-backed":
  - Convert zap_pte_range() of device private to WARN_ON_ONCE() because it must
    not be a file-backed pte [Alistair]
  - Coordinate with new commit 22061a1ffabd ("mm/thp: unmap_mapping_page() to
    fix THP truncate_cleanup_page()"), add ZAP_FLAG_CHECK_MAPPING for new
    function unmap_mapping_page().

v3:
- Rebase to v5.13-rc3-mmots-2021-05-25-20-12
- Fix commit message and comment for patch "shmem/userfaultfd: Handle uffd-wp
  special pte in page fault handler", dropping all references to FAULT_FLAG_UFFD_WP.
- Reworked patch "shmem/userfaultfd: Take care of UFFDIO_COPY_MODE_WP" after
  Axel's refactoring on uffdio-copy/continue.
- Added patch "mm/hugetlb: Introduce huge pte version of uffd-wp helpers", so
  that huge pte helpers are introduced in one patch.  Also add huge_pte_uffd_wp
  helper, which was missing previously.
- Added patch: "mm/pagemap: Recognize uffd-wp bit for shmem/hugetlbfs", to let
  pagemap uffd-wp bit work for shmem/hugetlbfs
- Added patch: "mm/shmem: Unconditionally set pte dirty in
  mfill_atomic_install_pte", to clean up dirty bit together in uffdio-copy

v2:
- Add R-bs
- Added patch "mm/hugetlb: Drop __unmap_hugepage_range definition from
  hugetlb.h" as noticed/suggested by Mike Kravets
- Fix commit message of patch "hugetlb/userfaultfd: Only drop uffd-wp special
  pte if required" [MikeK]
- Remove comments for fields in zap_details since they're either incorrect or
  not helpful [Matthew]
- Rephrase commit message in patch "hugetlb/userfaultfd: Take care of
  UFFDIO_COPY_MODE_WP" to explain better on why set dirty bit for UFFDIO_COPY
  in hugetlbfs [MikeK]
- Don't emulate READ for uffd-wp-special on both shmem & hugetlbfs.
- Drop FAULT_FLAG_UFFD_WP flag, by checking vmf->orig_pte directly against
  pte_swp_uffd_wp_special()
- Fix race condition of page fault handling on uffd-wp-special [Mike]

About Swap Special PTE
======================

In short, the so-called "swap special pte" in this patchset is a new type of
pte that didn't exist before; this series first uses it for file-backed
memories.  It is used to persist information even if the ptes get dropped
while the page cache still exists.  For example, when splitting a file-backed
huge pmd, we could simply drop the pmd entry and wait until another fault
comes.  That was okay in the past, since all information in the pte could be
recovered from the page cache when the next page fault triggered.  However,
uffd-wp is per-pte information which cannot be kept in the page cache, so that
information needs to be maintained in the pgtable entry somehow, even if the
pgtable entry is going to be dropped.  Here, instead of replacing the pte with
a none entry, we use the "swap special pte".  Then when the next page fault
triggers, we can inspect orig_pte to recover this information.
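
As a rough sketch of the lifecycle (simplified, not the exact code from the
patches: pte_swp_mkuffd_wp_special() and pte_swp_uffd_wp_special() are the
helpers introduced in patch 04, and the fault-side line matches do_set_pte()
after patch 06; the zap-side placement is simplified from the later "persist
uffd-wp bit across zapping" patch):

    /* Zap side: dropping a wr-protected file-backed pte */
    if (userfaultfd_wp(vma) && pte_uffd_wp(old_pte))
            /* Keep the wr-protect information instead of a none pte */
            set_pte_at(mm, addr, ptep, pte_swp_mkuffd_wp_special(vma));
    else
            pte_clear(mm, addr, ptep);

    /*
     * Fault side: the special pte is neither none nor present, so it
     * faults like a swap entry; recognize it via orig_pte, then install
     * the page cache page read-only with the uffd-wp bit kept, so the
     * next write goes through the do_wp_page() path.
     */
    if (pte_swp_uffd_wp_special(vmf->orig_pte))
            entry = pte_mkuffd_wp(pte_wrprotect(entry));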

I'm copy-pasting some commit message from the patch "mm/swap: Introduce the
idea of special swap ptes", which tries to explain this pte from another angle:

    We used to have special swap entries, like migration entries, hw-poison
    entries, device private entries, etc.

    Those "special swap entries" reside in the range that they need to be at least
    swap entries first, and their types are decided by swp_type(entry).

    This patch introduces another idea called "special swap ptes".

    It's very easy to confuse these with "special swap entries", but a special
    swap pte should never contain a swap entry at all.  That is, it's illegal
    to call pte_to_swp_entry() upon a special swap pte.

    Make the uffd-wp special pte to be the first special swap pte.

    Before this patch, is_swap_pte()==true means one of the below:

       (a.1) The pte has a normal swap entry (non_swap_entry()==false).  For
             example, when an anonymous page got swapped out.

       (a.2) The pte has a special swap entry (non_swap_entry()==true).  For
             example, a migration entry, a hw-poison entry, etc.

    After this patch, is_swap_pte()==true means one of the below, where case (b) is
    added:

     (a) The pte contains a swap entry.

       (a.1) The pte has a normal swap entry (non_swap_entry()==false).  For
             example, when an anonymous page got swapped out.

       (a.2) The pte has a special swap entry (non_swap_entry()==true).  For
             example, a migration entry, a hw-poison entry, etc.

     (b) The pte does not contain a swap entry at all (so it cannot be passed
         into pte_to_swp_entry()).  For example, uffd-wp special swap pte.
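
To make case (b) concrete, here is an illustrative sketch (not code from the
series) of how a pte walker tells these cases apart, using the helpers that
the same patch introduces (is_swap_special_pte(), pte_has_swap_entry()):

    pte_t pte = *ptep;

    if (pte_none(pte)) {
            /* Empty pte */
    } else if (pte_present(pte)) {
            /* Normal present page mapping */
    } else if (is_swap_special_pte(pte)) {
            /*
             * Case (b), e.g. the uffd-wp special pte.  It is illegal
             * to call pte_to_swp_entry() on this pte.
             */
    } else {
            /* Cases (a.1)/(a.2): pte_has_swap_entry() is true here */
            swp_entry_t entry = pte_to_swp_entry(pte);

            if (non_swap_entry(entry))
                    ; /* (a.2): migration, hw-poison, ... */
            else
                    ; /* (a.1): a real swap slot */
    }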

Hugetlbfs needs a similar thing because it's also file-backed.  I directly
reused the same special pte there, though the shmem and hugetlb changes to
support this new pte differ, since they don't share much of the code path.

Patch layout
============

Part (1): Shmem support; this is where the special swap pte is introduced.
Some zap rework is needed in the process:

  mm/shmem: Unconditionally set pte dirty in mfill_atomic_install_pte
  shmem/userfaultfd: Take care of UFFDIO_COPY_MODE_WP
  mm: Clear vmf->pte after pte_unmap_same() returns
  mm/userfaultfd: Introduce special pte for unmapped file-backed mem
  mm/swap: Introduce the idea of special swap ptes
  shmem/userfaultfd: Handle uffd-wp special pte in page fault handler
  mm: Drop first_index/last_index in zap_details
  mm: Introduce zap_details.zap_flags
  mm: Introduce ZAP_FLAG_SKIP_SWAP
  shmem/userfaultfd: Persist uffd-wp bit across zapping for file-backed
  shmem/userfaultfd: Allow wr-protect none pte for file-backed mem
  shmem/userfaultfd: Allows file-back mem to be uffd wr-protected on thps
  shmem/userfaultfd: Handle the left-overed special swap ptes
  shmem/userfaultfd: Pass over uffd-wp special swap pte when fork()

Part (2): Hugetlb support.  The "disable huge pmd sharing for uffd-wp" patches
have already been merged; the rest are the changes required to teach hugetlbfs
to understand the special swap pte introduced with the uffd-wp change:

  mm/hugetlb: Drop __unmap_hugepage_range definition from hugetlb.h
  mm/hugetlb: Introduce huge pte version of uffd-wp helpers
  hugetlb/userfaultfd: Hook page faults for uffd write protection
  hugetlb/userfaultfd: Take care of UFFDIO_COPY_MODE_WP
  hugetlb/userfaultfd: Handle UFFDIO_WRITEPROTECT
  mm/hugetlb: Introduce huge version of special swap pte helpers
  hugetlb/userfaultfd: Handle uffd-wp special pte in hugetlb pf handler
  hugetlb/userfaultfd: Allow wr-protect none ptes
  hugetlb/userfaultfd: Only drop uffd-wp special pte if required

Part (3): Enable both features in code and test (plus pagemap support)

  mm/pagemap: Recognize uffd-wp bit for shmem/hugetlbfs
  userfaultfd: Enable write protection for shmem & hugetlbfs
  userfaultfd/selftests: Enable uffd-wp for shmem/hugetlbfs

Tests
=====

I've tested it using both the userfaultfd kselftest program and umapsort [2],
which should be even stricter.  Page swapping in/out was also tested during
umapsort runs.

If anyone would like to try umapsort, you'll need an extremely hacked version
of the umap library [3], because by default umap only supports anonymous
memory.  So to test it, build [3] first, then [2].
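
For reference, what the series enables from userspace is the usual uffd-wp
sequence, now applied to a shmem (or hugetlbfs) mapping.  Below is a minimal
sketch with error handling omitted; "shmem_fd" and "len" are placeholders:

    #include <fcntl.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <sys/syscall.h>
    #include <unistd.h>
    #include <linux/userfaultfd.h>

    int uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
    struct uffdio_api api = { .api = UFFD_API };

    ioctl(uffd, UFFDIO_API, &api);

    /* A shared shmem mapping, e.g. backed by memfd_create() */
    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_SHARED, shmem_fd, 0);

    struct uffdio_register reg = {
            .range = { .start = (unsigned long)mem, .len = len },
            .mode = UFFDIO_REGISTER_MODE_WP,
    };
    ioctl(uffd, UFFDIO_REGISTER, &reg);

    /* Wr-protect the range; writes then generate uffd-wp messages */
    struct uffdio_writeprotect wp = {
            .range = { .start = (unsigned long)mem, .len = len },
            .mode = UFFDIO_WRITEPROTECT_MODE_WP,
    };
    ioctl(uffd, UFFDIO_WRITEPROTECT, &wp);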

Any comments would be greatly welcomed.  Thanks,

[1] https://github.com/xzpeter/linux/tree/uffd-wp-shmem-hugetlbfs
[2] https://github.com/xzpeter/umap-apps/tree/peter
[3] https://github.com/xzpeter/umap/tree/peter-shmem-hugetlbfs
[4] https://github.com/xzpeter/umap-apps/commit/b0c2c7b1cd9dcb6835e7c59d02ece1f6b7f1ea01

Peter Xu (26):
  mm/shmem: Unconditionally set pte dirty in mfill_atomic_install_pte
  shmem/userfaultfd: Take care of UFFDIO_COPY_MODE_WP
  mm: Clear vmf->pte after pte_unmap_same() returns
  mm/userfaultfd: Introduce special pte for unmapped file-backed mem
  mm/swap: Introduce the idea of special swap ptes
  shmem/userfaultfd: Handle uffd-wp special pte in page fault handler
  mm: Drop first_index/last_index in zap_details
  mm: Introduce zap_details.zap_flags
  mm: Introduce ZAP_FLAG_SKIP_SWAP
  shmem/userfaultfd: Persist uffd-wp bit across zapping for file-backed
  shmem/userfaultfd: Allow wr-protect none pte for file-backed mem
  shmem/userfaultfd: Allows file-back mem to be uffd wr-protected on
    thps
  shmem/userfaultfd: Handle the left-overed special swap ptes
  shmem/userfaultfd: Pass over uffd-wp special swap pte when fork()
  mm/hugetlb: Drop __unmap_hugepage_range definition from hugetlb.h
  mm/hugetlb: Introduce huge pte version of uffd-wp helpers
  hugetlb/userfaultfd: Hook page faults for uffd write protection
  hugetlb/userfaultfd: Take care of UFFDIO_COPY_MODE_WP
  hugetlb/userfaultfd: Handle UFFDIO_WRITEPROTECT
  mm/hugetlb: Introduce huge version of special swap pte helpers
  hugetlb/userfaultfd: Handle uffd-wp special pte in hugetlb pf handler
  hugetlb/userfaultfd: Allow wr-protect none ptes
  hugetlb/userfaultfd: Only drop uffd-wp special pte if required
  mm/pagemap: Recognize uffd-wp bit for shmem/hugetlbfs
  mm/userfaultfd: Enable write protection for shmem & hugetlbfs
  userfaultfd/selftests: Enable uffd-wp for shmem/hugetlbfs

 arch/arm64/kernel/mte.c                  |   2 +-
 arch/x86/include/asm/pgtable.h           |  28 +++
 fs/hugetlbfs/inode.c                     |  15 +-
 fs/proc/task_mmu.c                       |  21 +-
 fs/userfaultfd.c                         |  41 ++--
 include/asm-generic/hugetlb.h            |  15 ++
 include/asm-generic/pgtable_uffd.h       |   3 +
 include/linux/hugetlb.h                  |  30 ++-
 include/linux/mm.h                       |  44 +++-
 include/linux/mm_inline.h                |  42 ++++
 include/linux/shmem_fs.h                 |   4 +-
 include/linux/swapops.h                  |  39 +++-
 include/linux/userfaultfd_k.h            |  49 +++++
 include/uapi/linux/userfaultfd.h         |  10 +-
 mm/gup.c                                 |   2 +-
 mm/hmm.c                                 |   2 +-
 mm/hugetlb.c                             | 160 ++++++++++++---
 mm/khugepaged.c                          |  11 +-
 mm/madvise.c                             |   4 +-
 mm/memcontrol.c                          |   2 +-
 mm/memory.c                              | 244 +++++++++++++++++------
 mm/migrate.c                             |   4 +-
 mm/mincore.c                             |   2 +-
 mm/mprotect.c                            |  63 +++++-
 mm/mremap.c                              |   2 +-
 mm/page_vma_mapped.c                     |   6 +-
 mm/rmap.c                                |   8 +
 mm/shmem.c                               |   5 +-
 mm/swapfile.c                            |   2 +-
 mm/userfaultfd.c                         |  73 +++++--
 tools/testing/selftests/vm/userfaultfd.c |   9 +-
 31 files changed, 756 insertions(+), 186 deletions(-)

-- 
2.31.1




* [PATCH v4 01/26] mm/shmem: Unconditionally set pte dirty in mfill_atomic_install_pte
  2021-07-14 22:20 [PATCH v4 00/26] userfaultfd-wp: Support shmem and hugetlbfs Peter Xu
@ 2021-07-14 22:20 ` Peter Xu
  2021-07-14 22:20 ` [PATCH v4 02/26] shmem/userfaultfd: Take care of UFFDIO_COPY_MODE_WP Peter Xu
                   ` (24 subsequent siblings)
  25 siblings, 0 replies; 34+ messages in thread
From: Peter Xu @ 2021-07-14 22:20 UTC
  To: linux-kernel, linux-mm
  Cc: Jason Gunthorpe, peterx, Matthew Wilcox, Andrew Morton,
	Axel Rasmussen, Nadav Amit, Jerome Glisse, Mike Rapoport,
	Miaohe Lin, Hugh Dickins, Alistair Popple, Andrea Arcangeli,
	Mike Kravetz, Kirill A . Shutemov, David Hildenbrand

Setting the pte dirty was previously done conditionally, since there's one
shmem special case where we use SetPageDirty() instead.  However that's not
necessary; it is easier and cleaner to do it unconditionally in
mfill_atomic_install_pte().

The most recent discussion about this is here, where Hugh explained the history
of SetPageDirty() and why it's possible that it's not required at all:

https://lore.kernel.org/lkml/alpine.LSU.2.11.2104121657050.1097@eggly.anvils/

Currently mfill_atomic_install_pte() has three callers:

        1. shmem_mfill_atomic_pte
        2. mcopy_atomic_pte
        3. mcontinue_atomic_pte

After the change: case (1) should have its SetPageDirty replaced by the dirty
bit on the pte (so we finally unify them); case (2) should have no functional
change at all, as it has page_in_cache==false; case (3) may add a dirty bit to
the pte.  However, since case (3) is UFFDIO_CONTINUE for shmem, the page is
almost certainly dirty after all, so it should not make a real difference
either.

This should make it much easier to follow which cases set the dirty bit for
uffd, as we now simply set it for all uffd-related ioctls.  Meanwhile, there is
no more special handling of SetPageDirty() when it isn't needed.

Cc: Hugh Dickins <hughd@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/shmem.c       | 1 -
 mm/userfaultfd.c | 3 +--
 2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 70d9ce294bb4..dc9f95b5fb34 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2449,7 +2449,6 @@ int shmem_mfill_atomic_pte(struct mm_struct *dst_mm,
 	shmem_recalc_inode(inode);
 	spin_unlock_irq(&info->lock);
 
-	SetPageDirty(page);
 	unlock_page(page);
 	return 0;
 out_delete_from_cache:
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 0e2132834bc7..b30a3724c701 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -69,10 +69,9 @@ int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
 	pgoff_t offset, max_off;
 
 	_dst_pte = mk_pte(page, dst_vma->vm_page_prot);
+	_dst_pte = pte_mkdirty(_dst_pte);
 	if (page_in_cache && !vm_shared)
 		writable = false;
-	if (writable || !page_in_cache)
-		_dst_pte = pte_mkdirty(_dst_pte);
 	if (writable) {
 		if (wp_copy)
 			_dst_pte = pte_mkuffd_wp(_dst_pte);
-- 
2.31.1



* [PATCH v4 02/26] shmem/userfaultfd: Take care of UFFDIO_COPY_MODE_WP
  2021-07-14 22:20 [PATCH v4 00/26] userfaultfd-wp: Support shmem and hugetlbfs Peter Xu
  2021-07-14 22:20 ` [PATCH v4 01/26] mm/shmem: Unconditionally set pte dirty in mfill_atomic_install_pte Peter Xu
@ 2021-07-14 22:20 ` Peter Xu
  2021-07-14 22:20 ` [PATCH v4 03/26] mm: Clear vmf->pte after pte_unmap_same() returns Peter Xu
                   ` (23 subsequent siblings)
  25 siblings, 0 replies; 34+ messages in thread
From: Peter Xu @ 2021-07-14 22:20 UTC
  To: linux-kernel, linux-mm
  Cc: Jason Gunthorpe, peterx, Matthew Wilcox, Andrew Morton,
	Axel Rasmussen, Nadav Amit, Jerome Glisse, Mike Rapoport,
	Miaohe Lin, Hugh Dickins, Alistair Popple, Andrea Arcangeli,
	Mike Kravetz, Kirill A . Shutemov, David Hildenbrand

Firstly, pass wp_copy into shmem_mfill_atomic_pte() through the stack.  Then
apply the UFFD_WP bit properly when the UFFDIO_COPY on shmem is with
UFFDIO_COPY_MODE_WP; wp_copy then lands in the recently introduced
mfill_atomic_install_pte().

We need to make sure shmem_mfill_atomic_pte() will always set the dirty bit in
the pte even if UFFDIO_COPY_MODE_WP is set.  After the rework of the minor
fault series on shmem, we need to slightly touch up the logic there, since
uffd-wp now needs to be applied even when writable==false (e.g., for a shmem
private mapping).

Note: we must do pte_wrprotect() if !writable in mfill_atomic_install_pte(), as
mk_pte() could return a writable pte (e.g., when VM_SHARED is set on a shmem
file).

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 include/linux/shmem_fs.h |  4 ++--
 mm/shmem.c               |  4 ++--
 mm/userfaultfd.c         | 23 +++++++++++++++--------
 3 files changed, 19 insertions(+), 12 deletions(-)

diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index 8e775ce517bb..ba6ea8c4d534 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -128,11 +128,11 @@ extern int shmem_mfill_atomic_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
 				  struct vm_area_struct *dst_vma,
 				  unsigned long dst_addr,
 				  unsigned long src_addr,
-				  bool zeropage,
+				  bool zeropage, bool wp_copy,
 				  struct page **pagep);
 #else /* !CONFIG_SHMEM */
 #define shmem_mfill_atomic_pte(dst_mm, dst_pmd, dst_vma, dst_addr, \
-			       src_addr, zeropage, pagep)       ({ BUG(); 0; })
+			       src_addr, zeropage, wp_copy, pagep) ({ BUG(); 0; })
 #endif /* CONFIG_SHMEM */
 #endif /* CONFIG_USERFAULTFD */
 
diff --git a/mm/shmem.c b/mm/shmem.c
index dc9f95b5fb34..9608200f8a64 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2368,7 +2368,7 @@ int shmem_mfill_atomic_pte(struct mm_struct *dst_mm,
 			   struct vm_area_struct *dst_vma,
 			   unsigned long dst_addr,
 			   unsigned long src_addr,
-			   bool zeropage,
+			   bool zeropage, bool wp_copy,
 			   struct page **pagep)
 {
 	struct inode *inode = file_inode(dst_vma->vm_file);
@@ -2439,7 +2439,7 @@ int shmem_mfill_atomic_pte(struct mm_struct *dst_mm,
 		goto out_release;
 
 	ret = mfill_atomic_install_pte(dst_mm, dst_pmd, dst_vma, dst_addr,
-				       page, true, false);
+				       page, true, wp_copy);
 	if (ret)
 		goto out_delete_from_cache;
 
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index b30a3724c701..2a9c9e6eb876 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -70,14 +70,22 @@ int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
 
 	_dst_pte = mk_pte(page, dst_vma->vm_page_prot);
 	_dst_pte = pte_mkdirty(_dst_pte);
-	if (page_in_cache && !vm_shared)
+	/* Don't write if uffd-wp wr-protected */
+	if (wp_copy) {
+		_dst_pte = pte_mkuffd_wp(_dst_pte);
 		writable = false;
-	if (writable) {
-		if (wp_copy)
-			_dst_pte = pte_mkuffd_wp(_dst_pte);
-		else
-			_dst_pte = pte_mkwrite(_dst_pte);
 	}
+	/* Don't write if page cache privately mapped */
+	if (page_in_cache && !vm_shared)
+		writable = false;
+	if (writable)
+		_dst_pte = pte_mkwrite(_dst_pte);
+	else
+		/*
+		 * We need this to make sure write bit removed; as mk_pte()
+		 * could return a pte with write bit set.
+		 */
+		_dst_pte = pte_wrprotect(_dst_pte);
 
 	dst_pte = pte_offset_map_lock(dst_mm, dst_pmd, dst_addr, &ptl);
 
@@ -467,11 +475,10 @@ static __always_inline ssize_t mfill_atomic_pte(struct mm_struct *dst_mm,
 			err = mfill_zeropage_pte(dst_mm, dst_pmd,
 						 dst_vma, dst_addr);
 	} else {
-		VM_WARN_ON_ONCE(wp_copy);
 		err = shmem_mfill_atomic_pte(dst_mm, dst_pmd, dst_vma,
 					     dst_addr, src_addr,
 					     mode != MCOPY_ATOMIC_NORMAL,
-					     page);
+					     wp_copy, page);
 	}
 
 	return err;
-- 
2.31.1



* [PATCH v4 03/26] mm: Clear vmf->pte after pte_unmap_same() returns
  2021-07-14 22:20 [PATCH v4 00/26] userfaultfd-wp: Support shmem and hugetlbfs Peter Xu
  2021-07-14 22:20 ` [PATCH v4 01/26] mm/shmem: Unconditionally set pte dirty in mfill_atomic_install_pte Peter Xu
  2021-07-14 22:20 ` [PATCH v4 02/26] shmem/userfaultfd: Take care of UFFDIO_COPY_MODE_WP Peter Xu
@ 2021-07-14 22:20 ` Peter Xu
  2021-07-14 22:20 ` [PATCH v4 04/26] mm/userfaultfd: Introduce special pte for unmapped file-backed mem Peter Xu
                   ` (22 subsequent siblings)
  25 siblings, 0 replies; 34+ messages in thread
From: Peter Xu @ 2021-07-14 22:20 UTC
  To: linux-kernel, linux-mm
  Cc: Jason Gunthorpe, peterx, Matthew Wilcox, Andrew Morton,
	Axel Rasmussen, Nadav Amit, Jerome Glisse, Mike Rapoport,
	Miaohe Lin, Hugh Dickins, Alistair Popple, Andrea Arcangeli,
	Mike Kravetz, Kirill A . Shutemov, David Hildenbrand

pte_unmap_same() will always unmap the pte pointer.  After the unmap, vmf->pte
will not be valid any more.  We should clear it.

It was safe only because no one accesses vmf->pte after pte_unmap_same()
returns, since the only caller of pte_unmap_same() (so far) is do_swap_page(),
where vmf->pte will in most cases be overwritten very soon.

pte_unmap_same() will be used in other places in follow-up patches, where
vmf->pte will not always be rewritten.  Clearing it enables us to call
functions like finish_fault(), which conditionally unmaps the pte by checking
vmf->pte first.  Alternatively, alloc_set_pte() will make sure to allocate a
new pte even after pte_unmap_same() has been called.

Since we'll need to modify vmf->pte, pass vmf directly into pte_unmap_same();
this also avoids the long parameter list.

Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/memory.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 747a01d495f2..0e0de08a2cd5 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2724,19 +2724,20 @@ EXPORT_SYMBOL_GPL(apply_to_existing_page_range);
  * proceeding (but do_wp_page is only called after already making such a check;
  * and do_anonymous_page can safely check later on).
  */
-static inline int pte_unmap_same(struct mm_struct *mm, pmd_t *pmd,
-				pte_t *page_table, pte_t orig_pte)
+static inline int pte_unmap_same(struct vm_fault *vmf)
 {
 	int same = 1;
 #if defined(CONFIG_SMP) || defined(CONFIG_PREEMPTION)
 	if (sizeof(pte_t) > sizeof(unsigned long)) {
-		spinlock_t *ptl = pte_lockptr(mm, pmd);
+		spinlock_t *ptl = pte_lockptr(vmf->vma->vm_mm, vmf->pmd);
 		spin_lock(ptl);
-		same = pte_same(*page_table, orig_pte);
+		same = pte_same(*vmf->pte, vmf->orig_pte);
 		spin_unlock(ptl);
 	}
 #endif
-	pte_unmap(page_table);
+	pte_unmap(vmf->pte);
+	/* After unmap of pte, the pointer is invalid now - clear it. */
+	vmf->pte = NULL;
 	return same;
 }
 
@@ -3487,7 +3488,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	vm_fault_t ret = 0;
 	void *shadow = NULL;
 
-	if (!pte_unmap_same(vma->vm_mm, vmf->pmd, vmf->pte, vmf->orig_pte))
+	if (!pte_unmap_same(vmf))
 		goto out;
 
 	entry = pte_to_swp_entry(vmf->orig_pte);
-- 
2.31.1



* [PATCH v4 04/26] mm/userfaultfd: Introduce special pte for unmapped file-backed mem
  2021-07-14 22:20 [PATCH v4 00/26] userfaultfd-wp: Support shmem and hugetlbfs Peter Xu
                   ` (2 preceding siblings ...)
  2021-07-14 22:20 ` [PATCH v4 03/26] mm: Clear vmf->pte after pte_unmap_same() returns Peter Xu
@ 2021-07-14 22:20 ` Peter Xu
  2021-07-14 22:20 ` [PATCH v4 05/26] mm/swap: Introduce the idea of special swap ptes Peter Xu
                   ` (21 subsequent siblings)
  25 siblings, 0 replies; 34+ messages in thread
From: Peter Xu @ 2021-07-14 22:20 UTC
  To: linux-kernel, linux-mm
  Cc: Jason Gunthorpe, peterx, Matthew Wilcox, Andrew Morton,
	Axel Rasmussen, Nadav Amit, Jerome Glisse, Mike Rapoport,
	Miaohe Lin, Hugh Dickins, Alistair Popple, Andrea Arcangeli,
	Mike Kravetz, Kirill A . Shutemov, David Hildenbrand

This patch introduces a very special swap-like pte for file-backed memories.

Currently it's defined only for x86_64, but any arch that can properly define
the UFFD_WP_SWP_PTE_SPECIAL value as requested should conceptually work too.

We will use this special pte to arm the ptes that got either unmapped or
swapped out for a file-backed region that was previously wr-protected.  This
special pte can trigger a page fault just like swap entries do, since the page
fault will see pte_none()==false && pte_present()==false.

Then we can revive the special pte into a normal pte backed by the page cache.

This idea is greatly inspired by Hugh and Andrea in the discussion, which is
referenced in the links below.

The other idea (from Hugh) was to use swp_type==1 and swp_offset==0 as the
special pte.  The current solution (as pointed out by Andrea) is slightly
preferred in that we don't need any swp_entry_t knowledge at all to trap these
accesses.  Meanwhile, we also reuse _PAGE_SWP_UFFD_WP from the anonymous swp
entries.

This patch only introduces the special pte and its operators.  It is not yet
used anywhere, so there is no functional change yet.

Link: https://lore.kernel.org/lkml/20201126222359.8120-1-peterx@redhat.com/
Link: https://lore.kernel.org/lkml/20201130230603.46187-1-peterx@redhat.com/
Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
Suggested-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 arch/x86/include/asm/pgtable.h     | 28 ++++++++++++++++++++++++++++
 include/asm-generic/pgtable_uffd.h |  3 +++
 include/linux/userfaultfd_k.h      | 25 +++++++++++++++++++++++++
 3 files changed, 56 insertions(+)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 448cd01eb3ec..71b1e73d5b26 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1300,6 +1300,34 @@ static inline pmd_t pmd_swp_clear_soft_dirty(pmd_t pmd)
 #endif
 
 #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP
+
+/*
+ * This is a very special swap-like pte that marks this pte as "wr-protected"
+ * by userfaultfd-wp.  It should only exist for file-backed memory where the
+ * page (previously got wr-protected) has been unmapped or swapped out.
+ *
+ * For anonymous memories, the userfaultfd-wp _PAGE_SWP_UFFD_WP bit is kept
+ * along with a real swp entry instead.
+ *
+ * Let's make some rules for this special pte:
+ *
+ * (1) pte_none()==false, so that it'll not trigger a missing page fault.
+ *
+ * (2) pte_present()==false, so that it's recognized as swap (is_swap_pte).
+ *
+ * (3) pte_swp_uffd_wp()==true, so it can be tested just like a swap pte that
+ *     contains a valid swap entry, so that we can check a swap pte always
+ *     using "is_swap_pte() && pte_swp_uffd_wp()" without caring about whether
+ *     there's one swap entry inside of the pte.
+ *
+ * (4) It should not be a valid swap pte anywhere, so that when we see this pte
+ *     we know it does not contain a swap entry.
+ *
+ * For x86, the simplest special pte which satisfies all of above should be the
+ * pte with only _PAGE_SWP_UFFD_WP bit set (where swp_type==swp_offset==0).
+ */
+#define  UFFD_WP_SWP_PTE_SPECIAL  __pte(_PAGE_SWP_UFFD_WP)
+
 static inline pte_t pte_swp_mkuffd_wp(pte_t pte)
 {
 	return pte_set_flags(pte, _PAGE_SWP_UFFD_WP);
diff --git a/include/asm-generic/pgtable_uffd.h b/include/asm-generic/pgtable_uffd.h
index 828966d4c281..95e9811ce9d1 100644
--- a/include/asm-generic/pgtable_uffd.h
+++ b/include/asm-generic/pgtable_uffd.h
@@ -2,6 +2,9 @@
 #define _ASM_GENERIC_PGTABLE_UFFD_H
 
 #ifndef CONFIG_HAVE_ARCH_USERFAULTFD_WP
+
+#define  UFFD_WP_SWP_PTE_SPECIAL  __pte(0)
+
 static __always_inline int pte_uffd_wp(pte_t pte)
 {
 	return 0;
diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index 331d2ccf0bcc..bb5a72a2b07a 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -145,6 +145,21 @@ extern int userfaultfd_unmap_prep(struct vm_area_struct *vma,
 extern void userfaultfd_unmap_complete(struct mm_struct *mm,
 				       struct list_head *uf);
 
+static inline pte_t pte_swp_mkuffd_wp_special(struct vm_area_struct *vma)
+{
+	WARN_ON_ONCE(vma_is_anonymous(vma));
+	return UFFD_WP_SWP_PTE_SPECIAL;
+}
+
+static inline bool pte_swp_uffd_wp_special(pte_t pte)
+{
+#ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP
+	return pte_same(pte, UFFD_WP_SWP_PTE_SPECIAL);
+#else
+	return false;
+#endif
+}
+
 #else /* CONFIG_USERFAULTFD */
 
 /* mm helpers */
@@ -234,6 +249,16 @@ static inline void userfaultfd_unmap_complete(struct mm_struct *mm,
 {
 }
 
+static inline pte_t pte_swp_mkuffd_wp_special(struct vm_area_struct *vma)
+{
+	return __pte(0);
+}
+
+static inline bool pte_swp_uffd_wp_special(pte_t pte)
+{
+	return false;
+}
+
 #endif /* CONFIG_USERFAULTFD */
 
 #endif /* _LINUX_USERFAULTFD_K_H */
-- 
2.31.1



* [PATCH v4 05/26] mm/swap: Introduce the idea of special swap ptes
  2021-07-14 22:20 [PATCH v4 00/26] userfaultfd-wp: Support shmem and hugetlbfs Peter Xu
                   ` (3 preceding siblings ...)
  2021-07-14 22:20 ` [PATCH v4 04/26] mm/userfaultfd: Introduce special pte for unmapped file-backed mem Peter Xu
@ 2021-07-14 22:20 ` Peter Xu
  2021-07-14 22:20 ` [PATCH v4 06/26] shmem/userfaultfd: Handle uffd-wp special pte in page fault handler Peter Xu
                   ` (20 subsequent siblings)
  25 siblings, 0 replies; 34+ messages in thread
From: Peter Xu @ 2021-07-14 22:20 UTC
  To: linux-kernel, linux-mm
  Cc: Jason Gunthorpe, peterx, Matthew Wilcox, Andrew Morton,
	Axel Rasmussen, Nadav Amit, Jerome Glisse, Mike Rapoport,
	Miaohe Lin, Hugh Dickins, Alistair Popple, Andrea Arcangeli,
	Mike Kravetz, Kirill A . Shutemov, David Hildenbrand

We used to have special swap entries, like migration entries, hw-poison
entries, device private entries, etc.

Those "special swap entries" reside in the range that they need to be at least
swap entries first, and their types are decided by swp_type(entry).

This patch introduces another idea called "special swap ptes".

It's very easy to confuse these with "special swap entries", but a special
swap pte should never contain a swap entry at all.  That is, it's illegal to
call pte_to_swp_entry() upon a special swap pte.

Make the uffd-wp special pte to be the first special swap pte.

Before this patch, is_swap_pte()==true means one of the below:

   (a.1) The pte has a normal swap entry (non_swap_entry()==false).  For
         example, when an anonymous page got swapped out.

   (a.2) The pte has a special swap entry (non_swap_entry()==true).  For
         example, a migration entry, a hw-poison entry, etc.

After this patch, is_swap_pte()==true means one of the below, where case (b) is
added:

 (a) The pte contains a swap entry.

   (a.1) The pte has a normal swap entry (non_swap_entry()==false).  For
         example, when an anonymous page got swapped out.

   (a.2) The pte has a special swap entry (non_swap_entry()==true).  For
         example, a migration entry, a hw-poison entry, etc.

 (b) The pte does not contain a swap entry at all (so it cannot be passed
     into pte_to_swp_entry()).  For example, uffd-wp special swap pte.

Teach the whole mm core about this new idea.  It's done by introducing another
helper called pte_has_swap_entry(), which stands for cases (a.1) and (a.2).
Before this patch, it is the same as is_swap_pte(), because there's no special
swap pte yet.  Now, most of the previous uses of is_swap_pte() in mm core need
to be replaced by the new helper pte_has_swap_entry(), to make sure we won't
try to parse a swap entry from a swap special pte (which does not contain a
swap entry at all!).  We either handle the swap special pte, or it'll naturally
fall through to the default "else" paths.

Warn properly (e.g., in do_swap_page()) when we see a special swap pte - we
should never call do_swap_page() upon those ptes; just bail out early if it
happens.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 arch/arm64/kernel/mte.c |  2 +-
 fs/proc/task_mmu.c      | 14 ++++++++------
 include/linux/swapops.h | 39 ++++++++++++++++++++++++++++++++++++++-
 mm/gup.c                |  2 +-
 mm/hmm.c                |  2 +-
 mm/khugepaged.c         | 11 ++++++++++-
 mm/madvise.c            |  4 ++--
 mm/memcontrol.c         |  2 +-
 mm/memory.c             |  7 +++++++
 mm/migrate.c            |  4 ++--
 mm/mincore.c            |  2 +-
 mm/mprotect.c           |  2 +-
 mm/mremap.c             |  2 +-
 mm/page_vma_mapped.c    |  6 +++---
 mm/swapfile.c           |  2 +-
 15 files changed, 78 insertions(+), 23 deletions(-)

diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
index 69b3fde8759e..841ff639b4b5 100644
--- a/arch/arm64/kernel/mte.c
+++ b/arch/arm64/kernel/mte.c
@@ -35,7 +35,7 @@ EXPORT_SYMBOL_GPL(mte_async_mode);
 static void mte_sync_page_tags(struct page *page, pte_t old_pte,
 			       bool check_swap, bool pte_is_tagged)
 {
-	if (check_swap && is_swap_pte(old_pte)) {
+	if (check_swap && pte_has_swap_entry(old_pte)) {
 		swp_entry_t entry = pte_to_swp_entry(old_pte);
 
 		if (!non_swap_entry(entry) && mte_restore_tags(entry, page))
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index eb97468dfe4c..9c5af77b5290 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -498,7 +498,7 @@ static void smaps_pte_entry(pte_t *pte, unsigned long addr,
 
 	if (pte_present(*pte)) {
 		page = vm_normal_page(vma, addr, *pte);
-	} else if (is_swap_pte(*pte)) {
+	} else if (pte_has_swap_entry(*pte)) {
 		swp_entry_t swpent = pte_to_swp_entry(*pte);
 
 		if (!non_swap_entry(swpent)) {
@@ -516,8 +516,10 @@ static void smaps_pte_entry(pte_t *pte, unsigned long addr,
 			}
 		} else if (is_pfn_swap_entry(swpent))
 			page = pfn_swap_entry_to_page(swpent);
-	} else if (unlikely(IS_ENABLED(CONFIG_SHMEM) && mss->check_shmem_swap
-							&& pte_none(*pte))) {
+	} else if (unlikely(IS_ENABLED(CONFIG_SHMEM) &&
+			    mss->check_shmem_swap &&
+			    /* Here swap special pte is the same as none pte */
+			    (pte_none(*pte) || is_swap_special_pte(*pte)))) {
 		page = xa_load(&vma->vm_file->f_mapping->i_pages,
 						linear_page_index(vma, addr));
 		if (xa_is_value(page))
@@ -689,7 +691,7 @@ static int smaps_hugetlb_range(pte_t *pte, unsigned long hmask,
 
 	if (pte_present(*pte)) {
 		page = vm_normal_page(vma, addr, *pte);
-	} else if (is_swap_pte(*pte)) {
+	} else if (pte_has_swap_entry(*pte)) {
 		swp_entry_t swpent = pte_to_swp_entry(*pte);
 
 		if (is_pfn_swap_entry(swpent))
@@ -1071,7 +1073,7 @@ static inline void clear_soft_dirty(struct vm_area_struct *vma,
 		ptent = pte_wrprotect(old_pte);
 		ptent = pte_clear_soft_dirty(ptent);
 		ptep_modify_prot_commit(vma, addr, pte, old_pte, ptent);
-	} else if (is_swap_pte(ptent)) {
+	} else if (pte_has_swap_entry(ptent)) {
 		ptent = pte_swp_clear_soft_dirty(ptent);
 		set_pte_at(vma->vm_mm, addr, pte, ptent);
 	}
@@ -1374,7 +1376,7 @@ static pagemap_entry_t pte_to_pagemap_entry(struct pagemapread *pm,
 			flags |= PM_SOFT_DIRTY;
 		if (pte_uffd_wp(pte))
 			flags |= PM_UFFD_WP;
-	} else if (is_swap_pte(pte)) {
+	} else if (pte_has_swap_entry(pte)) {
 		swp_entry_t entry;
 		if (pte_swp_soft_dirty(pte))
 			flags |= PM_SOFT_DIRTY;
diff --git a/include/linux/swapops.h b/include/linux/swapops.h
index d356ab4047f7..7f46dec3be1d 100644
--- a/include/linux/swapops.h
+++ b/include/linux/swapops.h
@@ -5,6 +5,7 @@
 #include <linux/radix-tree.h>
 #include <linux/bug.h>
 #include <linux/mm_types.h>
+#include <linux/userfaultfd_k.h>
 
 #ifdef CONFIG_MMU
 
@@ -62,12 +63,48 @@ static inline pgoff_t swp_offset(swp_entry_t entry)
 	return entry.val & SWP_OFFSET_MASK;
 }
 
-/* check whether a pte points to a swap entry */
+/*
+ * is_swap_pte() returns true for three cases:
+ *
+ * (a) The pte contains a swap entry.
+ *
+ *   (a.1) The pte has a normal swap entry (non_swap_entry()==false).  For
+ *         example, when an anonymous page got swapped out.
+ *
+ *   (a.2) The pte has a special swap entry (non_swap_entry()==true).  For
+ *         example, a migration entry, a hw-poison entry, etc.
+ *
+ * (b) The pte does not contain a swap entry at all (so it cannot be passed
+ *     into pte_to_swp_entry()).  For example, uffd-wp special swap pte.
+ */
 static inline int is_swap_pte(pte_t pte)
 {
 	return !pte_none(pte) && !pte_present(pte);
 }
 
+/*
+ * A swap-like special pte should only be used as special marker to trigger a
+ * page fault.  We should treat them similarly as pte_none() in most cases,
+ * except that it may contain some special information that can persist within
+ * the pte.  Currently the only special swap pte is UFFD_WP_SWP_PTE_SPECIAL.
+ *
+ * Note: we should never call pte_to_swp_entry() upon a special swap pte,
+ * Because a swap special pte does not contain a swap entry!
+ */
+static inline bool is_swap_special_pte(pte_t pte)
+{
+	return pte_swp_uffd_wp_special(pte);
+}
+
+/*
+ * Returns true if the pte contains a swap entry.  This includes not only the
+ * normal swp entry case, but also for migration entries, etc.
+ */
+static inline bool pte_has_swap_entry(pte_t pte)
+{
+	return is_swap_pte(pte) && !is_swap_special_pte(pte);
+}
+
 /*
  * Convert the arch-dependent pte representation of a swp_entry_t into an
  * arch-independent swp_entry_t.
diff --git a/mm/gup.c b/mm/gup.c
index 42b8b1fa6521..425c08788921 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -513,7 +513,7 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
 		 */
 		if (likely(!(flags & FOLL_MIGRATION)))
 			goto no_page;
-		if (pte_none(pte))
+		if (!pte_has_swap_entry(pte))
 			goto no_page;
 		entry = pte_to_swp_entry(pte);
 		if (!is_migration_entry(entry))
diff --git a/mm/hmm.c b/mm/hmm.c
index fad6be2bf072..aba1bf2c6742 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -239,7 +239,7 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
 	pte_t pte = *ptep;
 	uint64_t pfn_req_flags = *hmm_pfn;
 
-	if (pte_none(pte)) {
+	if (pte_none(pte) || is_swap_special_pte(pte)) {
 		required_fault =
 			hmm_pte_need_fault(hmm_vma_walk, pfn_req_flags, 0);
 		if (required_fault)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index b0412be08fa2..7376a9b5bfc9 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1018,7 +1018,7 @@ static bool __collapse_huge_page_swapin(struct mm_struct *mm,
 
 		vmf.pte = pte_offset_map(pmd, address);
 		vmf.orig_pte = *vmf.pte;
-		if (!is_swap_pte(vmf.orig_pte)) {
+		if (!pte_has_swap_entry(vmf.orig_pte)) {
 			pte_unmap(vmf.pte);
 			continue;
 		}
@@ -1245,6 +1245,15 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 	     _pte++, _address += PAGE_SIZE) {
 		pte_t pteval = *_pte;
 		if (is_swap_pte(pteval)) {
+			if (is_swap_special_pte(pteval)) {
+				/*
+				 * Reuse SCAN_PTE_UFFD_WP.  If there will be
+				 * new users of is_swap_special_pte(), we'd
+				 * better introduce a new result type.
+				 */
+				result = SCAN_PTE_UFFD_WP;
+				goto out_unmap;
+			}
 			if (++unmapped <= khugepaged_max_ptes_swap) {
 				/*
 				 * Always be strict with uffd-wp
diff --git a/mm/madvise.c b/mm/madvise.c
index 6d3d348b17f4..2a8d2a9fc514 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -204,7 +204,7 @@ static int swapin_walk_pmd_entry(pmd_t *pmd, unsigned long start,
 		pte = *(orig_pte + ((index - start) / PAGE_SIZE));
 		pte_unmap_unlock(orig_pte, ptl);
 
-		if (pte_present(pte) || pte_none(pte))
+		if (!pte_has_swap_entry(pte))
 			continue;
 		entry = pte_to_swp_entry(pte);
 		if (unlikely(non_swap_entry(entry)))
@@ -596,7 +596,7 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
 	for (; addr != end; pte++, addr += PAGE_SIZE) {
 		ptent = *pte;
 
-		if (pte_none(ptent))
+		if (pte_none(ptent) || is_swap_special_pte(ptent))
 			continue;
 		/*
 		 * If the pte has swp_entry, just clear page table to
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index ae1f5d0cb581..4b46c099ad94 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5738,7 +5738,7 @@ static enum mc_target_type get_mctgt_type(struct vm_area_struct *vma,
 
 	if (pte_present(ptent))
 		page = mc_handle_present_pte(vma, addr, ptent);
-	else if (is_swap_pte(ptent))
+	else if (pte_has_swap_entry(ptent))
 		page = mc_handle_swap_pte(vma, ptent, &ent);
 	else if (pte_none(ptent))
 		page = mc_handle_file_pte(vma, addr, ptent, &ent);
diff --git a/mm/memory.c b/mm/memory.c
index 0e0de08a2cd5..998a4f9a3744 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3491,6 +3491,13 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	if (!pte_unmap_same(vmf))
 		goto out;
 
+	/*
+	 * We should never call do_swap_page upon a swap special pte; just be
+	 * safe to bail out if it happens.
+	 */
+	if (WARN_ON_ONCE(is_swap_special_pte(vmf->orig_pte)))
+		goto out;
+
 	entry = pte_to_swp_entry(vmf->orig_pte);
 	if (unlikely(non_swap_entry(entry))) {
 		if (is_migration_entry(entry)) {
diff --git a/mm/migrate.c b/mm/migrate.c
index 23cbd9de030b..b477d0d5f911 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -294,7 +294,7 @@ void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
 
 	spin_lock(ptl);
 	pte = *ptep;
-	if (!is_swap_pte(pte))
+	if (!pte_has_swap_entry(pte))
 		goto out;
 
 	entry = pte_to_swp_entry(pte);
@@ -2276,7 +2276,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
 
 		pte = *ptep;
 
-		if (pte_none(pte)) {
+		if (pte_none(pte) || is_swap_special_pte(pte)) {
 			if (vma_is_anonymous(vma)) {
 				mpfn = MIGRATE_PFN_MIGRATE;
 				migrate->cpages++;
diff --git a/mm/mincore.c b/mm/mincore.c
index 9122676b54d6..5728c3e6473f 100644
--- a/mm/mincore.c
+++ b/mm/mincore.c
@@ -121,7 +121,7 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 	for (; addr != end; ptep++, addr += PAGE_SIZE) {
 		pte_t pte = *ptep;
 
-		if (pte_none(pte))
+		if (pte_none(pte) || is_swap_special_pte(pte))
 			__mincore_unmapped_range(addr, addr + PAGE_SIZE,
 						 vma, vec);
 		else if (pte_present(pte))
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 883e2cc85cad..4b743394afbe 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -139,7 +139,7 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 			}
 			ptep_modify_prot_commit(vma, addr, pte, oldpte, ptent);
 			pages++;
-		} else if (is_swap_pte(oldpte)) {
+		} else if (pte_has_swap_entry(oldpte)) {
 			swp_entry_t entry = pte_to_swp_entry(oldpte);
 			pte_t newpte;
 
diff --git a/mm/mremap.c b/mm/mremap.c
index 5989d3990020..122b279333ee 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -125,7 +125,7 @@ static pte_t move_soft_dirty_pte(pte_t pte)
 #ifdef CONFIG_MEM_SOFT_DIRTY
 	if (pte_present(pte))
 		pte = pte_mksoft_dirty(pte);
-	else if (is_swap_pte(pte))
+	else if (pte_has_swap_entry(pte))
 		pte = pte_swp_mksoft_dirty(pte);
 #endif
 	return pte;
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index f7b331081791..ff57b67426af 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -36,7 +36,7 @@ static bool map_pte(struct page_vma_mapped_walk *pvmw)
 			 * For more details on device private memory see HMM
 			 * (include/linux/hmm.h or mm/hmm.c).
 			 */
-			if (is_swap_pte(*pvmw->pte)) {
+			if (pte_has_swap_entry(*pvmw->pte)) {
 				swp_entry_t entry;
 
 				/* Handle un-addressable ZONE_DEVICE memory */
@@ -90,7 +90,7 @@ static bool check_pte(struct page_vma_mapped_walk *pvmw)
 
 	if (pvmw->flags & PVMW_MIGRATION) {
 		swp_entry_t entry;
-		if (!is_swap_pte(*pvmw->pte))
+		if (!pte_has_swap_entry(*pvmw->pte))
 			return false;
 		entry = pte_to_swp_entry(*pvmw->pte);
 
@@ -99,7 +99,7 @@ static bool check_pte(struct page_vma_mapped_walk *pvmw)
 			return false;
 
 		pfn = swp_offset(entry);
-	} else if (is_swap_pte(*pvmw->pte)) {
+	} else if (pte_has_swap_entry(*pvmw->pte)) {
 		swp_entry_t entry;
 
 		/* Handle un-addressable ZONE_DEVICE memory */
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 1e07d1c776f2..4993b4454c13 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1951,7 +1951,7 @@ static int unuse_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 	si = swap_info[type];
 	pte = pte_offset_map(pmd, addr);
 	do {
-		if (!is_swap_pte(*pte))
+		if (!pte_has_swap_entry(*pte))
 			continue;
 
 		entry = pte_to_swp_entry(*pte);
-- 
2.31.1



* [PATCH v4 06/26] shmem/userfaultfd: Handle uffd-wp special pte in page fault handler
  2021-07-14 22:20 [PATCH v4 00/26] userfaultfd-wp: Support shmem and hugetlbfs Peter Xu
                   ` (4 preceding siblings ...)
  2021-07-14 22:20 ` [PATCH v4 05/26] mm/swap: Introduce the idea of special swap ptes Peter Xu
@ 2021-07-14 22:20 ` Peter Xu
  2021-07-15  6:20   ` kernel test robot
  2021-07-14 22:20 ` [PATCH v4 07/26] mm: Drop first_index/last_index in zap_details Peter Xu
                   ` (19 subsequent siblings)
  25 siblings, 1 reply; 34+ messages in thread
From: Peter Xu @ 2021-07-14 22:20 UTC
  To: linux-kernel, linux-mm
  Cc: Jason Gunthorpe, peterx, Matthew Wilcox, Andrew Morton,
	Axel Rasmussen, Nadav Amit, Jerome Glisse, Mike Rapoport,
	Miaohe Lin, Hugh Dickins, Alistair Popple, Andrea Arcangeli,
	Mike Kravetz, Kirill A . Shutemov, David Hildenbrand

File-backed memories are prone to unmap/swap, so their ptes are always
unstable.  This could lead to the userfaultfd-wp information getting lost when
such memory is unmapped or swapped out, for example, on shmem.  To keep the
information persistent, we will start to use the newly introduced swap-like
special ptes to replace what would otherwise become none ptes when those ptes
are removed.

Prepare for this by teaching the general page fault handler to understand such
a special pte first, before later patches start installing it.

The handling of this special pte page fault is similar to a missing fault, but
it should happen after the pte-missing logic, since the special pte is
designed to be a swap-like pte.  Meanwhile, it should be handled before
do_swap_page(), so that the swap core logic won't be confused by seeing such
an illegal swap pte.

This is a slow path of uffd-wp handling, because unmap of wr-protected shmem
ptes should be rare.  So far it should only trigger in two conditions:

  (1) When trying to punch holes in shmem_fallocate(), there will be a
      pre-unmap optimization before evicting the page.  That will create
      unmapped shmem ptes with wr-protected pages covered.

  (2) Swapping out of shmem pages

Because of this, the page fault handling is simplified too: we do not send the
wr-protect message on the 1st page fault; instead, the page will be installed
read-only, so the message will not be generated until the next write, which
will trigger the do_wp_page() path of general uffd-wp handling.

Disable fault-around for all uffd-wp registered ranges for extra safety, and
clean the code up a bit following the earlier introduction of the MINOR fault
mode.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 include/linux/userfaultfd_k.h | 12 +++++
 mm/memory.c                   | 88 +++++++++++++++++++++++++++++++----
 2 files changed, 90 insertions(+), 10 deletions(-)

diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index bb5a72a2b07a..47c74ef635ab 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -94,6 +94,18 @@ static inline bool uffd_disable_huge_pmd_share(struct vm_area_struct *vma)
 	return vma->vm_flags & (VM_UFFD_WP | VM_UFFD_MINOR);
 }
 
+/*
+ * Don't do fault around for either WP or MINOR registered uffd range.  For
+ * MINOR registered range, fault around will be a total disaster and ptes can
+ * be installed without notifications; for WP it should mostly be fine as long
+ * as the fault around checks for pte_none() before the installation, however
+ * to be super safe we just forbid it.
+ */
+static inline bool uffd_disable_fault_around(struct vm_area_struct *vma)
+{
+	return vma->vm_flags & (VM_UFFD_WP | VM_UFFD_MINOR);
+}
+
 static inline bool userfaultfd_missing(struct vm_area_struct *vma)
 {
 	return vma->vm_flags & VM_UFFD_MISSING;
diff --git a/mm/memory.c b/mm/memory.c
index 998a4f9a3744..ba8033ca6682 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3964,6 +3964,7 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
 void do_set_pte(struct vm_fault *vmf, struct page *page, unsigned long addr)
 {
 	struct vm_area_struct *vma = vmf->vma;
+	bool uffd_wp = pte_swp_uffd_wp_special(vmf->orig_pte);
 	bool write = vmf->flags & FAULT_FLAG_WRITE;
 	bool prefault = vmf->address != addr;
 	pte_t entry;
@@ -3978,6 +3979,8 @@ void do_set_pte(struct vm_fault *vmf, struct page *page, unsigned long addr)
 
 	if (write)
 		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
+	if (unlikely(uffd_wp))
+		entry = pte_mkuffd_wp(pte_wrprotect(entry));
 	/* copy-on-write page */
 	if (write && !(vma->vm_flags & VM_SHARED)) {
 		inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES);
@@ -4045,8 +4048,12 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
 	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
 				      vmf->address, &vmf->ptl);
 	ret = 0;
-	/* Re-check under ptl */
-	if (likely(pte_none(*vmf->pte)))
+
+	/*
+	 * Re-check under ptl.  Note: this will cover both none pte and
+	 * uffd-wp-special swap pte
+	 */
+	if (likely(pte_same(*vmf->pte, vmf->orig_pte)))
 		do_set_pte(vmf, page, vmf->address);
 	else
 		ret = VM_FAULT_NOPAGE;
@@ -4150,9 +4157,21 @@ static vm_fault_t do_fault_around(struct vm_fault *vmf)
 	return vmf->vma->vm_ops->map_pages(vmf, start_pgoff, end_pgoff);
 }
 
+/* Return true if we should do read fault-around, false otherwise */
+static inline bool should_fault_around(struct vm_fault *vmf)
+{
+	/* No ->map_pages?  No way to fault around... */
+	if (!vmf->vma->vm_ops->map_pages)
+		return false;
+
+	if (uffd_disable_fault_around(vmf->vma))
+		return false;
+
+	return fault_around_bytes >> PAGE_SHIFT > 1;
+}
+
 static vm_fault_t do_read_fault(struct vm_fault *vmf)
 {
-	struct vm_area_struct *vma = vmf->vma;
 	vm_fault_t ret = 0;
 
 	/*
@@ -4160,12 +4179,10 @@ static vm_fault_t do_read_fault(struct vm_fault *vmf)
 	 * if page by the offset is not ready to be mapped (cold cache or
 	 * something).
 	 */
-	if (vma->vm_ops->map_pages && fault_around_bytes >> PAGE_SHIFT > 1) {
-		if (likely(!userfaultfd_minor(vmf->vma))) {
-			ret = do_fault_around(vmf);
-			if (ret)
-				return ret;
-		}
+	if (should_fault_around(vmf)) {
+		ret = do_fault_around(vmf);
+		if (ret)
+			return ret;
 	}
 
 	ret = __do_fault(vmf);
@@ -4484,6 +4501,57 @@ static vm_fault_t wp_huge_pud(struct vm_fault *vmf, pud_t orig_pud)
 	return VM_FAULT_FALLBACK;
 }
 
+static vm_fault_t uffd_wp_clear_special(struct vm_fault *vmf)
+{
+	vmf->pte = pte_offset_map_lock(vmf->vma->vm_mm, vmf->pmd,
+				       vmf->address, &vmf->ptl);
+	/*
+	 * Be careful so that we will only recover a special uffd-wp pte into a
+	 * none pte.  Otherwise it means the pte could have changed, so retry.
+	 */
+	if (pte_swp_uffd_wp_special(*vmf->pte))
+		pte_clear(vmf->vma->vm_mm, vmf->address, vmf->pte);
+	pte_unmap_unlock(vmf->pte, vmf->ptl);
+	return 0;
+}
+
+/*
+ * This is actually a page-missing access, but with uffd-wp special pte
+ * installed.  It means this pte was wr-protected before being unmapped.
+ */
+static vm_fault_t uffd_wp_handle_special(struct vm_fault *vmf)
+{
+	/* Careful!  vmf->pte unmapped after return */
+	if (!pte_unmap_same(vmf))
+		return 0;
+
+	/*
+	 * Just in case there're leftover special ptes even after the region
+	 * got unregistered - we can simply clear them.
+	 */
+	if (unlikely(!userfaultfd_wp(vmf->vma) || vma_is_anonymous(vmf->vma)))
+		return uffd_wp_clear_special(vmf);
+
+	/*
+	 * Here we share most code with do_fault(), in which we can identify
+	 * whether this is "none pte fault" or "uffd-wp-special fault" by
+	 * checking the vmf->orig_pte.
+	 */
+	return do_fault(vmf);
+}
+
+static vm_fault_t do_swap_pte(struct vm_fault *vmf)
+{
+	/*
+	 * We need to handle special swap ptes before handling ptes that
+	 * contain swap entries, always.
+	 */
+	if (unlikely(pte_swp_uffd_wp_special(vmf->orig_pte)))
+		return uffd_wp_handle_special(vmf);
+
+	return do_swap_page(vmf);
+}
+
 /*
  * These routines also need to handle stuff like marking pages dirty
  * and/or accessed for architectures that don't do it in hardware (most
@@ -4558,7 +4626,7 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
 	}
 
 	if (!pte_present(vmf->orig_pte))
-		return do_swap_page(vmf);
+		return do_swap_pte(vmf);
 
 	if (pte_protnone(vmf->orig_pte) && vma_is_accessible(vmf->vma))
 		return do_numa_page(vmf);
-- 
2.31.1



* [PATCH v4 07/26] mm: Drop first_index/last_index in zap_details
  2021-07-14 22:20 [PATCH v4 00/26] userfaultfd-wp: Support shmem and hugetlbfs Peter Xu
                   ` (5 preceding siblings ...)
  2021-07-14 22:20 ` [PATCH v4 06/26] shmem/userfaultfd: Handle uffd-wp special pte in page fault handler Peter Xu
@ 2021-07-14 22:20 ` Peter Xu
  2021-07-14 22:20 ` [PATCH v4 08/26] mm: Introduce zap_details.zap_flags Peter Xu
                   ` (18 subsequent siblings)
  25 siblings, 0 replies; 34+ messages in thread
From: Peter Xu @ 2021-07-14 22:20 UTC
  To: linux-kernel, linux-mm
  Cc: Jason Gunthorpe, peterx, Matthew Wilcox, Andrew Morton,
	Axel Rasmussen, Nadav Amit, Jerome Glisse, Mike Rapoport,
	Miaohe Lin, Hugh Dickins, Alistair Popple, Andrea Arcangeli,
	Mike Kravetz, Kirill A . Shutemov, David Hildenbrand

The first_index/last_index parameters in zap_details are actually only used in
unmap_mapping_range_tree().  Meanwhile, that function is only called once, by
unmap_mapping_pages().  Instead of passing these two variables through the
whole stack of page zapping code, remove them from zap_details and let them
simply be parameters of unmap_mapping_range_tree(), which is inlined.

Reviewed-by: Alistair Popple <apopple@nvidia.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 include/linux/mm.h |  2 --
 mm/memory.c        | 29 ++++++++++++++++-------------
 2 files changed, 16 insertions(+), 15 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 57453dba41b9..795b3cd643ca 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1720,8 +1720,6 @@ extern void user_shm_unlock(size_t, struct ucounts *);
  */
 struct zap_details {
 	struct address_space *check_mapping;	/* Check page->mapping if set */
-	pgoff_t	first_index;			/* Lowest page->index to unmap */
-	pgoff_t last_index;			/* Highest page->index to unmap */
 	struct page *single_page;		/* Locked page to be unmapped */
 };
 
diff --git a/mm/memory.c b/mm/memory.c
index ba8033ca6682..4c269d7b3d83 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3322,20 +3322,20 @@ static void unmap_mapping_range_vma(struct vm_area_struct *vma,
 }
 
 static inline void unmap_mapping_range_tree(struct rb_root_cached *root,
+					    pgoff_t first_index,
+					    pgoff_t last_index,
 					    struct zap_details *details)
 {
 	struct vm_area_struct *vma;
 	pgoff_t vba, vea, zba, zea;
 
-	vma_interval_tree_foreach(vma, root,
-			details->first_index, details->last_index) {
-
+	vma_interval_tree_foreach(vma, root, first_index, last_index) {
 		vba = vma->vm_pgoff;
 		vea = vba + vma_pages(vma) - 1;
-		zba = details->first_index;
+		zba = first_index;
 		if (zba < vba)
 			zba = vba;
-		zea = details->last_index;
+		zea = last_index;
 		if (zea > vea)
 			zea = vea;
 
@@ -3361,18 +3361,21 @@ void unmap_mapping_page(struct page *page)
 {
 	struct address_space *mapping = page->mapping;
 	struct zap_details details = { };
+	pgoff_t	first_index, last_index;
 
 	VM_BUG_ON(!PageLocked(page));
 	VM_BUG_ON(PageTail(page));
 
+	first_index = page->index;
+	last_index = page->index + thp_nr_pages(page) - 1;
+
 	details.check_mapping = mapping;
-	details.first_index = page->index;
-	details.last_index = page->index + thp_nr_pages(page) - 1;
 	details.single_page = page;
 
 	i_mmap_lock_write(mapping);
 	if (unlikely(!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root)))
-		unmap_mapping_range_tree(&mapping->i_mmap, &details);
+		unmap_mapping_range_tree(&mapping->i_mmap, first_index,
+					 last_index, &details);
 	i_mmap_unlock_write(mapping);
 }
 
@@ -3391,17 +3394,17 @@ void unmap_mapping_page(struct page *page)
 void unmap_mapping_pages(struct address_space *mapping, pgoff_t start,
 		pgoff_t nr, bool even_cows)
 {
+	pgoff_t	first_index = start, last_index = start + nr - 1;
 	struct zap_details details = { };
 
 	details.check_mapping = even_cows ? NULL : mapping;
-	details.first_index = start;
-	details.last_index = start + nr - 1;
-	if (details.last_index < details.first_index)
-		details.last_index = ULONG_MAX;
+	if (last_index < first_index)
+		last_index = ULONG_MAX;
 
 	i_mmap_lock_write(mapping);
 	if (unlikely(!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root)))
-		unmap_mapping_range_tree(&mapping->i_mmap, &details);
+		unmap_mapping_range_tree(&mapping->i_mmap, first_index,
+					 last_index, &details);
 	i_mmap_unlock_write(mapping);
 }
 
-- 
2.31.1


* [PATCH v4 08/26] mm: Introduce zap_details.zap_flags
  2021-07-14 22:20 [PATCH v4 00/26] userfaultfd-wp: Support shmem and hugetlbfs Peter Xu
                   ` (6 preceding siblings ...)
  2021-07-14 22:20 ` [PATCH v4 07/26] mm: Drop first_index/last_index in zap_details Peter Xu
@ 2021-07-14 22:20 ` Peter Xu
  2021-07-14 22:21 ` [PATCH v4 09/26] mm: Introduce ZAP_FLAG_SKIP_SWAP Peter Xu
                   ` (17 subsequent siblings)
  25 siblings, 0 replies; 34+ messages in thread
From: Peter Xu @ 2021-07-14 22:20 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: Jason Gunthorpe, peterx, Matthew Wilcox, Andrew Morton,
	Axel Rasmussen, Nadav Amit, Jerome Glisse, Mike Rapoport,
	Miaohe Lin, Hugh Dickins, Alistair Popple, Andrea Arcangeli,
	Mike Kravetz, Kirill A . Shutemov, David Hildenbrand

Instead of trying to introduce one variable for every new zap_details field,
let's introduce a flag field so that it can start to encode true/false
information.

Use this flag first to clean up the existing check_mapping variable.  The
name "check_mapping" implies a boolean, but it actually stores the mapping
itself; it is simply left unset when we don't want to check the mapping.

To make things clearer, introduce the first zap flag ZAP_FLAG_CHECK_MAPPING,
so that we only check against the mapping if this bit is set.  At the same
time, rename check_mapping into zap_mapping and set it always.

While at it, introduce another helper zap_skip_check_mapping() and use it in
zap_pte_range() properly.

Some old comments have been removed from zap_pte_range() because they were
duplicated; now that we have the ZAP_FLAG_CHECK_MAPPING flag, the same
information is easy to find by simply grepping for the flag.

It'll also make life easier when we want to e.g. pass zap_flags in from
callers like unmap_mapping_pages() (instead of adding new booleans besides
the even_cows parameter).

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 include/linux/mm.h | 19 ++++++++++++++++++-
 mm/memory.c        | 34 ++++++++++------------------------
 2 files changed, 28 insertions(+), 25 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 795b3cd643ca..26088174daab 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1715,14 +1715,31 @@ static inline bool can_do_mlock(void) { return false; }
 extern int user_shm_lock(size_t, struct ucounts *);
 extern void user_shm_unlock(size_t, struct ucounts *);
 
+/* Whether to check page->mapping when zapping */
+#define  ZAP_FLAG_CHECK_MAPPING             BIT(0)
+
 /*
  * Parameter block passed down to zap_pte_range in exceptional cases.
  */
 struct zap_details {
-	struct address_space *check_mapping;	/* Check page->mapping if set */
+	struct address_space *zap_mapping;
 	struct page *single_page;		/* Locked page to be unmapped */
+	unsigned long zap_flags;
 };
 
+/* Return true if we should skip zapping this page, false otherwise */
+static inline bool
+zap_skip_check_mapping(struct zap_details *details, struct page *page)
+{
+	if (!details || !page)
+		return false;
+
+	if (!(details->zap_flags & ZAP_FLAG_CHECK_MAPPING))
+		return false;
+
+	return details->zap_mapping != page_rmapping(page);
+}
+
 struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
 			     pte_t pte);
 struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,
diff --git a/mm/memory.c b/mm/memory.c
index 4c269d7b3d83..2a5a6650f069 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1333,16 +1333,8 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 			struct page *page;
 
 			page = vm_normal_page(vma, addr, ptent);
-			if (unlikely(details) && page) {
-				/*
-				 * unmap_shared_mapping_pages() wants to
-				 * invalidate cache without truncating:
-				 * unmap shared but keep private pages.
-				 */
-				if (details->check_mapping &&
-				    details->check_mapping != page_rmapping(page))
-					continue;
-			}
+			if (unlikely(zap_skip_check_mapping(details, page)))
+				continue;
 			ptent = ptep_get_and_clear_full(mm, addr, pte,
 							tlb->fullmm);
 			tlb_remove_tlb_entry(tlb, pte, addr);
@@ -1375,17 +1367,8 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 		    is_device_exclusive_entry(entry)) {
 			struct page *page = pfn_swap_entry_to_page(entry);
 
-			if (unlikely(details && details->check_mapping)) {
-				/*
-				 * unmap_shared_mapping_pages() wants to
-				 * invalidate cache without truncating:
-				 * unmap shared but keep private pages.
-				 */
-				if (details->check_mapping !=
-				    page_rmapping(page))
-					continue;
-			}
-
+			if (unlikely(zap_skip_check_mapping(details, page)))
+				continue;
 			pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
 			rss[mm_counter(page)]--;
 
@@ -3369,8 +3352,9 @@ void unmap_mapping_page(struct page *page)
 	first_index = page->index;
 	last_index = page->index + thp_nr_pages(page) - 1;
 
-	details.check_mapping = mapping;
+	details.zap_mapping = mapping;
 	details.single_page = page;
+	details.zap_flags = ZAP_FLAG_CHECK_MAPPING;
 
 	i_mmap_lock_write(mapping);
 	if (unlikely(!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root)))
@@ -3395,9 +3379,11 @@ void unmap_mapping_pages(struct address_space *mapping, pgoff_t start,
 		pgoff_t nr, bool even_cows)
 {
 	pgoff_t	first_index = start, last_index = start + nr - 1;
-	struct zap_details details = { };
+	struct zap_details details = { .zap_mapping = mapping };
+
+	if (!even_cows)
+		details.zap_flags |= ZAP_FLAG_CHECK_MAPPING;
 
-	details.check_mapping = even_cows ? NULL : mapping;
 	if (last_index < first_index)
 		last_index = ULONG_MAX;
 
-- 
2.31.1


* [PATCH v4 09/26] mm: Introduce ZAP_FLAG_SKIP_SWAP
  2021-07-14 22:20 [PATCH v4 00/26] userfaultfd-wp: Support shmem and hugetlbfs Peter Xu
                   ` (7 preceding siblings ...)
  2021-07-14 22:20 ` [PATCH v4 08/26] mm: Introduce zap_details.zap_flags Peter Xu
@ 2021-07-14 22:21 ` Peter Xu
  2021-07-14 22:21 ` [PATCH v4 10/26] shmem/userfaultfd: Persist uffd-wp bit across zapping for file-backed Peter Xu
                   ` (16 subsequent siblings)
  25 siblings, 0 replies; 34+ messages in thread
From: Peter Xu @ 2021-07-14 22:21 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: Jason Gunthorpe, peterx, Matthew Wilcox, Andrew Morton,
	Axel Rasmussen, Nadav Amit, Jerome Glisse, Mike Rapoport,
	Miaohe Lin, Hugh Dickins, Alistair Popple, Andrea Arcangeli,
	Mike Kravetz, Kirill A . Shutemov, David Hildenbrand

Firstly, the comment in zap_pte_range() is misleading: the code checks
against the details pointer itself rather than details->check_mapping, so
the comment does not match what the code actually does.

Meanwhile, it's also confusing that nothing explains why passing in the
details pointer means skipping all swap entries.  A new user of zap_details
could easily miss this fact without reading all the way down to
zap_pte_range(), because there's no comment at zap_details mentioning it at
all, so swap entries could be erroneously skipped without being noticed.

This partly reverts 3e8715fdc03e ("mm: drop zap_details::check_swap_entries")
but introduces the ZAP_FLAG_SKIP_SWAP flag, which means the opposite of the
previous "details" parameter: the caller should explicitly set this flag to
skip swap entries; otherwise swap entries will always be considered (which
is still the major case here).

Cc: Kirill A. Shutemov <kirill@shutemov.name>
Reviewed-by: Alistair Popple <apopple@nvidia.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 include/linux/mm.h | 12 ++++++++++++
 mm/memory.c        |  8 +++++---
 2 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 26088174daab..62a75e4414e3 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1717,6 +1717,8 @@ extern void user_shm_unlock(size_t, struct ucounts *);
 
 /* Whether to check page->mapping when zapping */
 #define  ZAP_FLAG_CHECK_MAPPING             BIT(0)
+/* Whether to skip zapping swap entries */
+#define  ZAP_FLAG_SKIP_SWAP                 BIT(1)
 
 /*
  * Parameter block passed down to zap_pte_range in exceptional cases.
@@ -1740,6 +1742,16 @@ zap_skip_check_mapping(struct zap_details *details, struct page *page)
 	return details->zap_mapping != page_rmapping(page);
 }
 
+/* Return true if we should skip swap entries, false otherwise */
+static inline bool
+zap_skip_swap(struct zap_details *details)
+{
+	if (!details)
+		return false;
+
+	return details->zap_flags & ZAP_FLAG_SKIP_SWAP;
+}
+
 struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
 			     pte_t pte);
 struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,
diff --git a/mm/memory.c b/mm/memory.c
index 2a5a6650f069..d6b1adbf29e4 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1379,8 +1379,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 			continue;
 		}
 
-		/* If details->check_mapping, we leave swap entries. */
-		if (unlikely(details))
+		if (unlikely(zap_skip_swap(details)))
 			continue;
 
 		if (!non_swap_entry(entry))
@@ -3379,7 +3378,10 @@ void unmap_mapping_pages(struct address_space *mapping, pgoff_t start,
 		pgoff_t nr, bool even_cows)
 {
 	pgoff_t	first_index = start, last_index = start + nr - 1;
-	struct zap_details details = { .zap_mapping = mapping };
+	struct zap_details details = {
+		.zap_mapping = mapping,
+		.zap_flags = ZAP_FLAG_SKIP_SWAP,
+	};
 
 	if (!even_cows)
 		details.zap_flags |= ZAP_FLAG_CHECK_MAPPING;
-- 
2.31.1


* [PATCH v4 10/26] shmem/userfaultfd: Persist uffd-wp bit across zapping for file-backed
  2021-07-14 22:20 [PATCH v4 00/26] userfaultfd-wp: Support shmem and hugetlbfs Peter Xu
                   ` (8 preceding siblings ...)
  2021-07-14 22:21 ` [PATCH v4 09/26] mm: Introduce ZAP_FLAG_SKIP_SWAP Peter Xu
@ 2021-07-14 22:21 ` Peter Xu
  2021-07-14 22:24 ` [PATCH v4 11/26] shmem/userfaultfd: Allow wr-protect none pte for file-backed mem Peter Xu
                   ` (15 subsequent siblings)
  25 siblings, 0 replies; 34+ messages in thread
From: Peter Xu @ 2021-07-14 22:21 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: Jason Gunthorpe, peterx, Matthew Wilcox, Andrew Morton,
	Axel Rasmussen, Nadav Amit, Jerome Glisse, Mike Rapoport,
	Miaohe Lin, Hugh Dickins, Alistair Popple, Andrea Arcangeli,
	Mike Kravetz, Kirill A . Shutemov, David Hildenbrand

File-backed memory is prone to being unmapped at any time.  That means all
information in the pte will be dropped, including the uffd-wp flag.

Since the uffd-wp info cannot be stored in page cache or swap cache, persist
this wr-protect information by installing the special uffd-wp marker pte
when we're about to unmap a uffd wr-protected pte.  When the pte is accessed
again, we will know it was previously wr-protected by recognizing the
special pte.

Meanwhile add a new flag ZAP_FLAG_DROP_FILE_UFFD_WP for the cases where we
don't want to persist such information, for example when destroying the
whole vma, or when punching a hole in a shmem file.  For the latter, we can
only drop the uffd-wp bit while holding the page lock, so the
unmap_mapping_range() in shmem_fallocate() still needs to zap without
ZAP_FLAG_DROP_FILE_UFFD_WP because that path is still racy with page faults.
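
To illustrate (an editorial sketch, not part of the patch), the lifecycle
of the wr-protect information on a file-backed pte now looks like:

  UFFDIO_WRITEPROTECT          -> present pte carries the uffd-wp bit
  pte unmapped / page reclaimed -> the uffd-wp special marker pte is
                                   installed instead of a none pte
  next write access            -> the fault path recognizes the marker
                                   and still delivers the uffd-wp event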

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 include/linux/mm.h        | 11 +++++++++
 include/linux/mm_inline.h | 42 +++++++++++++++++++++++++++++++++
 mm/memory.c               | 49 +++++++++++++++++++++++++++++++++++++--
 mm/rmap.c                 |  8 +++++++
 4 files changed, 108 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 62a75e4414e3..8de230fc7b84 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1719,6 +1719,8 @@ extern void user_shm_unlock(size_t, struct ucounts *);
 #define  ZAP_FLAG_CHECK_MAPPING             BIT(0)
 /* Whether to skip zapping swap entries */
 #define  ZAP_FLAG_SKIP_SWAP                 BIT(1)
+/* Whether to completely drop uffd-wp entries for file-backed memory */
+#define  ZAP_FLAG_DROP_FILE_UFFD_WP         BIT(2)
 
 /*
  * Parameter block passed down to zap_pte_range in exceptional cases.
@@ -1752,6 +1754,15 @@ zap_skip_swap(struct zap_details *details)
 	return details->zap_flags & ZAP_FLAG_SKIP_SWAP;
 }
 
+static inline bool
+zap_drop_file_uffd_wp(struct zap_details *details)
+{
+	if (!details)
+		return false;
+
+	return details->zap_flags & ZAP_FLAG_DROP_FILE_UFFD_WP;
+}
+
 struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
 			     pte_t pte);
 struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,
diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
index 355ea1ee32bd..95556b4bfe7a 100644
--- a/include/linux/mm_inline.h
+++ b/include/linux/mm_inline.h
@@ -4,6 +4,8 @@
 
 #include <linux/huge_mm.h>
 #include <linux/swap.h>
+#include <linux/userfaultfd_k.h>
+#include <linux/swapops.h>
 
 /**
  * page_is_file_lru - should the page be on a file LRU or anon LRU?
@@ -104,4 +106,44 @@ static __always_inline void del_page_from_lru_list(struct page *page,
 	update_lru_size(lruvec, page_lru(page), page_zonenum(page),
 			-thp_nr_pages(page));
 }
+
+/*
+ * If this pte is wr-protected by uffd-wp in any form, arm the special pte to
+ * replace a none pte.  NOTE!  This should only be called when *pte is already
+ * cleared so we will never accidentally replace something valuable.  Meanwhile
+ * a none pte also means we are not demoting any pte, so no tlb flush is needed.
+ * E.g., whoever cleared the pte should have taken care of the tlb flush.
+ *
+ * Must be called with pgtable lock held.
+ */
+static inline void
+pte_install_uffd_wp_if_needed(struct vm_area_struct *vma, unsigned long addr,
+			      pte_t *pte, pte_t pteval)
+{
+#ifdef CONFIG_USERFAULTFD
+	bool arm_uffd_pte = false;
+
+	/* The current status of the pte should be "cleared" before calling */
+	WARN_ON_ONCE(!pte_none(*pte));
+
+	if (vma_is_anonymous(vma))
+		return;
+
+	/* A uffd-wp wr-protected normal pte */
+	if (unlikely(pte_present(pteval) && pte_uffd_wp(pteval)))
+		arm_uffd_pte = true;
+
+	/*
+	 * A uffd-wp wr-protected swap pte.  Note: this should even work for
+	 * pte_swp_uffd_wp_special() too.
+	 */
+	if (unlikely(is_swap_pte(pteval) && pte_swp_uffd_wp(pteval)))
+		arm_uffd_pte = true;
+
+	if (unlikely(arm_uffd_pte))
+		set_pte_at(vma->vm_mm, addr, pte,
+			   pte_swp_mkuffd_wp_special(vma));
+#endif
+}
+
 #endif
diff --git a/mm/memory.c b/mm/memory.c
index d6b1adbf29e4..223781f115e9 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -73,6 +73,7 @@
 #include <linux/perf_event.h>
 #include <linux/ptrace.h>
 #include <linux/vmalloc.h>
+#include <linux/mm_inline.h>
 
 #include <trace/events/kmem.h>
 
@@ -1301,6 +1302,21 @@ copy_page_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma)
 	return ret;
 }
 
+/*
+ * This function makes sure that we'll replace the none pte with an uffd-wp
+ * swap special pte marker when necessary. Must be called with pgtable lock held.
+ */
+static inline void
+zap_install_uffd_wp_if_needed(struct vm_area_struct *vma,
+			      unsigned long addr, pte_t *pte,
+			      struct zap_details *details, pte_t pteval)
+{
+	if (zap_drop_file_uffd_wp(details))
+		return;
+
+	pte_install_uffd_wp_if_needed(vma, addr, pte, pteval);
+}
+
 static unsigned long zap_pte_range(struct mmu_gather *tlb,
 				struct vm_area_struct *vma, pmd_t *pmd,
 				unsigned long addr, unsigned long end,
@@ -1338,6 +1354,8 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 			ptent = ptep_get_and_clear_full(mm, addr, pte,
 							tlb->fullmm);
 			tlb_remove_tlb_entry(tlb, pte, addr);
+			zap_install_uffd_wp_if_needed(vma, addr, pte, details,
+						      ptent);
 			if (unlikely(!page))
 				continue;
 
@@ -1362,6 +1380,22 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 			continue;
 		}
 
+		/*
+		 * If this is a special uffd-wp marker pte... Drop it only if
+		 * enforced to do so.
+		 */
+		if (unlikely(is_swap_special_pte(ptent))) {
+			WARN_ON_ONCE(!pte_swp_uffd_wp_special(ptent));
+			/*
+			 * If this is a common unmap of ptes, keep this as is.
+			 * Drop it only if this is a whole-vma destruction.
+			 */
+			if (zap_drop_file_uffd_wp(details))
+				ptep_get_and_clear_full(mm, addr, pte,
+							tlb->fullmm);
+			continue;
+		}
+
 		entry = pte_to_swp_entry(ptent);
 		if (is_device_private_entry(entry) ||
 		    is_device_exclusive_entry(entry)) {
@@ -1369,6 +1403,13 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 
 			if (unlikely(zap_skip_check_mapping(details, page)))
 				continue;
+			/*
+			 * Both device private/exclusive mappings should only
+			 * work with anonymous pages so far, so we don't need to
+			 * consider the uffd-wp bit when zapping. For details,
+			 * see zap_install_uffd_wp_if_needed().
+			 */
+			WARN_ON_ONCE(!vma_is_anonymous(vma));
 			pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
 			rss[mm_counter(page)]--;
 
@@ -1393,6 +1434,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 		if (unlikely(!free_swap_and_cache(entry)))
 			print_bad_pte(vma, addr, ptent, NULL);
 		pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
+		zap_install_uffd_wp_if_needed(vma, addr, pte, details, ptent);
 	} while (pte++, addr += PAGE_SIZE, addr != end);
 
 	add_mm_rss_vec(mm, rss);
@@ -1603,12 +1645,15 @@ void unmap_vmas(struct mmu_gather *tlb,
 		unsigned long end_addr)
 {
 	struct mmu_notifier_range range;
+	struct zap_details details = {
+		.zap_flags = ZAP_FLAG_DROP_FILE_UFFD_WP,
+	};
 
 	mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, vma->vm_mm,
 				start_addr, end_addr);
 	mmu_notifier_invalidate_range_start(&range);
 	for ( ; vma && vma->vm_start < end_addr; vma = vma->vm_next)
-		unmap_single_vma(tlb, vma, start_addr, end_addr, NULL);
+		unmap_single_vma(tlb, vma, start_addr, end_addr, &details);
 	mmu_notifier_invalidate_range_end(&range);
 }
 
@@ -3353,7 +3398,7 @@ void unmap_mapping_page(struct page *page)
 
 	details.zap_mapping = mapping;
 	details.single_page = page;
-	details.zap_flags = ZAP_FLAG_CHECK_MAPPING;
+	details.zap_flags = ZAP_FLAG_CHECK_MAPPING | ZAP_FLAG_DROP_FILE_UFFD_WP;
 
 	i_mmap_lock_write(mapping);
 	if (unlikely(!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root)))
diff --git a/mm/rmap.c b/mm/rmap.c
index 795f9d5f8386..92ba81567089 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -72,6 +72,7 @@
 #include <linux/page_idle.h>
 #include <linux/memremap.h>
 #include <linux/userfaultfd_k.h>
+#include <linux/mm_inline.h>
 
 #include <asm/tlbflush.h>
 
@@ -1515,6 +1516,13 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 			pteval = ptep_clear_flush(vma, address, pvmw.pte);
 		}
 
+		/*
+		 * Now the pte is cleared.  If this is uffd-wp armed pte, we
+		 * may want to replace a none pte with a marker pte if it's
+		 * file-backed, so we don't lose the tracking information.
+		 */
+		pte_install_uffd_wp_if_needed(vma, address, pvmw.pte, pteval);
+
 		/* Move the dirty bit to the page. Now the pte is gone. */
 		if (pte_dirty(pteval))
 			set_page_dirty(page);
-- 
2.31.1


* [PATCH v4 11/26] shmem/userfaultfd: Allow wr-protect none pte for file-backed mem
  2021-07-14 22:20 [PATCH v4 00/26] userfaultfd-wp: Support shmem and hugetlbfs Peter Xu
                   ` (9 preceding siblings ...)
  2021-07-14 22:21 ` [PATCH v4 10/26] shmem/userfaultfd: Persist uffd-wp bit across zapping for file-backed Peter Xu
@ 2021-07-14 22:24 ` Peter Xu
  2021-07-14 22:24 ` [PATCH v4 12/26] shmem/userfaultfd: Allows file-back mem to be uffd wr-protected on thps Peter Xu
                   ` (14 subsequent siblings)
  25 siblings, 0 replies; 34+ messages in thread
From: Peter Xu @ 2021-07-14 22:24 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Andrew Morton, Mike Kravetz, Axel Rasmussen, Miaohe Lin,
	Kirill A . Shutemov, Hugh Dickins, Jason Gunthorpe,
	Alistair Popple, Matthew Wilcox, peterx, Jerome Glisse,
	Andrea Arcangeli, Mike Rapoport, Nadav Amit, David Hildenbrand

File-backed memory differs from anonymous memory in that even if the pte is
missing, the data could still reside either in the file or in the page/swap
cache.  So when wr-protecting a pte, we need to consider none ptes too.

We do that by installing the uffd-wp special swap pte as a marker.  When
there's a future write to the pte, the fault handler will take the special
path to first fault in the page read-only, then report to the userfaultfd
server with the wr-protect message.

On the other hand, when unprotecting a page, it's also possible that the pte
got unmapped but replaced by the special uffd-wp marker.  Then we'll need to
be able to recover the uffd-wp special swap pte into a none pte, so that the
next access to the page will fault in correctly as usual, rather than
sending a uffd-wp message.

Special care needs to be taken throughout the change_protection_range()
process.  Since we now allow the user to wr-protect a none pte, we need to
be able to pre-populate the page table entries when we see !anonymous &&
MM_CP_UFFD_WP requests; otherwise change_protection_range() will always skip
ranges whose pgtable entries do not exist.

Note that this patch only covers the small pages (pte level), not the
transparent huge pages yet; but it will be the base for thps too.
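
As an editorial illustration (not part of the patch; error handling is
omitted, and it assumes the full series is applied so that shmem accepts
UFFDIO_REGISTER_MODE_WP), the userspace sequence this enables is roughly:

	#include <fcntl.h>
	#include <sys/ioctl.h>
	#include <sys/mman.h>
	#include <sys/syscall.h>
	#include <unistd.h>
	#include <linux/userfaultfd.h>

	void wp_none_pte_sketch(void)
	{
		size_t len = 4096;
		int uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
		struct uffdio_api api = { .api = UFFD_API };
		/* MAP_SHARED|MAP_ANONYMOUS memory is shmem-backed */
		char *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
				 MAP_SHARED | MAP_ANONYMOUS, -1, 0);
		struct uffdio_register reg = {
			.range = { .start = (unsigned long)mem, .len = len },
			.mode = UFFDIO_REGISTER_MODE_WP,
		};
		struct uffdio_writeprotect wp = {
			.range = { .start = (unsigned long)mem, .len = len },
			.mode = UFFDIO_WRITEPROTECT_MODE_WP,
		};

		ioctl(uffd, UFFDIO_API, &api);
		ioctl(uffd, UFFDIO_REGISTER, &reg);
		/*
		 * Nothing has touched "mem" yet, so all its ptes are none.
		 * With this patch the ioctl below installs the uffd-wp
		 * special pte markers instead of skipping the empty ptes.
		 */
		ioctl(uffd, UFFDIO_WRITEPROTECT, &wp);
	}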

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/mprotect.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/mm/mprotect.c b/mm/mprotect.c
index 4b743394afbe..8ec85b276975 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -29,6 +29,7 @@
 #include <linux/uaccess.h>
 #include <linux/mm_inline.h>
 #include <linux/pgtable.h>
+#include <linux/userfaultfd_k.h>
 #include <asm/cacheflush.h>
 #include <asm/mmu_context.h>
 #include <asm/tlbflush.h>
@@ -186,6 +187,32 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 				set_pte_at(vma->vm_mm, addr, pte, newpte);
 				pages++;
 			}
+		} else if (unlikely(is_swap_special_pte(oldpte))) {
+			if (uffd_wp_resolve && !vma_is_anonymous(vma) &&
+			    pte_swp_uffd_wp_special(oldpte)) {
+				/*
+				 * This is a uffd-wp special pte and we'd like to
+				 * unprotect it.  What we need to do is simply
+				 * recover the pte into a none pte; the next
+				 * page fault will fault in the page.
+				 */
+				pte_clear(vma->vm_mm, addr, pte);
+				pages++;
+			}
+		} else {
+			/* It must be a none pte, or what else?.. */
+			WARN_ON_ONCE(!pte_none(oldpte));
+			if (unlikely(uffd_wp && !vma_is_anonymous(vma))) {
+				/*
+				 * For file-backed mem, we need to be able to
+				 * wr-protect even a none pte!  Because
+				 * even if the pte is null, the page/swap cache
+				 * could exist.
+				 */
+				set_pte_at(vma->vm_mm, addr, pte,
+					   pte_swp_mkuffd_wp_special(vma));
+				pages++;
+			}
 		}
 	} while (pte++, addr += PAGE_SIZE, addr != end);
 	arch_leave_lazy_mmu_mode();
@@ -219,6 +246,25 @@ static inline int pmd_none_or_clear_bad_unless_trans_huge(pmd_t *pmd)
 	return 0;
 }
 
+/*
+ * File-backed vma allows uffd wr-protect upon none ptes, because even if pte
+ * is missing, page/swap cache could exist.  When that happens, the wr-protect
+ * information will be stored in the page table entries with the marker (e.g.,
+ * PTE_SWP_UFFD_WP_SPECIAL).  Prepare for that by always populating the page
+ * tables to pte level, so that we'll install the markers in change_pte_range()
+ * where necessary.
+ *
+ * Note that we only need to do this at pmd level, because if the pmd does not
+ * exist, the whole range covered by that pmd entry (of a pud) contains no
+ * valid data, only zeros; then there is nothing to wr-protect.
+ */
+#define  change_protection_prepare(vma, pmd, addr, cp_flags)		\
+	do {								\
+		if (unlikely((cp_flags & MM_CP_UFFD_WP) && pmd_none(*pmd) && \
+			     !vma_is_anonymous(vma)))			\
+			WARN_ON_ONCE(pte_alloc(vma->vm_mm, pmd));	\
+	} while (0)
+
 static inline unsigned long change_pmd_range(struct vm_area_struct *vma,
 		pud_t *pud, unsigned long addr, unsigned long end,
 		pgprot_t newprot, unsigned long cp_flags)
@@ -237,6 +283,8 @@ static inline unsigned long change_pmd_range(struct vm_area_struct *vma,
 
 		next = pmd_addr_end(addr, end);
 
+		change_protection_prepare(vma, pmd, addr, cp_flags);
+
 		/*
 		 * Automatic NUMA balancing walks the tables with mmap_lock
 		 * held for read. It's possible a parallel update to occur
-- 
2.31.1


* [PATCH v4 12/26] shmem/userfaultfd: Allows file-back mem to be uffd wr-protected on thps
  2021-07-14 22:20 [PATCH v4 00/26] userfaultfd-wp: Support shmem and hugetlbfs Peter Xu
                   ` (10 preceding siblings ...)
  2021-07-14 22:24 ` [PATCH v4 11/26] shmem/userfaultfd: Allow wr-protect none pte for file-backed mem Peter Xu
@ 2021-07-14 22:24 ` Peter Xu
  2021-07-14 22:24 ` [PATCH v4 13/26] shmem/userfaultfd: Handle the left-overed special swap ptes Peter Xu
                   ` (13 subsequent siblings)
  25 siblings, 0 replies; 34+ messages in thread
From: Peter Xu @ 2021-07-14 22:24 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Andrew Morton, Mike Kravetz, Axel Rasmussen, Miaohe Lin,
	Kirill A . Shutemov, Hugh Dickins, Jason Gunthorpe,
	Alistair Popple, Matthew Wilcox, peterx, Jerome Glisse,
	Andrea Arcangeli, Mike Rapoport, Nadav Amit, David Hildenbrand

We don't have a "huge" version of PTE_SWP_UFFD_WP_SPECIAL; instead, when
necessary, we split the thp if the huge page was previously uffd
wr-protected.

However splitting the thp is not enough, because file-backed thps are
handled totally differently from anonymous thps - rather than doing a real
split, the thp pmd will simply get dropped in __split_huge_pmd_locked().

That is definitely not enough if e.g. a thp covers the range [0, 2M) but we
want to wr-protect a small page residing in [4K, 8K), because after
__split_huge_pmd() returns there will be a none pmd.

Here we leverage the previously introduced change_protection_prepare() macro
so that we'll populate the pmd with a pgtable page.  Then change_pte_range()
will do all the rest for us, e.g., install the uffd-wp swap special pte
marker at any pte that we'd like to wr-protect, under the protection of the
pgtable lock.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/mprotect.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/mm/mprotect.c b/mm/mprotect.c
index 8ec85b276975..3fcb87b59696 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -306,8 +306,16 @@ static inline unsigned long change_pmd_range(struct vm_area_struct *vma,
 		}
 
 		if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) || pmd_devmap(*pmd)) {
-			if (next - addr != HPAGE_PMD_SIZE) {
+			if (next - addr != HPAGE_PMD_SIZE ||
+			    /* Uffd wr-protecting a file-backed memory range */
+			    unlikely(!vma_is_anonymous(vma) &&
+				     (cp_flags & MM_CP_UFFD_WP))) {
 				__split_huge_pmd(vma, pmd, addr, false, NULL);
+				/*
+				 * For file-backed, the pmd could have been
+				 * gone; still provide a pte pgtable if needed.
+				 */
+				change_protection_prepare(vma, pmd, addr, cp_flags);
 			} else {
 				int nr_ptes = change_huge_pmd(vma, pmd, addr,
 							      newprot, cp_flags);
-- 
2.31.1


* [PATCH v4 13/26] shmem/userfaultfd: Handle the left-overed special swap ptes
  2021-07-14 22:20 [PATCH v4 00/26] userfaultfd-wp: Support shmem and hugetlbfs Peter Xu
                   ` (11 preceding siblings ...)
  2021-07-14 22:24 ` [PATCH v4 12/26] shmem/userfaultfd: Allows file-back mem to be uffd wr-protected on thps Peter Xu
@ 2021-07-14 22:24 ` Peter Xu
  2021-07-14 22:24 ` [PATCH v4 14/26] shmem/userfaultfd: Pass over uffd-wp special swap pte when fork() Peter Xu
                   ` (12 subsequent siblings)
  25 siblings, 0 replies; 34+ messages in thread
From: Peter Xu @ 2021-07-14 22:24 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Andrew Morton, Mike Kravetz, Axel Rasmussen, Miaohe Lin,
	Kirill A . Shutemov, Hugh Dickins, Jason Gunthorpe,
	Alistair Popple, Matthew Wilcox, peterx, Jerome Glisse,
	Andrea Arcangeli, Mike Rapoport, Nadav Amit, David Hildenbrand

Note that the special uffd-wp swap pte can be left over even if the page
under the pte got evicted.  Normally when evicting a page, we will unmap the
ptes by walking through the reverse mapping.  However we never tracked such
information for the special swap ptes, because they're not real mappings but
just markers.  So we need to handle the case where we see a marker whose
backing page has already been evicted.

We have already taken care of that in e.g. alloc_set_pte(), where we'll
treat the special swap pte as pte_none() when necessary.  However we also
need to teach userfaultfd itself, for both UFFDIO_COPY and page fault
handling, so that everything will still work as expected.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 fs/userfaultfd.c | 15 +++++++++++++++
 mm/userfaultfd.c | 13 ++++++++++++-
 2 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index f6e0f0c0d0e5..e1c1cbc7bcc8 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -329,6 +329,21 @@ static inline bool userfaultfd_must_wait(struct userfaultfd_ctx *ctx,
 	 */
 	if (pte_none(*pte))
 		ret = true;
+	/*
+	 * We also treat the swap special uffd-wp pte as the pte_none() here.
+	 * This should in most cases be a missing event, as we never handle
+	 * wr-protect upon a special uffd-wp swap pte - it should first be
+	 * converted into a normal read request before handling wp.  It just
+	 * means the page/swap cache backing this pte is gone, so this
+	 * special pte is leftover.
+	 *
+	 * We can't simply replace it with a none pte because we're not with
+	 * the pgtable lock here.  Instead of taking it and clearing the pte,
+	 * the easy way is to let UFFDIO_COPY understand this pte too when
+	 * trying to install a new page onto it.
+	 */
+	if (pte_swp_uffd_wp_special(*pte))
+		ret = true;
 	if (!pte_write(*pte) && (reason & VM_UFFD_WP))
 		ret = true;
 	pte_unmap(pte);
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 2a9c9e6eb876..0c7212dfb95d 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -100,7 +100,18 @@ int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
 	}
 
 	ret = -EEXIST;
-	if (!pte_none(*dst_pte))
+	/*
+	 * Besides the none pte, we also allow UFFDIO_COPY to install a pte
+	 * onto the uffd-wp swap special pte, because that pte should be the
+	 * same as a pte_none() except that it carries wr-protect information
+	 * (which can only be dropped when unmapping the memory).
+	 *
+	 * It's safe to drop that marker because we know this is part of a
+	 * MISSING fault, and the caller is very clear about this page missing
+	 * rather than wr-protected.  Then we're sure the wr-protect bit is
+	 * just a leftover so it's useless already and is the same as none pte.
+	 */
+	if (!pte_none(*dst_pte) && !pte_swp_uffd_wp_special(*dst_pte))
 		goto out_unlock;
 
 	if (page_in_cache)
-- 
2.31.1


* [PATCH v4 14/26] shmem/userfaultfd: Pass over uffd-wp special swap pte when fork()
  2021-07-14 22:20 [PATCH v4 00/26] userfaultfd-wp: Support shmem and hugetlbfs Peter Xu
                   ` (12 preceding siblings ...)
  2021-07-14 22:24 ` [PATCH v4 13/26] shmem/userfaultfd: Handle the left-overed special swap ptes Peter Xu
@ 2021-07-14 22:24 ` Peter Xu
  2021-07-14 22:24 ` [PATCH v4 15/26] mm/hugetlb: Drop __unmap_hugepage_range definition from hugetlb.h Peter Xu
                   ` (11 subsequent siblings)
  25 siblings, 0 replies; 34+ messages in thread
From: Peter Xu @ 2021-07-14 22:24 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Andrew Morton, Mike Kravetz, Axel Rasmussen, Miaohe Lin,
	Kirill A . Shutemov, Hugh Dickins, Jason Gunthorpe,
	Alistair Popple, Matthew Wilcox, peterx, Jerome Glisse,
	Andrea Arcangeli, Mike Rapoport, Nadav Amit, David Hildenbrand

It should be handled similarly to other uffd-wp wr-protected ptes: pass it
over when the dst_vma has VM_UFFD_WP armed; otherwise drop it.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/memory.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/mm/memory.c b/mm/memory.c
index 223781f115e9..af91bee934c7 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -776,8 +776,21 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	unsigned long vm_flags = dst_vma->vm_flags;
 	pte_t pte = *src_pte;
 	struct page *page;
-	swp_entry_t entry = pte_to_swp_entry(pte);
+	swp_entry_t entry;
+
+	if (unlikely(is_swap_special_pte(pte))) {
+		/*
+		 * uffd-wp special swap pte is the only possibility for now.
+		 * If dst vma is registered with uffd-wp, copy it over.
+		 * Otherwise, ignore this pte as if it were a none pte.
+		 */
+		WARN_ON_ONCE(!pte_swp_uffd_wp_special(pte));
+		if (userfaultfd_wp(dst_vma))
+			set_pte_at(dst_mm, addr, dst_pte, pte);
+		return 0;
+	}
 
+	entry = pte_to_swp_entry(pte);
 	if (likely(!non_swap_entry(entry))) {
 		if (swap_duplicate(entry) < 0)
 			return -EIO;
-- 
2.31.1


* [PATCH v4 15/26] mm/hugetlb: Drop __unmap_hugepage_range definition from hugetlb.h
  2021-07-14 22:20 [PATCH v4 00/26] userfaultfd-wp: Support shmem and hugetlbfs Peter Xu
                   ` (13 preceding siblings ...)
  2021-07-14 22:24 ` [PATCH v4 14/26] shmem/userfaultfd: Pass over uffd-wp special swap pte when fork() Peter Xu
@ 2021-07-14 22:24 ` Peter Xu
  2021-07-15  5:49   ` kernel test robot
                     ` (3 more replies)
  2021-07-14 22:24 ` [PATCH v4 16/26] mm/hugetlb: Introduce huge pte version of uffd-wp helpers Peter Xu
                   ` (10 subsequent siblings)
  25 siblings, 4 replies; 34+ messages in thread
From: Peter Xu @ 2021-07-14 22:24 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Andrew Morton, Mike Kravetz, Axel Rasmussen, Miaohe Lin,
	Kirill A . Shutemov, Hugh Dickins, Jason Gunthorpe,
	Alistair Popple, Matthew Wilcox, peterx, Jerome Glisse,
	Andrea Arcangeli, Mike Rapoport, Nadav Amit, David Hildenbrand

Drop it in the header since it's only used in hugetlb.c.

Suggested-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 include/linux/hugetlb.h | 10 ----------
 1 file changed, 10 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index f7ca1a3870ea..c30f39815e13 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -143,9 +143,6 @@ void __unmap_hugepage_range_final(struct mmu_gather *tlb,
 			  struct vm_area_struct *vma,
 			  unsigned long start, unsigned long end,
 			  struct page *ref_page);
-void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
-				unsigned long start, unsigned long end,
-				struct page *ref_page);
 void hugetlb_report_meminfo(struct seq_file *);
 int hugetlb_report_node_meminfo(char *buf, int len, int nid);
 void hugetlb_show_meminfo(void);
@@ -385,13 +382,6 @@ static inline void __unmap_hugepage_range_final(struct mmu_gather *tlb,
 	BUG();
 }
 
-static inline void __unmap_hugepage_range(struct mmu_gather *tlb,
-			struct vm_area_struct *vma, unsigned long start,
-			unsigned long end, struct page *ref_page)
-{
-	BUG();
-}
-
 static inline vm_fault_t hugetlb_fault(struct mm_struct *mm,
 			struct vm_area_struct *vma, unsigned long address,
 			unsigned int flags)
-- 
2.31.1


* [PATCH v4 16/26] mm/hugetlb: Introduce huge pte version of uffd-wp helpers
  2021-07-14 22:20 [PATCH v4 00/26] userfaultfd-wp: Support shmem and hugetlbfs Peter Xu
                   ` (14 preceding siblings ...)
  2021-07-14 22:24 ` [PATCH v4 15/26] mm/hugetlb: Drop __unmap_hugepage_range definition from hugetlb.h Peter Xu
@ 2021-07-14 22:24 ` Peter Xu
  2021-07-14 22:24 ` [PATCH v4 17/26] hugetlb/userfaultfd: Hook page faults for uffd write protection Peter Xu
                   ` (9 subsequent siblings)
  25 siblings, 0 replies; 34+ messages in thread
From: Peter Xu @ 2021-07-14 22:24 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Andrew Morton, Mike Kravetz, Axel Rasmussen, Miaohe Lin,
	Kirill A . Shutemov, Hugh Dickins, Jason Gunthorpe,
	Alistair Popple, Matthew Wilcox, peterx, Jerome Glisse,
	Andrea Arcangeli, Mike Rapoport, Nadav Amit, David Hildenbrand

They will be used in the follow-up patches to check/set/clear the uffd-wp
bit of a huge pte.

So far they reuse all the small pte helpers.  Archs can override these
versions when necessary (with __HAVE_ARCH_HUGE_PTE_UFFD_WP* macros) in the
future.
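
For illustration (an editorial sketch; the __HAVE_ARCH_HUGE_PTE_MKUFFD_WP
guard below is hypothetical - the patch as posted adds the helpers without
such guards), an arch override would follow the usual asm-generic pattern:

	/* In the arch's asm/hugetlb.h, before including asm-generic: */
	#define __HAVE_ARCH_HUGE_PTE_MKUFFD_WP
	static inline pte_t huge_pte_mkuffd_wp(pte_t pte)
	{
		/* arch-specific handling would go here */
		return pte_mkuffd_wp(pte);
	}

	/* The generic fallback would then be wrapped as: */
	#ifndef __HAVE_ARCH_HUGE_PTE_MKUFFD_WP
	static inline pte_t huge_pte_mkuffd_wp(pte_t pte)
	{
		return pte_mkuffd_wp(pte);
	}
	#endif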

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 include/asm-generic/hugetlb.h | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/include/asm-generic/hugetlb.h b/include/asm-generic/hugetlb.h
index 8e1e6244a89d..c45b9deb41ff 100644
--- a/include/asm-generic/hugetlb.h
+++ b/include/asm-generic/hugetlb.h
@@ -32,6 +32,21 @@ static inline pte_t huge_pte_modify(pte_t pte, pgprot_t newprot)
 	return pte_modify(pte, newprot);
 }
 
+static inline pte_t huge_pte_mkuffd_wp(pte_t pte)
+{
+	return pte_mkuffd_wp(pte);
+}
+
+static inline pte_t huge_pte_clear_uffd_wp(pte_t pte)
+{
+	return pte_clear_uffd_wp(pte);
+}
+
+static inline int huge_pte_uffd_wp(pte_t pte)
+{
+	return pte_uffd_wp(pte);
+}
+
 #ifndef __HAVE_ARCH_HUGE_PTE_CLEAR
 static inline void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
 		    pte_t *ptep, unsigned long sz)
-- 
2.31.1


* [PATCH v4 17/26] hugetlb/userfaultfd: Hook page faults for uffd write protection
  2021-07-14 22:20 [PATCH v4 00/26] userfaultfd-wp: Support shmem and hugetlbfs Peter Xu
                   ` (15 preceding siblings ...)
  2021-07-14 22:24 ` [PATCH v4 16/26] mm/hugetlb: Introduce huge pte version of uffd-wp helpers Peter Xu
@ 2021-07-14 22:24 ` Peter Xu
  2021-07-14 22:25 ` [PATCH v4 18/26] hugetlb/userfaultfd: Take care of UFFDIO_COPY_MODE_WP Peter Xu
                   ` (8 subsequent siblings)
  25 siblings, 0 replies; 34+ messages in thread
From: Peter Xu @ 2021-07-14 22:24 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Andrew Morton, Mike Kravetz, Axel Rasmussen, Miaohe Lin,
	Kirill A . Shutemov, Hugh Dickins, Jason Gunthorpe,
	Alistair Popple, Matthew Wilcox, peterx, Jerome Glisse,
	Andrea Arcangeli, Mike Rapoport, Nadav Amit, David Hildenbrand

Hook up hugetlb_fault() with the capability to handle userfaultfd-wp faults.

We do this slightly earlier than hugetlb_cow() so that we can avoid taking
some extra locks that we don't need.

Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/hugetlb.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 924553aa8f78..8559b8bb7fa5 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5059,6 +5059,25 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	if (unlikely(!pte_same(entry, huge_ptep_get(ptep))))
 		goto out_ptl;
 
+	/* Handle userfault-wp first, before trying to lock more pages */
+	if (userfaultfd_wp(vma) && huge_pte_uffd_wp(huge_ptep_get(ptep)) &&
+	    (flags & FAULT_FLAG_WRITE) && !huge_pte_write(entry)) {
+		struct vm_fault vmf = {
+			.vma = vma,
+			.address = haddr,
+			.flags = flags,
+		};
+
+		spin_unlock(ptl);
+		if (pagecache_page) {
+			unlock_page(pagecache_page);
+			put_page(pagecache_page);
+		}
+		mutex_unlock(&hugetlb_fault_mutex_table[hash]);
+		i_mmap_unlock_read(mapping);
+		return handle_userfault(&vmf, VM_UFFD_WP);
+	}
+
 	/*
 	 * hugetlb_cow() requires page locks of pte_page(entry) and
 	 * pagecache_page, so here we need take the former one
-- 
2.31.1


* [PATCH v4 18/26] hugetlb/userfaultfd: Take care of UFFDIO_COPY_MODE_WP
  2021-07-14 22:20 [PATCH v4 00/26] userfaultfd-wp: Support shmem and hugetlbfs Peter Xu
                   ` (16 preceding siblings ...)
  2021-07-14 22:24 ` [PATCH v4 17/26] hugetlb/userfaultfd: Hook page faults for uffd write protection Peter Xu
@ 2021-07-14 22:25 ` Peter Xu
  2021-07-14 22:25 ` [PATCH v4 19/26] hugetlb/userfaultfd: Handle UFFDIO_WRITEPROTECT Peter Xu
                   ` (7 subsequent siblings)
  25 siblings, 0 replies; 34+ messages in thread
From: Peter Xu @ 2021-07-14 22:25 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Andrew Morton, Mike Kravetz, Axel Rasmussen, Miaohe Lin,
	Kirill A . Shutemov, Hugh Dickins, Jason Gunthorpe,
	Alistair Popple, Matthew Wilcox, peterx, Jerome Glisse,
	Andrea Arcangeli, Mike Rapoport, Nadav Amit, David Hildenbrand

Firstly, pass the wp_copy variable into hugetlb_mcopy_atomic_pte()
throughout the stack.  Then apply the UFFD_WP bit if UFFDIO_COPY_MODE_WP is
set with UFFDIO_COPY.  Introduce huge_pte_mkuffd_wp() for it.

Hugetlb pages are only managed by hugetlbfs, so we're safe even without
setting the dirty bit in the huge pte if the page is installed as read-only.
However we'd better still keep the dirty bit set for a read-only UFFDIO_COPY
pte (when the UFFDIO_COPY_MODE_WP bit is set), not only to match what we do
with shmem, but also because the page does contain dirty data that the
kernel just copied from userspace.
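
For reference, the userspace side would look roughly like the following (an
editorial sketch; uffd, dst, src and huge_page_size are assumed to be set up
already, with dst inside a hugetlbfs range registered with userfaultfd and
both dst and the length aligned to the huge page size):

	struct uffdio_copy copy = {
		.dst = dst,			/* huge-page aligned */
		.src = src,
		.len = huge_page_size,
		.mode = UFFDIO_COPY_MODE_WP,	/* install write-protected */
	};

	ioctl(uffd, UFFDIO_COPY, &copy);
	/* copy.copy reports the bytes copied, or a negative error */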

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 include/linux/hugetlb.h |  6 ++++--
 mm/hugetlb.c            | 22 +++++++++++++++++-----
 mm/userfaultfd.c        | 12 ++++++++----
 3 files changed, 29 insertions(+), 11 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index c30f39815e13..fcdbf9f46d85 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -155,7 +155,8 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, pte_t *dst_pte,
 				unsigned long dst_addr,
 				unsigned long src_addr,
 				enum mcopy_atomic_mode mode,
-				struct page **pagep);
+				struct page **pagep,
+				bool wp_copy);
 #endif /* CONFIG_USERFAULTFD */
 bool hugetlb_reserve_pages(struct inode *inode, long from, long to,
 						struct vm_area_struct *vma,
@@ -336,7 +337,8 @@ static inline int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 						unsigned long dst_addr,
 						unsigned long src_addr,
 						enum mcopy_atomic_mode mode,
-						struct page **pagep)
+						struct page **pagep,
+						bool wp_copy)
 {
 	BUG();
 	return 0;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 8559b8bb7fa5..f4efcb8c6214 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5141,7 +5141,8 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 			    unsigned long dst_addr,
 			    unsigned long src_addr,
 			    enum mcopy_atomic_mode mode,
-			    struct page **pagep)
+			    struct page **pagep,
+			    bool wp_copy)
 {
 	bool is_continue = (mode == MCOPY_ATOMIC_CONTINUE);
 	struct hstate *h = hstate_vma(dst_vma);
@@ -5277,17 +5278,28 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 		hugepage_add_new_anon_rmap(page, dst_vma, dst_addr);
 	}
 
-	/* For CONTINUE on a non-shared VMA, don't set VM_WRITE for CoW. */
-	if (is_continue && !vm_shared)
+	/*
+	 * For either: (1) CONTINUE on a non-shared VMA, or (2) UFFDIO_COPY
+	 * with wp flag set, don't set pte write bit.
+	 */
+	if (wp_copy || (is_continue && !vm_shared))
 		writable = 0;
 	else
 		writable = dst_vma->vm_flags & VM_WRITE;
 
 	_dst_pte = make_huge_pte(dst_vma, page, writable);
-	if (writable)
-		_dst_pte = huge_pte_mkdirty(_dst_pte);
+	/*
+	 * Always mark UFFDIO_COPY page dirty; note that this may not be
+	 * extremely important for hugetlbfs for now since swapping is not
+	 * supported, but we should still make it clear that this page cannot
+	 * be thrown away at will, even if the write bit is not set.
+	 */
+	_dst_pte = huge_pte_mkdirty(_dst_pte);
 	_dst_pte = pte_mkyoung(_dst_pte);
 
+	if (wp_copy)
+		_dst_pte = huge_pte_mkuffd_wp(_dst_pte);
+
 	set_huge_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte);
 
 	(void)huge_ptep_set_access_flags(dst_vma, dst_addr, dst_pte, _dst_pte,
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 0c7212dfb95d..501d6b9f7a5a 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -297,7 +297,8 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 					      unsigned long dst_start,
 					      unsigned long src_start,
 					      unsigned long len,
-					      enum mcopy_atomic_mode mode)
+					      enum mcopy_atomic_mode mode,
+					      bool wp_copy)
 {
 	int vm_shared = dst_vma->vm_flags & VM_SHARED;
 	ssize_t err;
@@ -393,7 +394,8 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 		}
 
 		err = hugetlb_mcopy_atomic_pte(dst_mm, dst_pte, dst_vma,
-					       dst_addr, src_addr, mode, &page);
+					       dst_addr, src_addr, mode, &page,
+					       wp_copy);
 
 		mutex_unlock(&hugetlb_fault_mutex_table[hash]);
 		i_mmap_unlock_read(mapping);
@@ -448,7 +450,8 @@ extern ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 				      unsigned long dst_start,
 				      unsigned long src_start,
 				      unsigned long len,
-				      enum mcopy_atomic_mode mode);
+				      enum mcopy_atomic_mode mode,
+				      bool wp_copy);
 #endif /* CONFIG_HUGETLB_PAGE */
 
 static __always_inline ssize_t mfill_atomic_pte(struct mm_struct *dst_mm,
@@ -568,7 +571,8 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm,
 	 */
 	if (is_vm_hugetlb_page(dst_vma))
 		return  __mcopy_atomic_hugetlb(dst_mm, dst_vma, dst_start,
-						src_start, len, mcopy_mode);
+					       src_start, len, mcopy_mode,
+					       wp_copy);
 
 	if (!vma_is_anonymous(dst_vma) && !vma_is_shmem(dst_vma))
 		goto out_unlock;
-- 
2.31.1


* [PATCH v4 19/26] hugetlb/userfaultfd: Handle UFFDIO_WRITEPROTECT
  2021-07-14 22:20 [PATCH v4 00/26] userfaultfd-wp: Support shmem and hugetlbfs Peter Xu
                   ` (17 preceding siblings ...)
  2021-07-14 22:25 ` [PATCH v4 18/26] hugetlb/userfaultfd: Take care of UFFDIO_COPY_MODE_WP Peter Xu
@ 2021-07-14 22:25 ` Peter Xu
  2021-07-14 22:25 ` [PATCH v4 20/26] mm/hugetlb: Introduce huge version of special swap pte helpers Peter Xu
                   ` (6 subsequent siblings)
  25 siblings, 0 replies; 34+ messages in thread
From: Peter Xu @ 2021-07-14 22:25 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Andrew Morton, Mike Kravetz, Axel Rasmussen, Miaohe Lin,
	Kirill A . Shutemov, Hugh Dickins, Jason Gunthorpe,
	Alistair Popple, Matthew Wilcox, peterx, Jerome Glisse,
	Andrea Arcangeli, Mike Rapoport, Nadav Amit, David Hildenbrand

This starts by passing cp_flags into hugetlb_change_protection() so that
hugetlb will be able to handle MM_CP_UFFD_WP[_RESOLVE] requests.

huge_pte_clear_uffd_wp() is introduced to handle the case where
UFFDIO_WRITEPROTECT is requested upon migrating huge page entries.
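
For illustration (an editorial sketch; uffd, addr and len are assumed to be
set up already), with this patch a registered hugetlbfs range can be
wr-protected as below, as long as both the start address and the length are
aligned to the huge page size - otherwise the ioctl fails with -EINVAL per
the check added in mwriteprotect_range():

	struct uffdio_writeprotect wp = {
		.range = { .start = addr, .len = len },
		.mode  = UFFDIO_WRITEPROTECT_MODE_WP,	/* 0 to un-protect */
	};

	ioctl(uffd, UFFDIO_WRITEPROTECT, &wp);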

Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 include/linux/hugetlb.h |  6 ++++--
 mm/hugetlb.c            | 13 ++++++++++++-
 mm/mprotect.c           |  3 ++-
 mm/userfaultfd.c        |  8 ++++++++
 4 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index fcdbf9f46d85..e19ca363803d 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -205,7 +205,8 @@ struct page *follow_huge_pgd(struct mm_struct *mm, unsigned long address,
 int pmd_huge(pmd_t pmd);
 int pud_huge(pud_t pud);
 unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
-		unsigned long address, unsigned long end, pgprot_t newprot);
+		unsigned long address, unsigned long end, pgprot_t newprot,
+		unsigned long cp_flags);
 
 bool is_hugetlb_entry_migration(pte_t pte);
 void hugetlb_unshare_all_pmds(struct vm_area_struct *vma);
@@ -372,7 +373,8 @@ static inline void move_hugetlb_state(struct page *oldpage,
 
 static inline unsigned long hugetlb_change_protection(
 			struct vm_area_struct *vma, unsigned long address,
-			unsigned long end, pgprot_t newprot)
+			unsigned long end, pgprot_t newprot,
+			unsigned long cp_flags)
 {
 	return 0;
 }
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index f4efcb8c6214..7d3558265b6f 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5519,7 +5519,8 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 }
 
 unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
-		unsigned long address, unsigned long end, pgprot_t newprot)
+		unsigned long address, unsigned long end,
+		pgprot_t newprot, unsigned long cp_flags)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	unsigned long start = address;
@@ -5529,6 +5530,8 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 	unsigned long pages = 0;
 	bool shared_pmd = false;
 	struct mmu_notifier_range range;
+	bool uffd_wp = cp_flags & MM_CP_UFFD_WP;
+	bool uffd_wp_resolve = cp_flags & MM_CP_UFFD_WP_RESOLVE;
 
 	/*
 	 * In the case of shared PMDs, the area to flush could be beyond
@@ -5570,6 +5573,10 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 				entry = make_readable_migration_entry(
 							swp_offset(entry));
 				newpte = swp_entry_to_pte(entry);
+				if (uffd_wp)
+					newpte = pte_swp_mkuffd_wp(newpte);
+				else if (uffd_wp_resolve)
+					newpte = pte_swp_clear_uffd_wp(newpte);
 				set_huge_swap_pte_at(mm, address, ptep,
 						     newpte, huge_page_size(h));
 				pages++;
@@ -5584,6 +5591,10 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 			old_pte = huge_ptep_modify_prot_start(vma, address, ptep);
 			pte = pte_mkhuge(huge_pte_modify(old_pte, newprot));
 			pte = arch_make_huge_pte(pte, shift, vma->vm_flags);
+			if (uffd_wp)
+				pte = huge_pte_mkuffd_wp(huge_pte_wrprotect(pte));
+			else if (uffd_wp_resolve)
+				pte = huge_pte_clear_uffd_wp(pte);
 			huge_ptep_modify_prot_commit(vma, address, ptep, old_pte, pte);
 			pages++;
 		}
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 3fcb87b59696..96f4df023439 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -426,7 +426,8 @@ unsigned long change_protection(struct vm_area_struct *vma, unsigned long start,
 	BUG_ON((cp_flags & MM_CP_UFFD_WP_ALL) == MM_CP_UFFD_WP_ALL);
 
 	if (is_vm_hugetlb_page(vma))
-		pages = hugetlb_change_protection(vma, start, end, newprot);
+		pages = hugetlb_change_protection(vma, start, end, newprot,
+						  cp_flags);
 	else
 		pages = change_protection_range(vma, start, end, newprot,
 						cp_flags);
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 501d6b9f7a5a..7ba721aca1c5 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -695,6 +695,7 @@ int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start,
 			unsigned long len, bool enable_wp, bool *mmap_changing)
 {
 	struct vm_area_struct *dst_vma;
+	unsigned long page_mask;
 	pgprot_t newprot;
 	int err;
 
@@ -731,6 +732,13 @@ int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start,
 	if (!vma_is_anonymous(dst_vma))
 		goto out_unlock;
 
+	if (is_vm_hugetlb_page(dst_vma)) {
+		err = -EINVAL;
+		page_mask = vma_kernel_pagesize(dst_vma) - 1;
+		if ((start & page_mask) || (len & page_mask))
+			goto out_unlock;
+	}
+
 	if (enable_wp)
 		newprot = vm_get_page_prot(dst_vma->vm_flags & ~(VM_WRITE));
 	else
-- 
2.31.1


* [PATCH v4 20/26] mm/hugetlb: Introduce huge version of special swap pte helpers
  2021-07-14 22:20 [PATCH v4 00/26] userfaultfd-wp: Support shmem and hugetlbfs Peter Xu
                   ` (18 preceding siblings ...)
  2021-07-14 22:25 ` [PATCH v4 19/26] hugetlb/userfaultfd: Handle UFFDIO_WRITEPROTECT Peter Xu
@ 2021-07-14 22:25 ` Peter Xu
  2021-07-14 22:25 ` [PATCH v4 21/26] hugetlb/userfaultfd: Handle uffd-wp special pte in hugetlb pf handler Peter Xu
                   ` (5 subsequent siblings)
  25 siblings, 0 replies; 34+ messages in thread
From: Peter Xu @ 2021-07-14 22:25 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Andrew Morton, Mike Kravetz, Axel Rasmussen, Miaohe Lin,
	Kirill A . Shutemov, Hugh Dickins, Jason Gunthorpe,
	Alistair Popple, Matthew Wilcox, peterx, Jerome Glisse,
	Andrea Arcangeli, Mike Rapoport, Nadav Amit, David Hildenbrand

This prepares hugetlbfs to also recognize swap special ptes, just like the
uffd-wp special swap pte.

Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/hugetlb.c | 24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 7d3558265b6f..6a66b598245e 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -95,6 +95,26 @@ static inline bool subpool_is_free(struct hugepage_subpool *spool)
 	return true;
 }
 
+/*
+ * These are sister versions of is_swap_pte() and pte_has_swap_entry().  We
+ * need standalone ones because huge_pte_none() is handled differently from
+ * pte_none().  For more information, please refer to comments above
+ * is_swap_pte() and pte_has_swap_entry().
+ *
+ * Here we directly reuse the pte level of swap special ptes, for example, the
+ * pte_swp_uffd_wp_special().  It just stands for a huge page rather than a
+ * small page for hugetlbfs pages.
+ */
+static inline bool is_huge_swap_pte(pte_t pte)
+{
+	return !huge_pte_none(pte) && !pte_present(pte);
+}
+
+static inline bool huge_pte_has_swap_entry(pte_t pte)
+{
+	return is_huge_swap_pte(pte) && !is_swap_special_pte(pte);
+}
+
 static inline void unlock_or_release_subpool(struct hugepage_subpool *spool,
 						unsigned long irq_flags)
 {
@@ -4138,7 +4158,7 @@ bool is_hugetlb_entry_migration(pte_t pte)
 {
 	swp_entry_t swp;
 
-	if (huge_pte_none(pte) || pte_present(pte))
+	if (!huge_pte_has_swap_entry(pte))
 		return false;
 	swp = pte_to_swp_entry(pte);
 	if (is_migration_entry(swp))
@@ -4151,7 +4171,7 @@ static bool is_hugetlb_entry_hwpoisoned(pte_t pte)
 {
 	swp_entry_t swp;
 
-	if (huge_pte_none(pte) || pte_present(pte))
+	if (!huge_pte_has_swap_entry(pte))
 		return false;
 	swp = pte_to_swp_entry(pte);
 	if (is_hwpoison_entry(swp))
-- 
2.31.1


* [PATCH v4 21/26] hugetlb/userfaultfd: Handle uffd-wp special pte in hugetlb pf handler
  2021-07-14 22:20 [PATCH v4 00/26] userfaultfd-wp: Support shmem and hugetlbfs Peter Xu
                   ` (19 preceding siblings ...)
  2021-07-14 22:25 ` [PATCH v4 20/26] mm/hugetlb: Introduce huge version of special swap pte helpers Peter Xu
@ 2021-07-14 22:25 ` Peter Xu
  2021-07-14 22:25 ` [PATCH v4 22/26] hugetlb/userfaultfd: Allow wr-protect none ptes Peter Xu
                   ` (4 subsequent siblings)
  25 siblings, 0 replies; 34+ messages in thread
From: Peter Xu @ 2021-07-14 22:25 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Andrew Morton, Mike Kravetz, Axel Rasmussen, Miaohe Lin,
	Kirill A . Shutemov, Hugh Dickins, Jason Gunthorpe,
	Alistair Popple, Matthew Wilcox, peterx, Jerome Glisse,
	Andrea Arcangeli, Mike Rapoport, Nadav Amit, David Hildenbrand

Teach the hugetlb page fault code to understand the uffd-wp special pte.  For
example, when seeing such a pte we need to convert any write fault into a read
one (which is fake - we'll simply retry the write afterwards).  Meanwhile, for
handle_userfault() we need to make sure we wait on the special swap pte too,
just like on a none pte.

Note that we also need to teach UFFDIO_COPY about this special pte across the
code path, so that we can safely install a new page at this special pte as
long as we know it's merely a marker entry.
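
For illustration only (not part of the patch itself): when the monitor sees
such a fault it can resolve it with UFFDIO_COPY, optionally keeping the page
wr-protected.  A minimal userspace sketch, assuming the uffd is already set
up and error handling is omitted; resolve_with_copy_wp() is a hypothetical
helper name:

  #include <stddef.h>
  #include <sys/ioctl.h>
  #include <linux/userfaultfd.h>

  /* Hypothetical helper: install the page at 'dst' from 'src' while
   * keeping it write-protected, so that later writes still trap. */
  static int resolve_with_copy_wp(int uffd, void *dst, void *src, size_t len)
  {
  	struct uffdio_copy copy = {
  		.dst  = (unsigned long)dst,
  		.src  = (unsigned long)src,
  		.len  = len,
  		.mode = UFFDIO_COPY_MODE_WP,	/* keep wr-protected */
  	};

  	return ioctl(uffd, UFFDIO_COPY, &copy);	/* 0 on success */
  }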

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 fs/userfaultfd.c |  5 ++++-
 mm/hugetlb.c     | 26 ++++++++++++++++++++------
 mm/userfaultfd.c |  5 ++++-
 3 files changed, 28 insertions(+), 8 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index e1c1cbc7bcc8..644df737fbb2 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -245,8 +245,11 @@ static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx,
 	/*
 	 * Lockless access: we're in a wait_event so it's ok if it
 	 * changes under us.
+	 *
+	 * Regarding uffd-wp special case, please refer to comments in
+	 * userfaultfd_must_wait().
 	 */
-	if (huge_pte_none(pte))
+	if (huge_pte_none(pte) || pte_swp_uffd_wp_special(pte))
 		ret = true;
 	if (!huge_pte_write(pte) && (reason & VM_UFFD_WP))
 		ret = true;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 6a66b598245e..15e5de480cf0 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4786,7 +4786,8 @@ static inline vm_fault_t hugetlb_handle_userfault(struct vm_area_struct *vma,
 static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 			struct vm_area_struct *vma,
 			struct address_space *mapping, pgoff_t idx,
-			unsigned long address, pte_t *ptep, unsigned int flags)
+			unsigned long address, pte_t *ptep,
+			pte_t old_pte, unsigned int flags)
 {
 	struct hstate *h = hstate_vma(vma);
 	vm_fault_t ret = VM_FAULT_SIGBUS;
@@ -4910,7 +4911,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 
 	ptl = huge_pte_lock(h, mm, ptep);
 	ret = 0;
-	if (!huge_pte_none(huge_ptep_get(ptep)))
+	if (!pte_same(huge_ptep_get(ptep), old_pte))
 		goto backout;
 
 	if (anon_rmap) {
@@ -4920,6 +4921,12 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 		page_dup_rmap(page, true);
 	new_pte = make_huge_pte(vma, page, ((vma->vm_flags & VM_WRITE)
 				&& (vma->vm_flags & VM_SHARED)));
+	/*
+	 * If this pte was previously wr-protected, keep it wr-protected even
+	 * if populated.
+	 */
+	if (unlikely(pte_swp_uffd_wp_special(old_pte)))
+		new_pte = huge_pte_wrprotect(huge_pte_mkuffd_wp(new_pte));
 	set_huge_pte_at(mm, haddr, ptep, new_pte);
 
 	hugetlb_count_add(pages_per_huge_page(h), mm);
@@ -5035,8 +5042,13 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	mutex_lock(&hugetlb_fault_mutex_table[hash]);
 
 	entry = huge_ptep_get(ptep);
-	if (huge_pte_none(entry)) {
-		ret = hugetlb_no_page(mm, vma, mapping, idx, address, ptep, flags);
+	/*
+	 * uffd-wp-special is handled mostly the same as a none pte,
+	 * because it's basically a none pte with a special marker.
+	 */
+	if (huge_pte_none(entry) || pte_swp_uffd_wp_special(entry)) {
+		ret = hugetlb_no_page(mm, vma, mapping, idx, address, ptep,
+				      entry, flags);
 		goto out_mutex;
 	}
 
@@ -5170,7 +5182,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 	pgoff_t idx = vma_hugecache_offset(h, dst_vma, dst_addr);
 	unsigned long size;
 	int vm_shared = dst_vma->vm_flags & VM_SHARED;
-	pte_t _dst_pte;
+	pte_t _dst_pte, cur_pte;
 	spinlock_t *ptl;
 	int ret = -ENOMEM;
 	struct page *page;
@@ -5287,8 +5299,10 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 	if (idx >= size)
 		goto out_release_unlock;
 
+	cur_pte = huge_ptep_get(dst_pte);
 	ret = -EEXIST;
-	if (!huge_pte_none(huge_ptep_get(dst_pte)))
+	/* Please refer to shmem_mfill_atomic_pte() for uffd-wp special case */
+	if (!huge_pte_none(cur_pte) && !pte_swp_uffd_wp_special(cur_pte))
 		goto out_release_unlock;
 
 	if (vm_shared) {
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 7ba721aca1c5..a8038903effd 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -363,6 +363,8 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 	}
 
 	while (src_addr < src_start + len) {
+		pte_t pteval;
+
 		BUG_ON(dst_addr >= dst_start + len);
 
 		/*
@@ -385,8 +387,9 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 			goto out_unlock;
 		}
 
+		pteval = huge_ptep_get(dst_pte);
 		if (mode != MCOPY_ATOMIC_CONTINUE &&
-		    !huge_pte_none(huge_ptep_get(dst_pte))) {
+		    !huge_pte_none(pteval) && !pte_swp_uffd_wp_special(pteval)) {
 			err = -EEXIST;
 			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
 			i_mmap_unlock_read(mapping);
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v4 22/26] hugetlb/userfaultfd: Allow wr-protect none ptes
  2021-07-14 22:20 [PATCH v4 00/26] userfaultfd-wp: Support shmem and hugetlbfs Peter Xu
                   ` (20 preceding siblings ...)
  2021-07-14 22:25 ` [PATCH v4 21/26] hugetlb/userfaultfd: Handle uffd-wp special pte in hugetlb pf handler Peter Xu
@ 2021-07-14 22:25 ` Peter Xu
  2021-07-14 22:25 ` [PATCH v4 23/26] hugetlb/userfaultfd: Only drop uffd-wp special pte if required Peter Xu
                   ` (3 subsequent siblings)
  25 siblings, 0 replies; 34+ messages in thread
From: Peter Xu @ 2021-07-14 22:25 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Andrew Morton, Mike Kravetz, Axel Rasmussen, Miaohe Lin,
	Kirill A . Shutemov, Hugh Dickins, Jason Gunthorpe,
	Alistair Popple, Matthew Wilcox, peterx, Jerome Glisse,
	Andrea Arcangeli, Mike Rapoport, Nadav Amit, David Hildenbrand

Teach the hugetlbfs code to wr-protect none ptes, in case a page cache page
already exists behind the pte.  Meanwhile we also need to be able to recognize
a uffd-wp marker pte and remove it for uffd_wp_resolve.

While at it, introduce a variable "psize" to replace the repeated calls to the
huge page size fetcher.
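
As an aside, here is a compilable model of the per-pte decisions this patch
ends up making (names are illustrative stand-ins, not the kernel predicates):

  enum pte_kind { PTE_NONE, PTE_PRESENT, PTE_SWAP, PTE_WP_MARKER };

  /* Model of hugetlb_change_protection() per-pte handling after this patch. */
  static const char *wp_action(enum pte_kind k, int uffd_wp, int uffd_wp_resolve)
  {
  	switch (k) {
  	case PTE_WP_MARKER:	/* uffd-wp special pte */
  		return uffd_wp_resolve ? "clear marker (pte becomes none)"
  				       : "keep marker";
  	case PTE_NONE:
  		return uffd_wp ? "install wp marker" : "leave as none";
  	case PTE_SWAP:		/* e.g. a migration entry */
  		return "toggle swp uffd-wp bit, set_huge_swap_pte_at()";
  	case PTE_PRESENT:
  	default:
  		return "toggle uffd-wp bit via modify_prot start/commit";
  	}
  }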

Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/hugetlb.c | 29 +++++++++++++++++++++++++----
 1 file changed, 25 insertions(+), 4 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 15e5de480cf0..6ae911185554 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5561,7 +5561,7 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 	pte_t *ptep;
 	pte_t pte;
 	struct hstate *h = hstate_vma(vma);
-	unsigned long pages = 0;
+	unsigned long pages = 0, psize = huge_page_size(h);
 	bool shared_pmd = false;
 	struct mmu_notifier_range range;
 	bool uffd_wp = cp_flags & MM_CP_UFFD_WP;
@@ -5581,13 +5581,19 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 
 	mmu_notifier_invalidate_range_start(&range);
 	i_mmap_lock_write(vma->vm_file->f_mapping);
-	for (; address < end; address += huge_page_size(h)) {
+	for (; address < end; address += psize) {
 		spinlock_t *ptl;
-		ptep = huge_pte_offset(mm, address, huge_page_size(h));
+		ptep = huge_pte_offset(mm, address, psize);
 		if (!ptep)
 			continue;
 		ptl = huge_pte_lock(h, mm, ptep);
 		if (huge_pmd_unshare(mm, vma, &address, ptep)) {
+			/*
+			 * When uffd-wp is enabled on the vma, pmd unshare
+			 * shouldn't happen at all.  Warn if it happens for
+			 * some reason anyway.
+			 */
+			WARN_ON_ONCE(uffd_wp || uffd_wp_resolve);
 			pages++;
 			spin_unlock(ptl);
 			shared_pmd = true;
@@ -5612,12 +5618,21 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 				else if (uffd_wp_resolve)
 					newpte = pte_swp_clear_uffd_wp(newpte);
 				set_huge_swap_pte_at(mm, address, ptep,
-						     newpte, huge_page_size(h));
+						     newpte, psize);
 				pages++;
 			}
 			spin_unlock(ptl);
 			continue;
 		}
+		if (unlikely(is_swap_special_pte(pte))) {
+			WARN_ON_ONCE(!pte_swp_uffd_wp_special(pte));
+			/*
+			 * This is changing a non-present pte into a none pte,
+			 * no need for huge_ptep_modify_prot_start/commit().
+			 */
+			if (uffd_wp_resolve)
+				huge_pte_clear(mm, address, ptep, psize);
+		}
 		if (!huge_pte_none(pte)) {
 			pte_t old_pte;
 			unsigned int shift = huge_page_shift(hstate_vma(vma));
@@ -5631,6 +5646,12 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 				pte = huge_pte_clear_uffd_wp(pte);
 			huge_ptep_modify_prot_commit(vma, address, ptep, old_pte, pte);
 			pages++;
+		} else {
+			/* None pte */
+			if (unlikely(uffd_wp))
+				/* Safe to modify directly (none->non-present). */
+				set_huge_pte_at(mm, address, ptep,
+						pte_swp_mkuffd_wp_special(vma));
 		}
 		spin_unlock(ptl);
 	}
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v4 23/26] hugetlb/userfaultfd: Only drop uffd-wp special pte if required
  2021-07-14 22:20 [PATCH v4 00/26] userfaultfd-wp: Support shmem and hugetlbfs Peter Xu
                   ` (21 preceding siblings ...)
  2021-07-14 22:25 ` [PATCH v4 22/26] hugetlb/userfaultfd: Allow wr-protect none ptes Peter Xu
@ 2021-07-14 22:25 ` Peter Xu
  2021-07-14 22:25 ` [PATCH v4 24/26] mm/pagemap: Recognize uffd-wp bit for shmem/hugetlbfs Peter Xu
                   ` (2 subsequent siblings)
  25 siblings, 0 replies; 34+ messages in thread
From: Peter Xu @ 2021-07-14 22:25 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Andrew Morton, Mike Kravetz, Axel Rasmussen, Miaohe Lin,
	Kirill A . Shutemov, Hugh Dickins, Jason Gunthorpe,
	Alistair Popple, Matthew Wilcox, peterx, Jerome Glisse,
	Andrea Arcangeli, Mike Rapoport, Nadav Amit, David Hildenbrand

As with shmem uffd-wp special ptes, only drop the uffd-wp special swap pte if
unmapping an entire vma, or if synchronized such that faults cannot race with
the unmap operation.  This requires passing zap_flags all the way down to the
lowest level hugetlb unmap routine: __unmap_hugepage_range.

In general, unmap calls originating in hugetlbfs code will pass the
ZAP_FLAG_DROP_FILE_UFFD_WP flag, as synchronization is in place to prevent
faults.  The exception is hole punch, which will first unmap without any
synchronization.  Later, when hole punch actually removes the page from the
file, it will check to see if there was a subsequent fault and, if so, take
the hugetlb fault mutex while unmapping again.  This second unmap will pass in
ZAP_FLAG_DROP_FILE_UFFD_WP.

The core justification for "whether to apply the ZAP_FLAG_DROP_FILE_UFFD_WP
flag when unmapping a hugetlb range" is (IMHO): we should never reach a state
where a page fault could erroneously fault in a writable page-cache page that
was wr-protected, even for an extremely short window.  That could happen if
e.g. we passed ZAP_FLAG_DROP_FILE_UFFD_WP in hugetlbfs_punch_hole() when
calling hugetlb_vmdelete_list(): if a page fault triggered after that call and
before the remove_inode_hugepages() right after it, the page cache could be
mapped writable again in that small window, which could cause data corruption.
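
To make that ordering concrete, here is a compilable toy model of the hole
punch flow described above (the functions and the flag value are stand-ins,
not the kernel implementation):

  #include <stdio.h>

  #define ZAP_FLAG_DROP_FILE_UFFD_WP 0x1UL	/* illustrative value only */

  static void vmdelete_list(unsigned long zap_flags)
  {
  	if (zap_flags & ZAP_FLAG_DROP_FILE_UFFD_WP)
  		printf("unmap: drop uffd-wp markers (faults excluded)\n");
  	else
  		printf("unmap: keep uffd-wp markers (faults may race)\n");
  }

  int main(void)
  {
  	/* Phase 1: hugetlbfs_punch_hole(), fault mutex not held. */
  	vmdelete_list(0);
  	/* Phase 2: remove_inode_hugepages() takes the fault mutex,
  	 * so only now is it safe to drop the markers. */
  	vmdelete_list(ZAP_FLAG_DROP_FILE_UFFD_WP);
  	return 0;
  }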

Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 fs/hugetlbfs/inode.c    | 15 +++++++++------
 include/linux/hugetlb.h |  8 +++++---
 mm/hugetlb.c            | 27 +++++++++++++++++++++------
 mm/memory.c             |  5 ++++-
 4 files changed, 39 insertions(+), 16 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 926eeb9bf4eb..fdbb972b781b 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -404,7 +404,8 @@ static void remove_huge_page(struct page *page)
 }
 
 static void
-hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end)
+hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end,
+		      unsigned long zap_flags)
 {
 	struct vm_area_struct *vma;
 
@@ -437,7 +438,7 @@ hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end)
 		}
 
 		unmap_hugepage_range(vma, vma->vm_start + v_offset, v_end,
-									NULL);
+				     NULL, zap_flags);
 	}
 }
 
@@ -515,7 +516,8 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
 				mutex_lock(&hugetlb_fault_mutex_table[hash]);
 				hugetlb_vmdelete_list(&mapping->i_mmap,
 					index * pages_per_huge_page(h),
-					(index + 1) * pages_per_huge_page(h));
+					(index + 1) * pages_per_huge_page(h),
+					ZAP_FLAG_DROP_FILE_UFFD_WP);
 				i_mmap_unlock_write(mapping);
 			}
 
@@ -581,7 +583,8 @@ static void hugetlb_vmtruncate(struct inode *inode, loff_t offset)
 	i_mmap_lock_write(mapping);
 	i_size_write(inode, offset);
 	if (!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root))
-		hugetlb_vmdelete_list(&mapping->i_mmap, pgoff, 0);
+		hugetlb_vmdelete_list(&mapping->i_mmap, pgoff, 0,
+				      ZAP_FLAG_DROP_FILE_UFFD_WP);
 	i_mmap_unlock_write(mapping);
 	remove_inode_hugepages(inode, offset, LLONG_MAX);
 }
@@ -614,8 +617,8 @@ static long hugetlbfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
 		i_mmap_lock_write(mapping);
 		if (!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root))
 			hugetlb_vmdelete_list(&mapping->i_mmap,
-						hole_start >> PAGE_SHIFT,
-						hole_end  >> PAGE_SHIFT);
+					      hole_start >> PAGE_SHIFT,
+					      hole_end >> PAGE_SHIFT, 0);
 		i_mmap_unlock_write(mapping);
 		remove_inode_hugepages(inode, hole_start, hole_end);
 		inode_unlock(inode);
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index e19ca363803d..809bb63ecf9e 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -138,11 +138,12 @@ long follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *,
 			 unsigned long *, unsigned long *, long, unsigned int,
 			 int *);
 void unmap_hugepage_range(struct vm_area_struct *,
-			  unsigned long, unsigned long, struct page *);
+			  unsigned long, unsigned long, struct page *,
+			  unsigned long);
 void __unmap_hugepage_range_final(struct mmu_gather *tlb,
 			  struct vm_area_struct *vma,
 			  unsigned long start, unsigned long end,
-			  struct page *ref_page);
+			  struct page *ref_page, unsigned long zap_flags);
 void hugetlb_report_meminfo(struct seq_file *);
 int hugetlb_report_node_meminfo(char *buf, int len, int nid);
 void hugetlb_show_meminfo(void);
@@ -381,7 +382,8 @@ static inline unsigned long hugetlb_change_protection(
 
 static inline void __unmap_hugepage_range_final(struct mmu_gather *tlb,
 			struct vm_area_struct *vma, unsigned long start,
-			unsigned long end, struct page *ref_page)
+			unsigned long end, struct page *ref_page,
+			unsigned long zap_flags)
 {
 	BUG();
 }
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 6ae911185554..cc5616d78f35 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4353,7 +4353,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 
 void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
 			    unsigned long start, unsigned long end,
-			    struct page *ref_page)
+			    struct page *ref_page, unsigned long zap_flags)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	unsigned long address;
@@ -4405,6 +4405,19 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
 			continue;
 		}
 
+		if (unlikely(is_swap_special_pte(pte))) {
+			WARN_ON_ONCE(!pte_swp_uffd_wp_special(pte));
+			/*
+			 * Only drop the special swap uffd-wp pte if
+			 * e.g. unmapping a vma or punching a hole (with proper
+			 * lock held so that concurrent page fault won't happen).
+			 */
+			if (zap_flags & ZAP_FLAG_DROP_FILE_UFFD_WP)
+				huge_pte_clear(mm, address, ptep, sz);
+			spin_unlock(ptl);
+			continue;
+		}
+
 		/*
 		 * Migrating hugepage or HWPoisoned hugepage is already
 		 * unmapped and its refcount is dropped, so just clear pte here.
@@ -4456,9 +4469,10 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
 
 void __unmap_hugepage_range_final(struct mmu_gather *tlb,
 			  struct vm_area_struct *vma, unsigned long start,
-			  unsigned long end, struct page *ref_page)
+			  unsigned long end, struct page *ref_page,
+			  unsigned long zap_flags)
 {
-	__unmap_hugepage_range(tlb, vma, start, end, ref_page);
+	__unmap_hugepage_range(tlb, vma, start, end, ref_page, zap_flags);
 
 	/*
 	 * Clear this flag so that x86's huge_pmd_share page_table_shareable
@@ -4474,12 +4488,13 @@ void __unmap_hugepage_range_final(struct mmu_gather *tlb,
 }
 
 void unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start,
-			  unsigned long end, struct page *ref_page)
+			  unsigned long end, struct page *ref_page,
+			  unsigned long zap_flags)
 {
 	struct mmu_gather tlb;
 
 	tlb_gather_mmu(&tlb, vma->vm_mm);
-	__unmap_hugepage_range(&tlb, vma, start, end, ref_page);
+	__unmap_hugepage_range(&tlb, vma, start, end, ref_page, zap_flags);
 	tlb_finish_mmu(&tlb);
 }
 
@@ -4534,7 +4549,7 @@ static void unmap_ref_private(struct mm_struct *mm, struct vm_area_struct *vma,
 		 */
 		if (!is_vma_resv_set(iter_vma, HPAGE_RESV_OWNER))
 			unmap_hugepage_range(iter_vma, address,
-					     address + huge_page_size(h), page);
+					     address + huge_page_size(h), page, 0);
 	}
 	i_mmap_unlock_write(mapping);
 }
diff --git a/mm/memory.c b/mm/memory.c
index af91bee934c7..c4a80f45e48f 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1626,8 +1626,11 @@ static void unmap_single_vma(struct mmu_gather *tlb,
 			 * safe to do nothing in this case.
 			 */
 			if (vma->vm_file) {
+				unsigned long zap_flags = details ?
+				    details->zap_flags : 0;
 				i_mmap_lock_write(vma->vm_file->f_mapping);
-				__unmap_hugepage_range_final(tlb, vma, start, end, NULL);
+				__unmap_hugepage_range_final(tlb, vma, start, end,
+							     NULL, zap_flags);
 				i_mmap_unlock_write(vma->vm_file->f_mapping);
 			}
 		} else
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v4 24/26] mm/pagemap: Recognize uffd-wp bit for shmem/hugetlbfs
  2021-07-14 22:20 [PATCH v4 00/26] userfaultfd-wp: Support shmem and hugetlbfs Peter Xu
                   ` (22 preceding siblings ...)
  2021-07-14 22:25 ` [PATCH v4 23/26] hugetlb/userfaultfd: Only drop uffd-wp special pte if required Peter Xu
@ 2021-07-14 22:25 ` Peter Xu
  2021-07-14 22:25 ` [PATCH v4 25/26] mm/userfaultfd: Enable write protection for shmem & hugetlbfs Peter Xu
  2021-07-14 22:25 ` [PATCH v4 26/26] userfaultfd/selftests: Enable uffd-wp for shmem/hugetlbfs Peter Xu
  25 siblings, 0 replies; 34+ messages in thread
From: Peter Xu @ 2021-07-14 22:25 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Andrew Morton, Mike Kravetz, Axel Rasmussen, Miaohe Lin,
	Kirill A . Shutemov, Hugh Dickins, Jason Gunthorpe,
	Alistair Popple, Matthew Wilcox, peterx, Jerome Glisse,
	Andrea Arcangeli, Mike Rapoport, Nadav Amit, David Hildenbrand

Teach the pagemap code to recognize the newly introduced swap special pte for
uffd-wp, as well as the hugetlb cases that we recently started to support.
This should make pagemap uffd-wp support complete.
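
For reference, userspace can check the bit as below; a minimal sketch,
assuming bit 57 is the uffd-wp bit as documented in
Documentation/admin-guide/mm/pagemap.rst:

  #include <stdint.h>
  #include <unistd.h>

  /* Return 1 if 'addr' is uffd-wp write-protected, 0 if not, -1 on error.
   * 'pagemap_fd' is an open fd of /proc/<pid>/pagemap. */
  static int pagemap_is_uffd_wp(int pagemap_fd, void *addr)
  {
  	uint64_t ent;
  	off_t off = ((uintptr_t)addr / getpagesize()) * sizeof(ent);

  	if (pread(pagemap_fd, &ent, sizeof(ent), off) != sizeof(ent))
  		return -1;
  	return (int)((ent >> 57) & 1);	/* bit 57: uffd-wp wr-protected */
  }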

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 fs/proc/task_mmu.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 9c5af77b5290..988e29fa1f00 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1389,6 +1389,8 @@ static pagemap_entry_t pte_to_pagemap_entry(struct pagemapread *pm,
 		flags |= PM_SWAP;
 		if (is_pfn_swap_entry(entry))
 			page = pfn_swap_entry_to_page(entry);
+	} else if (pte_swp_uffd_wp_special(pte)) {
+		flags |= PM_UFFD_WP;
 	}
 
 	if (page && !PageAnon(page))
@@ -1522,10 +1524,15 @@ static int pagemap_hugetlb_range(pte_t *ptep, unsigned long hmask,
 		if (page_mapcount(page) == 1)
 			flags |= PM_MMAP_EXCLUSIVE;
 
+		if (huge_pte_uffd_wp(pte))
+			flags |= PM_UFFD_WP;
+
 		flags |= PM_PRESENT;
 		if (pm->show_pfn)
 			frame = pte_pfn(pte) +
 				((addr & ~hmask) >> PAGE_SHIFT);
+	} else if (pte_swp_uffd_wp_special(pte)) {
+		flags |= PM_UFFD_WP;
 	}
 
 	for (; addr != end; addr += PAGE_SIZE) {
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v4 25/26] mm/userfaultfd: Enable write protection for shmem & hugetlbfs
  2021-07-14 22:20 [PATCH v4 00/26] userfaultfd-wp: Support shmem and hugetlbfs Peter Xu
                   ` (23 preceding siblings ...)
  2021-07-14 22:25 ` [PATCH v4 24/26] mm/pagemap: Recognize uffd-wp bit for shmem/hugetlbfs Peter Xu
@ 2021-07-14 22:25 ` Peter Xu
  2021-07-14 22:25 ` [PATCH v4 26/26] userfaultfd/selftests: Enable uffd-wp for shmem/hugetlbfs Peter Xu
  25 siblings, 0 replies; 34+ messages in thread
From: Peter Xu @ 2021-07-14 22:25 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Andrew Morton, Mike Kravetz, Axel Rasmussen, Miaohe Lin,
	Kirill A . Shutemov, Hugh Dickins, Jason Gunthorpe,
	Alistair Popple, Matthew Wilcox, peterx, Jerome Glisse,
	Andrea Arcangeli, Mike Rapoport, Nadav Amit, David Hildenbrand

We've had all the necessary changes ready for both shmem and hugetlbfs.  Turn
on all the shmem/hugetlbfs switches for userfaultfd-wp.

We can expand UFFD_API_RANGE_IOCTLS_BASIC with _UFFDIO_WRITEPROTECT too because
all existing types now support write protection mode.

Since vma_can_userfault() will be used elsewhere, move it into userfaultfd_k.h.
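
With the switches on, a monitor can register a shmem or hugetlbfs mapping in
wr-protect mode and toggle protection as usual.  A minimal userspace sketch,
assuming the uffd is created and the UFFDIO_API handshake already done;
wp_range() is a hypothetical helper name:

  #include <stddef.h>
  #include <sys/ioctl.h>
  #include <linux/userfaultfd.h>

  /* Hypothetical helper: track 'area' for wr-protect faults, then
   * enable (or clear) write protection over the whole range. */
  static int wp_range(int uffd, void *area, size_t len, int enable)
  {
  	struct uffdio_register reg = {
  		.range = { .start = (unsigned long)area, .len = len },
  		.mode  = UFFDIO_REGISTER_MODE_WP,
  	};
  	struct uffdio_writeprotect wp = {
  		.range = { .start = (unsigned long)area, .len = len },
  		.mode  = enable ? UFFDIO_WRITEPROTECT_MODE_WP : 0,
  	};

  	if (ioctl(uffd, UFFDIO_REGISTER, &reg))
  		return -1;
  	return ioctl(uffd, UFFDIO_WRITEPROTECT, &wp);
  }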

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 fs/userfaultfd.c                 | 21 ++-------------------
 include/linux/userfaultfd_k.h    | 12 ++++++++++++
 include/uapi/linux/userfaultfd.h | 10 ++++++++--
 mm/userfaultfd.c                 |  9 +++------
 4 files changed, 25 insertions(+), 27 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 644df737fbb2..980879117110 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1275,24 +1275,6 @@ static __always_inline int validate_range(struct mm_struct *mm,
 	return 0;
 }
 
-static inline bool vma_can_userfault(struct vm_area_struct *vma,
-				     unsigned long vm_flags)
-{
-	/* FIXME: add WP support to hugetlbfs and shmem */
-	if (vm_flags & VM_UFFD_WP) {
-		if (is_vm_hugetlb_page(vma) || vma_is_shmem(vma))
-			return false;
-	}
-
-	if (vm_flags & VM_UFFD_MINOR) {
-		if (!(is_vm_hugetlb_page(vma) || vma_is_shmem(vma)))
-			return false;
-	}
-
-	return vma_is_anonymous(vma) || is_vm_hugetlb_page(vma) ||
-	       vma_is_shmem(vma);
-}
-
 static int userfaultfd_register(struct userfaultfd_ctx *ctx,
 				unsigned long arg)
 {
@@ -1966,7 +1948,8 @@ static int userfaultfd_api(struct userfaultfd_ctx *ctx,
 		~(UFFD_FEATURE_MINOR_HUGETLBFS | UFFD_FEATURE_MINOR_SHMEM);
 #endif
 #ifndef CONFIG_HAVE_ARCH_USERFAULTFD_WP
-	uffdio_api.features &= ~UFFD_FEATURE_PAGEFAULT_FLAG_WP;
+	uffdio_api.features &=
+	    ~(UFFD_FEATURE_PAGEFAULT_FLAG_WP | UFFD_FEATURE_WP_HUGETLBFS_SHMEM);
 #endif
 	uffdio_api.ioctls = UFFD_API_IOCTLS;
 	ret = -EFAULT;
diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index 47c74ef635ab..c4228acd1d80 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -16,6 +16,7 @@
 #include <linux/fcntl.h>
 #include <linux/mm.h>
 #include <asm-generic/pgtable_uffd.h>
+#include <linux/hugetlb_inline.h>
 
 /* The set of all possible UFFD-related VM flags. */
 #define __VM_UFFD_FLAGS (VM_UFFD_MISSING | VM_UFFD_WP | VM_UFFD_MINOR)
@@ -138,6 +139,17 @@ static inline bool userfaultfd_armed(struct vm_area_struct *vma)
 	return vma->vm_flags & __VM_UFFD_FLAGS;
 }
 
+static inline bool vma_can_userfault(struct vm_area_struct *vma,
+				     unsigned long vm_flags)
+{
+	if (vm_flags & VM_UFFD_MINOR)
+		return is_vm_hugetlb_page(vma) || vma_is_shmem(vma);
+
+	return vma_is_anonymous(vma) || is_vm_hugetlb_page(vma) ||
+	       vma_is_shmem(vma);
+}
+
+
 extern int dup_userfaultfd(struct vm_area_struct *, struct list_head *);
 extern void dup_userfaultfd_complete(struct list_head *);
 
diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
index 05b31d60acf6..a67b5185a7a9 100644
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -32,7 +32,8 @@
 			   UFFD_FEATURE_SIGBUS |		\
 			   UFFD_FEATURE_THREAD_ID |		\
 			   UFFD_FEATURE_MINOR_HUGETLBFS |	\
-			   UFFD_FEATURE_MINOR_SHMEM)
+			   UFFD_FEATURE_MINOR_SHMEM |		\
+			   UFFD_FEATURE_WP_HUGETLBFS_SHMEM)
 #define UFFD_API_IOCTLS				\
 	((__u64)1 << _UFFDIO_REGISTER |		\
 	 (__u64)1 << _UFFDIO_UNREGISTER |	\
@@ -46,7 +47,8 @@
 #define UFFD_API_RANGE_IOCTLS_BASIC		\
 	((__u64)1 << _UFFDIO_WAKE |		\
 	 (__u64)1 << _UFFDIO_COPY |		\
-	 (__u64)1 << _UFFDIO_CONTINUE)
+	 (__u64)1 << _UFFDIO_CONTINUE |		\
+	 (__u64)1 << _UFFDIO_WRITEPROTECT)
 
 /*
  * Valid ioctl command number range with this API is from 0x00 to
@@ -189,6 +191,9 @@ struct uffdio_api {
 	 *
 	 * UFFD_FEATURE_MINOR_SHMEM indicates the same support as
 	 * UFFD_FEATURE_MINOR_HUGETLBFS, but for shmem-backed pages instead.
+	 *
+	 * UFFD_FEATURE_WP_HUGETLBFS_SHMEM indicates that userfaultfd
+	 * write-protection mode is supported on both shmem and hugetlbfs.
 	 */
 #define UFFD_FEATURE_PAGEFAULT_FLAG_WP		(1<<0)
 #define UFFD_FEATURE_EVENT_FORK			(1<<1)
@@ -201,6 +206,7 @@ struct uffdio_api {
 #define UFFD_FEATURE_THREAD_ID			(1<<8)
 #define UFFD_FEATURE_MINOR_HUGETLBFS		(1<<9)
 #define UFFD_FEATURE_MINOR_SHMEM		(1<<10)
+#define UFFD_FEATURE_WP_HUGETLBFS_SHMEM		(1<<11)
 	__u64 features;
 
 	__u64 ioctls;
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index a8038903effd..6587ed787707 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -724,15 +724,12 @@ int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start,
 
 	err = -ENOENT;
 	dst_vma = find_dst_vma(dst_mm, start, len);
-	/*
-	 * Make sure the vma is not shared, that the dst range is
-	 * both valid and fully within a single existing vma.
-	 */
-	if (!dst_vma || (dst_vma->vm_flags & VM_SHARED))
+
+	if (!dst_vma)
 		goto out_unlock;
 	if (!userfaultfd_wp(dst_vma))
 		goto out_unlock;
-	if (!vma_is_anonymous(dst_vma))
+	if (!vma_can_userfault(dst_vma, dst_vma->vm_flags))
 		goto out_unlock;
 
 	if (is_vm_hugetlb_page(dst_vma)) {
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v4 26/26] userfaultfd/selftests: Enable uffd-wp for shmem/hugetlbfs
  2021-07-14 22:20 [PATCH v4 00/26] userfaultfd-wp: Support shmem and hugetlbfs Peter Xu
                   ` (24 preceding siblings ...)
  2021-07-14 22:25 ` [PATCH v4 25/26] mm/userfaultfd: Enable write protection for shmem & hugetlbfs Peter Xu
@ 2021-07-14 22:25 ` Peter Xu
  25 siblings, 0 replies; 34+ messages in thread
From: Peter Xu @ 2021-07-14 22:25 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Andrew Morton, Mike Kravetz, Axel Rasmussen, Miaohe Lin,
	Kirill A . Shutemov, Hugh Dickins, Jason Gunthorpe,
	Alistair Popple, Matthew Wilcox, peterx, Jerome Glisse,
	Andrea Arcangeli, Mike Rapoport, Nadav Amit, David Hildenbrand

Now that we have added support for shmem and hugetlbfs, we can always turn the
uffd-wp test on.

Define HUGETLB_EXPECTED_IOCTLS to avoid using UFFD_API_RANGE_IOCTLS_BASIC,
because UFFD_API_RANGE_IOCTLS_BASIC is normally a superset of capabilities,
while the test may not satisfy them all.  E.g., when hugetlb is registered
without minor mode, we need to explicitly remove _UFFDIO_CONTINUE.  The same
applies to uffd-wp: we need to explicitly remove _UFFDIO_WRITEPROTECT if not
registered with uffd-wp.

In the long term, we may consider dropping the UFFD_API_* macros completely
from the uapi/linux/userfaultfd.h header file, because a kernel header update
can otherwise easily break userspace.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 tools/testing/selftests/vm/userfaultfd.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c
index e363bdaff59d..015f2df8ece4 100644
--- a/tools/testing/selftests/vm/userfaultfd.c
+++ b/tools/testing/selftests/vm/userfaultfd.c
@@ -80,7 +80,7 @@ static int test_type;
 static volatile bool test_uffdio_copy_eexist = true;
 static volatile bool test_uffdio_zeropage_eexist = true;
 /* Whether to test uffd write-protection */
-static bool test_uffdio_wp = false;
+static bool test_uffdio_wp = true;
 /* Whether to test uffd minor faults */
 static bool test_uffdio_minor = false;
 
@@ -320,6 +320,9 @@ struct uffd_test_ops {
 					 (1 << _UFFDIO_ZEROPAGE) | \
 					 (1 << _UFFDIO_WRITEPROTECT))
 
+#define HUGETLB_EXPECTED_IOCTLS		((1 << _UFFDIO_WAKE) |	\
+					 (1 << _UFFDIO_COPY))
+
 static struct uffd_test_ops anon_uffd_test_ops = {
 	.expected_ioctls = ANON_EXPECTED_IOCTLS,
 	.allocate_area	= anon_allocate_area,
@@ -335,7 +338,7 @@ static struct uffd_test_ops shmem_uffd_test_ops = {
 };
 
 static struct uffd_test_ops hugetlb_uffd_test_ops = {
-	.expected_ioctls = UFFD_API_RANGE_IOCTLS_BASIC & ~(1 << _UFFDIO_CONTINUE),
+	.expected_ioctls = HUGETLB_EXPECTED_IOCTLS,
 	.allocate_area	= hugetlb_allocate_area,
 	.release_pages	= hugetlb_release_pages,
 	.alias_mapping = hugetlb_alias_mapping,
@@ -1580,8 +1583,6 @@ static void set_test_type(const char *type)
 	if (!strcmp(type, "anon")) {
 		test_type = TEST_ANON;
 		uffd_test_ops = &anon_uffd_test_ops;
-		/* Only enable write-protect test for anonymous test */
-		test_uffdio_wp = true;
 	} else if (!strcmp(type, "hugetlb")) {
 		test_type = TEST_HUGETLB;
 		uffd_test_ops = &hugetlb_uffd_test_ops;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH v4 15/26] mm/hugetlb: Drop __unmap_hugepage_range definition from hugetlb.h
  2021-07-14 22:24 ` [PATCH v4 15/26] mm/hugetlb: Drop __unmap_hugepage_range definition from hugetlb.h Peter Xu
@ 2021-07-15  5:49   ` kernel test robot
  2021-07-15  8:10   ` kernel test robot
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 34+ messages in thread
From: kernel test robot @ 2021-07-15  5:49 UTC (permalink / raw)
  To: Peter Xu, linux-mm, linux-kernel
  Cc: kbuild-all, Andrew Morton, Linux Memory Management List,
	Mike Kravetz, Axel Rasmussen, Miaohe Lin, Kirill A . Shutemov,
	Hugh Dickins, Jason Gunthorpe

[-- Attachment #1: Type: text/plain, Size: 10008 bytes --]

Hi Peter,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v5.14-rc1 next-20210714]
[cannot apply to hnaz-linux-mm/master asm-generic/master linux/master]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Peter-Xu/userfaultfd-wp-Support-shmem-and-hugetlbfs/20210715-062718
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 8096acd7442e613fad0354fc8dfdb2003cceea0b
config: powerpc64-randconfig-r032-20210714 (attached as .config)
compiler: powerpc64-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/f8dd355edbfe948f84c8aaa10a5173656aa2778c
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Peter-Xu/userfaultfd-wp-Support-shmem-and-hugetlbfs/20210715-062718
        git checkout f8dd355edbfe948f84c8aaa10a5173656aa2778c
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=powerpc64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> mm/hugetlb.c:4334:6: warning: no previous prototype for '__unmap_hugepage_range' [-Wmissing-prototypes]
    4334 | void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
         |      ^~~~~~~~~~~~~~~~~~~~~~


vim +/__unmap_hugepage_range +4334 mm/hugetlb.c

63551ae0feaaa2 David Gibson       2005-06-21  4333  
24669e58477e27 Aneesh Kumar K.V   2012-07-31 @4334  void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
24669e58477e27 Aneesh Kumar K.V   2012-07-31  4335  			    unsigned long start, unsigned long end,
24669e58477e27 Aneesh Kumar K.V   2012-07-31  4336  			    struct page *ref_page)
63551ae0feaaa2 David Gibson       2005-06-21  4337  {
63551ae0feaaa2 David Gibson       2005-06-21  4338  	struct mm_struct *mm = vma->vm_mm;
63551ae0feaaa2 David Gibson       2005-06-21  4339  	unsigned long address;
c7546f8f03f5a4 David Gibson       2005-08-05  4340  	pte_t *ptep;
63551ae0feaaa2 David Gibson       2005-06-21  4341  	pte_t pte;
cb900f41215447 Kirill A. Shutemov 2013-11-14  4342  	spinlock_t *ptl;
63551ae0feaaa2 David Gibson       2005-06-21  4343  	struct page *page;
a5516438959d90 Andi Kleen         2008-07-23  4344  	struct hstate *h = hstate_vma(vma);
a5516438959d90 Andi Kleen         2008-07-23  4345  	unsigned long sz = huge_page_size(h);
ac46d4f3c43241 Jérôme Glisse      2018-12-28  4346  	struct mmu_notifier_range range;
a5516438959d90 Andi Kleen         2008-07-23  4347  
63551ae0feaaa2 David Gibson       2005-06-21  4348  	WARN_ON(!is_vm_hugetlb_page(vma));
a5516438959d90 Andi Kleen         2008-07-23  4349  	BUG_ON(start & ~huge_page_mask(h));
a5516438959d90 Andi Kleen         2008-07-23  4350  	BUG_ON(end & ~huge_page_mask(h));
63551ae0feaaa2 David Gibson       2005-06-21  4351  
07e326610e5634 Aneesh Kumar K.V   2016-12-12  4352  	/*
07e326610e5634 Aneesh Kumar K.V   2016-12-12  4353  	 * This is a hugetlb vma, all the pte entries should point
07e326610e5634 Aneesh Kumar K.V   2016-12-12  4354  	 * to huge page.
07e326610e5634 Aneesh Kumar K.V   2016-12-12  4355  	 */
ed6a79352cad00 Peter Zijlstra     2018-08-31  4356  	tlb_change_page_size(tlb, sz);
24669e58477e27 Aneesh Kumar K.V   2012-07-31  4357  	tlb_start_vma(tlb, vma);
dff11abe280b47 Mike Kravetz       2018-10-05  4358  
dff11abe280b47 Mike Kravetz       2018-10-05  4359  	/*
dff11abe280b47 Mike Kravetz       2018-10-05  4360  	 * If sharing possible, alert mmu notifiers of worst case.
dff11abe280b47 Mike Kravetz       2018-10-05  4361  	 */
6f4f13e8d9e27c Jérôme Glisse      2019-05-13  4362  	mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, mm, start,
6f4f13e8d9e27c Jérôme Glisse      2019-05-13  4363  				end);
ac46d4f3c43241 Jérôme Glisse      2018-12-28  4364  	adjust_range_if_pmd_sharing_possible(vma, &range.start, &range.end);
ac46d4f3c43241 Jérôme Glisse      2018-12-28  4365  	mmu_notifier_invalidate_range_start(&range);
569f48b85813f0 Hillf Danton       2014-12-10  4366  	address = start;
569f48b85813f0 Hillf Danton       2014-12-10  4367  	for (; address < end; address += sz) {
7868a2087ec13e Punit Agrawal      2017-07-06  4368  		ptep = huge_pte_offset(mm, address, sz);
c7546f8f03f5a4 David Gibson       2005-08-05  4369  		if (!ptep)
c7546f8f03f5a4 David Gibson       2005-08-05  4370  			continue;
c7546f8f03f5a4 David Gibson       2005-08-05  4371  
cb900f41215447 Kirill A. Shutemov 2013-11-14  4372  		ptl = huge_pte_lock(h, mm, ptep);
34ae204f18519f Mike Kravetz       2020-08-11  4373  		if (huge_pmd_unshare(mm, vma, &address, ptep)) {
31d49da5ad0172 Aneesh Kumar K.V   2016-07-26  4374  			spin_unlock(ptl);
dff11abe280b47 Mike Kravetz       2018-10-05  4375  			/*
dff11abe280b47 Mike Kravetz       2018-10-05  4376  			 * We just unmapped a page of PMDs by clearing a PUD.
dff11abe280b47 Mike Kravetz       2018-10-05  4377  			 * The caller's TLB flush range should cover this area.
dff11abe280b47 Mike Kravetz       2018-10-05  4378  			 */
31d49da5ad0172 Aneesh Kumar K.V   2016-07-26  4379  			continue;
31d49da5ad0172 Aneesh Kumar K.V   2016-07-26  4380  		}
39dde65c9940c9 Kenneth W Chen     2006-12-06  4381  
6629326b89b6e6 Hillf Danton       2012-03-23  4382  		pte = huge_ptep_get(ptep);
31d49da5ad0172 Aneesh Kumar K.V   2016-07-26  4383  		if (huge_pte_none(pte)) {
31d49da5ad0172 Aneesh Kumar K.V   2016-07-26  4384  			spin_unlock(ptl);
31d49da5ad0172 Aneesh Kumar K.V   2016-07-26  4385  			continue;
31d49da5ad0172 Aneesh Kumar K.V   2016-07-26  4386  		}
6629326b89b6e6 Hillf Danton       2012-03-23  4387  
6629326b89b6e6 Hillf Danton       2012-03-23  4388  		/*
9fbc1f635fd0bd Naoya Horiguchi    2015-02-11  4389  		 * Migrating hugepage or HWPoisoned hugepage is already
9fbc1f635fd0bd Naoya Horiguchi    2015-02-11  4390  		 * unmapped and its refcount is dropped, so just clear pte here.
6629326b89b6e6 Hillf Danton       2012-03-23  4391  		 */
9fbc1f635fd0bd Naoya Horiguchi    2015-02-11  4392  		if (unlikely(!pte_present(pte))) {
9386fac34c7cbe Punit Agrawal      2017-07-06  4393  			huge_pte_clear(mm, address, ptep, sz);
31d49da5ad0172 Aneesh Kumar K.V   2016-07-26  4394  			spin_unlock(ptl);
31d49da5ad0172 Aneesh Kumar K.V   2016-07-26  4395  			continue;
8c4894c6bc790d Naoya Horiguchi    2012-12-12  4396  		}
6629326b89b6e6 Hillf Danton       2012-03-23  4397  
6629326b89b6e6 Hillf Danton       2012-03-23  4398  		page = pte_page(pte);
04f2cbe35699d2 Mel Gorman         2008-07-23  4399  		/*
04f2cbe35699d2 Mel Gorman         2008-07-23  4400  		 * If a reference page is supplied, it is because a specific
04f2cbe35699d2 Mel Gorman         2008-07-23  4401  		 * page is being unmapped, not a range. Ensure the page we
04f2cbe35699d2 Mel Gorman         2008-07-23  4402  		 * are about to unmap is the actual page of interest.
04f2cbe35699d2 Mel Gorman         2008-07-23  4403  		 */
04f2cbe35699d2 Mel Gorman         2008-07-23  4404  		if (ref_page) {
31d49da5ad0172 Aneesh Kumar K.V   2016-07-26  4405  			if (page != ref_page) {
31d49da5ad0172 Aneesh Kumar K.V   2016-07-26  4406  				spin_unlock(ptl);
31d49da5ad0172 Aneesh Kumar K.V   2016-07-26  4407  				continue;
31d49da5ad0172 Aneesh Kumar K.V   2016-07-26  4408  			}
04f2cbe35699d2 Mel Gorman         2008-07-23  4409  			/*
04f2cbe35699d2 Mel Gorman         2008-07-23  4410  			 * Mark the VMA as having unmapped its page so that
04f2cbe35699d2 Mel Gorman         2008-07-23  4411  			 * future faults in this VMA will fail rather than
04f2cbe35699d2 Mel Gorman         2008-07-23  4412  			 * looking like data was lost
04f2cbe35699d2 Mel Gorman         2008-07-23  4413  			 */
04f2cbe35699d2 Mel Gorman         2008-07-23  4414  			set_vma_resv_flags(vma, HPAGE_RESV_UNMAPPED);
04f2cbe35699d2 Mel Gorman         2008-07-23  4415  		}
04f2cbe35699d2 Mel Gorman         2008-07-23  4416  
c7546f8f03f5a4 David Gibson       2005-08-05  4417  		pte = huge_ptep_get_and_clear(mm, address, ptep);
b528e4b6405b9f Aneesh Kumar K.V   2016-12-12  4418  		tlb_remove_huge_tlb_entry(h, tlb, ptep, address);
106c992a5ebef2 Gerald Schaefer    2013-04-29  4419  		if (huge_pte_dirty(pte))
6649a3863232eb Ken Chen           2007-02-08  4420  			set_page_dirty(page);
9e81130b7ce230 Hillf Danton       2012-03-21  4421  
5d317b2b653659 Naoya Horiguchi    2015-11-05  4422  		hugetlb_count_sub(pages_per_huge_page(h), mm);
d281ee61451835 Kirill A. Shutemov 2016-01-15  4423  		page_remove_rmap(page, true);
31d49da5ad0172 Aneesh Kumar K.V   2016-07-26  4424  
cb900f41215447 Kirill A. Shutemov 2013-11-14  4425  		spin_unlock(ptl);
e77b0852b551ff Aneesh Kumar K.V   2016-07-26  4426  		tlb_remove_page_size(tlb, page, huge_page_size(h));
24669e58477e27 Aneesh Kumar K.V   2012-07-31  4427  		/*
31d49da5ad0172 Aneesh Kumar K.V   2016-07-26  4428  		 * Bail out after unmapping reference page if supplied
24669e58477e27 Aneesh Kumar K.V   2012-07-31  4429  		 */
31d49da5ad0172 Aneesh Kumar K.V   2016-07-26  4430  		if (ref_page)
31d49da5ad0172 Aneesh Kumar K.V   2016-07-26  4431  			break;
fe1668ae5bf014 Kenneth W Chen     2006-10-04  4432  	}
ac46d4f3c43241 Jérôme Glisse      2018-12-28  4433  	mmu_notifier_invalidate_range_end(&range);
24669e58477e27 Aneesh Kumar K.V   2012-07-31  4434  	tlb_end_vma(tlb, vma);
^1da177e4c3f41 Linus Torvalds     2005-04-16  4435  }
63551ae0feaaa2 David Gibson       2005-06-21  4436  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 22742 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v4 06/26] shmem/userfaultfd: Handle uffd-wp special pte in page fault handler
  2021-07-14 22:20 ` [PATCH v4 06/26] shmem/userfaultfd: Handle uffd-wp special pte in page fault handler Peter Xu
@ 2021-07-15  6:20   ` kernel test robot
  2021-07-15 18:50     ` Peter Xu
  0 siblings, 1 reply; 34+ messages in thread
From: kernel test robot @ 2021-07-15  6:20 UTC (permalink / raw)
  To: Peter Xu, linux-kernel, linux-mm
  Cc: clang-built-linux, kbuild-all, Jason Gunthorpe, peterx,
	Matthew Wilcox, Andrew Morton, Linux Memory Management List,
	Axel Rasmussen, Nadav Amit, Jerome Glisse

[-- Attachment #1: Type: text/plain, Size: 2554 bytes --]

Hi Peter,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v5.14-rc1 next-20210714]
[cannot apply to hnaz-linux-mm/master asm-generic/master linux/master]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Peter-Xu/userfaultfd-wp-Support-shmem-and-hugetlbfs/20210715-062718
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 8096acd7442e613fad0354fc8dfdb2003cceea0b
config: x86_64-randconfig-r021-20210714 (attached as .config)
compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project 0e49c54a8cbd3e779e5526a5888c683c01cc3c50)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install x86_64 cross compiling tool for clang build
        # apt-get install binutils-x86-64-linux-gnu
        # https://github.com/0day-ci/linux/commit/8b3e70f40b8333a803e642ed5c8cce738985881c
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Peter-Xu/userfaultfd-wp-Support-shmem-and-hugetlbfs/20210715-062718
        git checkout 8b3e70f40b8333a803e642ed5c8cce738985881c
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> mm/memory.c:4167:6: error: implicit declaration of function 'uffd_disable_fault_around' [-Werror,-Wimplicit-function-declaration]
           if (uffd_disable_fault_around(vmf->vma))
               ^
   1 error generated.


vim +/uffd_disable_fault_around +4167 mm/memory.c

  4159	
  4160	/* Return true if we should do read fault-around, false otherwise */
  4161	static inline bool should_fault_around(struct vm_fault *vmf)
  4162	{
  4163		/* No ->map_pages?  No way to fault around... */
  4164		if (!vmf->vma->vm_ops->map_pages)
  4165			return false;
  4166	
> 4167		if (uffd_disable_fault_around(vmf->vma))
  4168			return false;
  4169	
  4170		return fault_around_bytes >> PAGE_SHIFT > 1;
  4171	}
  4172	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 50434 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v4 15/26] mm/hugetlb: Drop __unmap_hugepage_range definition from hugetlb.h
  2021-07-14 22:24 ` [PATCH v4 15/26] mm/hugetlb: Drop __unmap_hugepage_range definition from hugetlb.h Peter Xu
  2021-07-15  5:49   ` kernel test robot
@ 2021-07-15  8:10   ` kernel test robot
  2021-07-15 17:05   ` kernel test robot
  2021-07-15 17:05   ` [RFC PATCH] mm/hugetlb: __unmap_hugepage_range() can be static kernel test robot
  3 siblings, 0 replies; 34+ messages in thread
From: kernel test robot @ 2021-07-15  8:10 UTC (permalink / raw)
  To: Peter Xu, linux-mm, linux-kernel
  Cc: clang-built-linux, kbuild-all, Andrew Morton,
	Linux Memory Management List, Mike Kravetz, Axel Rasmussen,
	Miaohe Lin, Kirill A . Shutemov, Hugh Dickins, Jason Gunthorpe

[-- Attachment #1: Type: text/plain, Size: 16403 bytes --]

Hi Peter,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v5.14-rc1 next-20210715]
[cannot apply to hnaz-linux-mm/master asm-generic/master linux/master]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Peter-Xu/userfaultfd-wp-Support-shmem-and-hugetlbfs/20210715-062718
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 8096acd7442e613fad0354fc8dfdb2003cceea0b
config: powerpc-randconfig-r012-20210714 (attached as .config)
compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project 0e49c54a8cbd3e779e5526a5888c683c01cc3c50)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install powerpc cross compiling tool for clang build
        # apt-get install binutils-powerpc-linux-gnu
        # https://github.com/0day-ci/linux/commit/f8dd355edbfe948f84c8aaa10a5173656aa2778c
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Peter-Xu/userfaultfd-wp-Support-shmem-and-hugetlbfs/20210715-062718
        git checkout f8dd355edbfe948f84c8aaa10a5173656aa2778c
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=powerpc 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   __do_insb
   ^
   arch/powerpc/include/asm/io.h:556:56: note: expanded from macro '__do_insb'
   #define __do_insb(p, b, n)      readsb((PCI_IO_ADDR)_IO_BASE+(p), (b), (n))
                                          ~~~~~~~~~~~~~~~~~~~~~^
   In file included from mm/hugetlb.c:11:
   In file included from include/linux/highmem.h:10:
   In file included from include/linux/hardirq.h:11:
   In file included from arch/powerpc/include/asm/hardirq.h:6:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/powerpc/include/asm/io.h:619:
   arch/powerpc/include/asm/io-defs.h:45:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
   DEF_PCI_AC_NORET(insw, (unsigned long p, void *b, unsigned long c),
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   arch/powerpc/include/asm/io.h:616:3: note: expanded from macro 'DEF_PCI_AC_NORET'
                   __do_##name al;                                 \
                   ^~~~~~~~~~~~~~
   <scratch space>:126:1: note: expanded from here
   __do_insw
   ^
   arch/powerpc/include/asm/io.h:557:56: note: expanded from macro '__do_insw'
   #define __do_insw(p, b, n)      readsw((PCI_IO_ADDR)_IO_BASE+(p), (b), (n))
                                          ~~~~~~~~~~~~~~~~~~~~~^
   In file included from mm/hugetlb.c:11:
   In file included from include/linux/highmem.h:10:
   In file included from include/linux/hardirq.h:11:
   In file included from arch/powerpc/include/asm/hardirq.h:6:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/powerpc/include/asm/io.h:619:
   arch/powerpc/include/asm/io-defs.h:47:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
   DEF_PCI_AC_NORET(insl, (unsigned long p, void *b, unsigned long c),
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   arch/powerpc/include/asm/io.h:616:3: note: expanded from macro 'DEF_PCI_AC_NORET'
                   __do_##name al;                                 \
                   ^~~~~~~~~~~~~~
   <scratch space>:128:1: note: expanded from here
   __do_insl
   ^
   arch/powerpc/include/asm/io.h:558:56: note: expanded from macro '__do_insl'
   #define __do_insl(p, b, n)      readsl((PCI_IO_ADDR)_IO_BASE+(p), (b), (n))
                                          ~~~~~~~~~~~~~~~~~~~~~^
   In file included from mm/hugetlb.c:11:
   In file included from include/linux/highmem.h:10:
   In file included from include/linux/hardirq.h:11:
   In file included from arch/powerpc/include/asm/hardirq.h:6:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/powerpc/include/asm/io.h:619:
   arch/powerpc/include/asm/io-defs.h:49:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
   DEF_PCI_AC_NORET(outsb, (unsigned long p, const void *b, unsigned long c),
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   arch/powerpc/include/asm/io.h:616:3: note: expanded from macro 'DEF_PCI_AC_NORET'
                   __do_##name al;                                 \
                   ^~~~~~~~~~~~~~
   <scratch space>:130:1: note: expanded from here
   __do_outsb
   ^
   arch/powerpc/include/asm/io.h:559:58: note: expanded from macro '__do_outsb'
   #define __do_outsb(p, b, n)     writesb((PCI_IO_ADDR)_IO_BASE+(p),(b),(n))
                                           ~~~~~~~~~~~~~~~~~~~~~^
   In file included from mm/hugetlb.c:11:
   In file included from include/linux/highmem.h:10:
   In file included from include/linux/hardirq.h:11:
   In file included from arch/powerpc/include/asm/hardirq.h:6:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/powerpc/include/asm/io.h:619:
   arch/powerpc/include/asm/io-defs.h:51:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
   DEF_PCI_AC_NORET(outsw, (unsigned long p, const void *b, unsigned long c),
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   arch/powerpc/include/asm/io.h:616:3: note: expanded from macro 'DEF_PCI_AC_NORET'
                   __do_##name al;                                 \
                   ^~~~~~~~~~~~~~
   <scratch space>:132:1: note: expanded from here
   __do_outsw
   ^
   arch/powerpc/include/asm/io.h:560:58: note: expanded from macro '__do_outsw'
   #define __do_outsw(p, b, n)     writesw((PCI_IO_ADDR)_IO_BASE+(p),(b),(n))
                                           ~~~~~~~~~~~~~~~~~~~~~^
   In file included from mm/hugetlb.c:11:
   In file included from include/linux/highmem.h:10:
   In file included from include/linux/hardirq.h:11:
   In file included from arch/powerpc/include/asm/hardirq.h:6:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/powerpc/include/asm/io.h:619:
   arch/powerpc/include/asm/io-defs.h:53:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
   DEF_PCI_AC_NORET(outsl, (unsigned long p, const void *b, unsigned long c),
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   arch/powerpc/include/asm/io.h:616:3: note: expanded from macro 'DEF_PCI_AC_NORET'
                   __do_##name al;                                 \
                   ^~~~~~~~~~~~~~
   <scratch space>:134:1: note: expanded from here
   __do_outsl
   ^
   arch/powerpc/include/asm/io.h:561:58: note: expanded from macro '__do_outsl'
   #define __do_outsl(p, b, n)     writesl((PCI_IO_ADDR)_IO_BASE+(p),(b),(n))
                                           ~~~~~~~~~~~~~~~~~~~~~^
>> mm/hugetlb.c:4334:6: warning: no previous prototype for function '__unmap_hugepage_range' [-Wmissing-prototypes]
   void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
        ^
   mm/hugetlb.c:4334:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
   ^
   static 
   7 warnings generated.


vim +/__unmap_hugepage_range +4334 mm/hugetlb.c

63551ae0feaaa2 David Gibson       2005-06-21  4333  
24669e58477e27 Aneesh Kumar K.V   2012-07-31 @4334  void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
24669e58477e27 Aneesh Kumar K.V   2012-07-31  4335  			    unsigned long start, unsigned long end,
24669e58477e27 Aneesh Kumar K.V   2012-07-31  4336  			    struct page *ref_page)
63551ae0feaaa2 David Gibson       2005-06-21  4337  {
63551ae0feaaa2 David Gibson       2005-06-21  4338  	struct mm_struct *mm = vma->vm_mm;
63551ae0feaaa2 David Gibson       2005-06-21  4339  	unsigned long address;
c7546f8f03f5a4 David Gibson       2005-08-05  4340  	pte_t *ptep;
63551ae0feaaa2 David Gibson       2005-06-21  4341  	pte_t pte;
cb900f41215447 Kirill A. Shutemov 2013-11-14  4342  	spinlock_t *ptl;
63551ae0feaaa2 David Gibson       2005-06-21  4343  	struct page *page;
a5516438959d90 Andi Kleen         2008-07-23  4344  	struct hstate *h = hstate_vma(vma);
a5516438959d90 Andi Kleen         2008-07-23  4345  	unsigned long sz = huge_page_size(h);
ac46d4f3c43241 Jérôme Glisse      2018-12-28  4346  	struct mmu_notifier_range range;
a5516438959d90 Andi Kleen         2008-07-23  4347  
63551ae0feaaa2 David Gibson       2005-06-21  4348  	WARN_ON(!is_vm_hugetlb_page(vma));
a5516438959d90 Andi Kleen         2008-07-23  4349  	BUG_ON(start & ~huge_page_mask(h));
a5516438959d90 Andi Kleen         2008-07-23  4350  	BUG_ON(end & ~huge_page_mask(h));
63551ae0feaaa2 David Gibson       2005-06-21  4351  
07e326610e5634 Aneesh Kumar K.V   2016-12-12  4352  	/*
07e326610e5634 Aneesh Kumar K.V   2016-12-12  4353  	 * This is a hugetlb vma, all the pte entries should point
07e326610e5634 Aneesh Kumar K.V   2016-12-12  4354  	 * to huge page.
07e326610e5634 Aneesh Kumar K.V   2016-12-12  4355  	 */
ed6a79352cad00 Peter Zijlstra     2018-08-31  4356  	tlb_change_page_size(tlb, sz);
24669e58477e27 Aneesh Kumar K.V   2012-07-31  4357  	tlb_start_vma(tlb, vma);
dff11abe280b47 Mike Kravetz       2018-10-05  4358  
dff11abe280b47 Mike Kravetz       2018-10-05  4359  	/*
dff11abe280b47 Mike Kravetz       2018-10-05  4360  	 * If sharing possible, alert mmu notifiers of worst case.
dff11abe280b47 Mike Kravetz       2018-10-05  4361  	 */
6f4f13e8d9e27c Jérôme Glisse      2019-05-13  4362  	mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, mm, start,
6f4f13e8d9e27c Jérôme Glisse      2019-05-13  4363  				end);
ac46d4f3c43241 Jérôme Glisse      2018-12-28  4364  	adjust_range_if_pmd_sharing_possible(vma, &range.start, &range.end);
ac46d4f3c43241 Jérôme Glisse      2018-12-28  4365  	mmu_notifier_invalidate_range_start(&range);
569f48b85813f0 Hillf Danton       2014-12-10  4366  	address = start;
569f48b85813f0 Hillf Danton       2014-12-10  4367  	for (; address < end; address += sz) {
7868a2087ec13e Punit Agrawal      2017-07-06  4368  		ptep = huge_pte_offset(mm, address, sz);
c7546f8f03f5a4 David Gibson       2005-08-05  4369  		if (!ptep)
c7546f8f03f5a4 David Gibson       2005-08-05  4370  			continue;
c7546f8f03f5a4 David Gibson       2005-08-05  4371  
cb900f41215447 Kirill A. Shutemov 2013-11-14  4372  		ptl = huge_pte_lock(h, mm, ptep);
34ae204f18519f Mike Kravetz       2020-08-11  4373  		if (huge_pmd_unshare(mm, vma, &address, ptep)) {
31d49da5ad0172 Aneesh Kumar K.V   2016-07-26  4374  			spin_unlock(ptl);
dff11abe280b47 Mike Kravetz       2018-10-05  4375  			/*
dff11abe280b47 Mike Kravetz       2018-10-05  4376  			 * We just unmapped a page of PMDs by clearing a PUD.
dff11abe280b47 Mike Kravetz       2018-10-05  4377  			 * The caller's TLB flush range should cover this area.
dff11abe280b47 Mike Kravetz       2018-10-05  4378  			 */
31d49da5ad0172 Aneesh Kumar K.V   2016-07-26  4379  			continue;
31d49da5ad0172 Aneesh Kumar K.V   2016-07-26  4380  		}
39dde65c9940c9 Kenneth W Chen     2006-12-06  4381  
6629326b89b6e6 Hillf Danton       2012-03-23  4382  		pte = huge_ptep_get(ptep);
31d49da5ad0172 Aneesh Kumar K.V   2016-07-26  4383  		if (huge_pte_none(pte)) {
31d49da5ad0172 Aneesh Kumar K.V   2016-07-26  4384  			spin_unlock(ptl);
31d49da5ad0172 Aneesh Kumar K.V   2016-07-26  4385  			continue;
31d49da5ad0172 Aneesh Kumar K.V   2016-07-26  4386  		}
6629326b89b6e6 Hillf Danton       2012-03-23  4387  
6629326b89b6e6 Hillf Danton       2012-03-23  4388  		/*
9fbc1f635fd0bd Naoya Horiguchi    2015-02-11  4389  		 * Migrating hugepage or HWPoisoned hugepage is already
9fbc1f635fd0bd Naoya Horiguchi    2015-02-11  4390  		 * unmapped and its refcount is dropped, so just clear pte here.
6629326b89b6e6 Hillf Danton       2012-03-23  4391  		 */
9fbc1f635fd0bd Naoya Horiguchi    2015-02-11  4392  		if (unlikely(!pte_present(pte))) {
9386fac34c7cbe Punit Agrawal      2017-07-06  4393  			huge_pte_clear(mm, address, ptep, sz);
31d49da5ad0172 Aneesh Kumar K.V   2016-07-26  4394  			spin_unlock(ptl);
31d49da5ad0172 Aneesh Kumar K.V   2016-07-26  4395  			continue;
8c4894c6bc790d Naoya Horiguchi    2012-12-12  4396  		}
6629326b89b6e6 Hillf Danton       2012-03-23  4397  
6629326b89b6e6 Hillf Danton       2012-03-23  4398  		page = pte_page(pte);
04f2cbe35699d2 Mel Gorman         2008-07-23  4399  		/*
04f2cbe35699d2 Mel Gorman         2008-07-23  4400  		 * If a reference page is supplied, it is because a specific
04f2cbe35699d2 Mel Gorman         2008-07-23  4401  		 * page is being unmapped, not a range. Ensure the page we
04f2cbe35699d2 Mel Gorman         2008-07-23  4402  		 * are about to unmap is the actual page of interest.
04f2cbe35699d2 Mel Gorman         2008-07-23  4403  		 */
04f2cbe35699d2 Mel Gorman         2008-07-23  4404  		if (ref_page) {
31d49da5ad0172 Aneesh Kumar K.V   2016-07-26  4405  			if (page != ref_page) {
31d49da5ad0172 Aneesh Kumar K.V   2016-07-26  4406  				spin_unlock(ptl);
31d49da5ad0172 Aneesh Kumar K.V   2016-07-26  4407  				continue;
31d49da5ad0172 Aneesh Kumar K.V   2016-07-26  4408  			}
04f2cbe35699d2 Mel Gorman         2008-07-23  4409  			/*
04f2cbe35699d2 Mel Gorman         2008-07-23  4410  			 * Mark the VMA as having unmapped its page so that
04f2cbe35699d2 Mel Gorman         2008-07-23  4411  			 * future faults in this VMA will fail rather than
04f2cbe35699d2 Mel Gorman         2008-07-23  4412  			 * looking like data was lost
04f2cbe35699d2 Mel Gorman         2008-07-23  4413  			 */
04f2cbe35699d2 Mel Gorman         2008-07-23  4414  			set_vma_resv_flags(vma, HPAGE_RESV_UNMAPPED);
04f2cbe35699d2 Mel Gorman         2008-07-23  4415  		}
04f2cbe35699d2 Mel Gorman         2008-07-23  4416  
c7546f8f03f5a4 David Gibson       2005-08-05  4417  		pte = huge_ptep_get_and_clear(mm, address, ptep);
b528e4b6405b9f Aneesh Kumar K.V   2016-12-12  4418  		tlb_remove_huge_tlb_entry(h, tlb, ptep, address);
106c992a5ebef2 Gerald Schaefer    2013-04-29  4419  		if (huge_pte_dirty(pte))
6649a3863232eb Ken Chen           2007-02-08  4420  			set_page_dirty(page);
9e81130b7ce230 Hillf Danton       2012-03-21  4421  
5d317b2b653659 Naoya Horiguchi    2015-11-05  4422  		hugetlb_count_sub(pages_per_huge_page(h), mm);
d281ee61451835 Kirill A. Shutemov 2016-01-15  4423  		page_remove_rmap(page, true);
31d49da5ad0172 Aneesh Kumar K.V   2016-07-26  4424  
cb900f41215447 Kirill A. Shutemov 2013-11-14  4425  		spin_unlock(ptl);
e77b0852b551ff Aneesh Kumar K.V   2016-07-26  4426  		tlb_remove_page_size(tlb, page, huge_page_size(h));
24669e58477e27 Aneesh Kumar K.V   2012-07-31  4427  		/*
31d49da5ad0172 Aneesh Kumar K.V   2016-07-26  4428  		 * Bail out after unmapping reference page if supplied
24669e58477e27 Aneesh Kumar K.V   2012-07-31  4429  		 */
31d49da5ad0172 Aneesh Kumar K.V   2016-07-26  4430  		if (ref_page)
31d49da5ad0172 Aneesh Kumar K.V   2016-07-26  4431  			break;
fe1668ae5bf014 Kenneth W Chen     2006-10-04  4432  	}
ac46d4f3c43241 Jérôme Glisse      2018-12-28  4433  	mmu_notifier_invalidate_range_end(&range);
24669e58477e27 Aneesh Kumar K.V   2012-07-31  4434  	tlb_end_vma(tlb, vma);
^1da177e4c3f41 Linus Torvalds     2005-04-16  4435  }
63551ae0feaaa2 David Gibson       2005-06-21  4436  
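
A quick back-of-envelope for the pmd-sharing comment at lines 4375-4378
above (x86-64 assumptions: 4 KiB page tables, 8-byte entries, 2 MiB
PMD-mapped hugepages): huge_pmd_unshare() drops an entire page of PMDs by
clearing one PUD entry, which is why adjust_range_if_pmd_sharing_possible()
widens the mmu notifier range to the worst case before the walk.

/* pud_span.c -- standalone arithmetic sketch, not kernel code */
#include <stdio.h>

int main(void)
{
	unsigned long pmds_per_page = 4096 / 8;		/* 512 entries */
	unsigned long pmd_hugepage  = 2UL << 20;	/* 2 MiB each  */

	/* one shared PMD page spans 512 * 2 MiB = 1 GiB (PUD_SIZE) */
	printf("%lu MiB\n", (pmds_per_page * pmd_hugepage) >> 20);
	return 0;
}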

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 32668 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v4 15/26] mm/hugetlb: Drop __unmap_hugepage_range definition from hugetlb.h
  2021-07-14 22:24 ` [PATCH v4 15/26] mm/hugetlb: Drop __unmap_hugepage_range definition from hugetlb.h Peter Xu
  2021-07-15  5:49   ` kernel test robot
  2021-07-15  8:10   ` kernel test robot
@ 2021-07-15 17:05   ` kernel test robot
  2021-07-15 17:05   ` [RFC PATCH] mm/hugetlb: __unmap_hugepage_range() can be static kernel test robot
  3 siblings, 0 replies; 34+ messages in thread
From: kernel test robot @ 2021-07-15 17:05 UTC (permalink / raw)
  To: Peter Xu, linux-mm, linux-kernel
  Cc: kbuild-all, Andrew Morton, Linux Memory Management List,
	Mike Kravetz, Axel Rasmussen, Miaohe Lin, Kirill A . Shutemov,
	Hugh Dickins, Jason Gunthorpe

[-- Attachment #1: Type: text/plain, Size: 2440 bytes --]

Hi Peter,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v5.14-rc1 next-20210715]
[cannot apply to hnaz-linux-mm/master asm-generic/master linux/master]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patches, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Peter-Xu/userfaultfd-wp-Support-shmem-and-hugetlbfs/20210715-062718
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 8096acd7442e613fad0354fc8dfdb2003cceea0b
config: i386-randconfig-s002-20210714 (attached as .config)
compiler: gcc-10 (Debian 10.2.1-6) 10.2.1 20210110
reproduce:
        # apt-get install sparse
        # sparse version: v0.6.3-341-g8af24329-dirty
        # https://github.com/0day-ci/linux/commit/f8dd355edbfe948f84c8aaa10a5173656aa2778c
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Peter-Xu/userfaultfd-wp-Support-shmem-and-hugetlbfs/20210715-062718
        git checkout f8dd355edbfe948f84c8aaa10a5173656aa2778c
        # save the attached .config to linux build tree
        make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=i386 SHELL=/bin/bash

If you fix the issue, kindly add the following tag as appropriate:
Reported-by: kernel test robot <lkp@intel.com>


sparse warnings: (new ones prefixed by >>)
>> mm/hugetlb.c:4334:6: sparse: sparse: symbol '__unmap_hugepage_range' was not declared. Should it be static?
   mm/hugetlb.c:444:12: sparse: sparse: context imbalance in 'allocate_file_region_entries' - wrong count at exit
   mm/hugetlb.c:517:13: sparse: sparse: context imbalance in 'region_add' - wrong count at exit
   mm/hugetlb.c:584:13: sparse: sparse: context imbalance in 'region_chg' - wrong count at exit
   mm/hugetlb.c: note: in included file (through include/linux/mmzone.h, include/linux/gfp.h, include/linux/mm.h):
   include/linux/page-flags.h:183:29: sparse: sparse: context imbalance in 'hugetlb_cow' - unexpected unlock
   mm/hugetlb.c:5386:25: sparse: sparse: context imbalance in 'follow_hugetlb_page' - different lock contexts for basic block
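
The '>>' warning is the one introduced by this patch; the 'context
imbalance' lines below it are pre-existing diagnostics from sparse's
lock-context tracking. A minimal, self-contained sketch of how that class
of warning arises (hypothetical file; userspace stand-ins for the kernel's
__acquires/__releases annotations):

/* ctx.c -- "sparse ctx.c" reproduces the message shape */
#ifdef __CHECKER__
#define __acquires(x)	__attribute__((context(x, 0, 1)))
#define __releases(x)	__attribute__((context(x, 1, 0)))
#else
#define __acquires(x)
#define __releases(x)
#endif

static int lock;

static void my_lock(void) __acquires(lock);
static void my_unlock(void) __releases(lock);

static void my_lock(void)   { lock = 1; }
static void my_unlock(void) { lock = 0; }

int bad(int flag)
{
	my_lock();
	if (flag)
		return 1;	/* exits holding the "lock": count 1 */
	my_unlock();
	return 0;		/* exits with count 0: sparse reports
				 * "context imbalance in 'bad' -
				 *  wrong count at exit" */
}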

Please review and possibly fold the follow-up patch.

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 46585 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [RFC PATCH] mm/hugetlb: __unmap_hugepage_range() can be static
  2021-07-14 22:24 ` [PATCH v4 15/26] mm/hugetlb: Drop __unmap_hugepage_range definition from hugetlb.h Peter Xu
                     ` (2 preceding siblings ...)
  2021-07-15 17:05   ` kernel test robot
@ 2021-07-15 17:05   ` kernel test robot
  2021-07-15 18:53     ` Peter Xu
  3 siblings, 1 reply; 34+ messages in thread
From: kernel test robot @ 2021-07-15 17:05 UTC (permalink / raw)
  To: Peter Xu, linux-mm, linux-kernel
  Cc: kbuild-all, Andrew Morton, Linux Memory Management List,
	Mike Kravetz, Axel Rasmussen, Miaohe Lin, Kirill A . Shutemov,
	Hugh Dickins, Jason Gunthorpe

mm/hugetlb.c:4334:6: warning: symbol '__unmap_hugepage_range' was not declared. Should it be static?

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: kernel test robot <lkp@intel.com>
---
 hugetlb.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 924553aa8f789ad..4bdd637b0c29a95 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4331,9 +4331,9 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 	return ret;
 }
 
-void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
-			    unsigned long start, unsigned long end,
-			    struct page *ref_page)
+static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
+				   unsigned long start, unsigned long end,
+				   struct page *ref_page)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	unsigned long address;
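
The change is mechanical, but the rationale is worth spelling out: patch 15
dropped the declaration from hugetlb.h, leaving mm/hugetlb.c as the only
user, and a function with external linkage that no header declares is
exactly what sparse flags. A self-contained before/after illustration
(hypothetical file, not from this series):

/* helper.c -- "sparse helper.c" warns on the non-static variant:
 *   symbol 'scale' was not declared. Should it be static?
 */
#include <stdio.h>

#if 0
int scale(int x)		/* external linkage, no declaration in */
{				/* any header -> sparse warning        */
	return x * 2;
}
#else
static int scale(int x)		/* file-local: warning gone, and the   */
{				/* compiler may freely inline it       */
	return x * 2;
}
#endif

int main(void)
{
	printf("%d\n", scale(21));	/* prints 42 */
	return 0;
}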

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH v4 06/26] shmem/userfaultfd: Handle uffd-wp special pte in page fault handler
  2021-07-15  6:20   ` kernel test robot
@ 2021-07-15 18:50     ` Peter Xu
  0 siblings, 0 replies; 34+ messages in thread
From: Peter Xu @ 2021-07-15 18:50 UTC (permalink / raw)
  To: kernel test robot
  Cc: linux-kernel, linux-mm, clang-built-linux, kbuild-all,
	Jason Gunthorpe, Matthew Wilcox, Andrew Morton, Axel Rasmussen,
	Nadav Amit, Jerome Glisse

On Thu, Jul 15, 2021 at 02:20:21PM +0800, kernel test robot wrote:
> Hi Peter,
> 
> Thank you for the patch! Yet something to improve:
> 
> [auto build test ERROR on linus/master]
> [also build test ERROR on v5.14-rc1 next-20210714]
> [cannot apply to hnaz-linux-mm/master asm-generic/master linux/master]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patches, we suggest using '--base' as documented in
> https://git-scm.com/docs/git-format-patch]
> 
> url:    https://github.com/0day-ci/linux/commits/Peter-Xu/userfaultfd-wp-Support-shmem-and-hugetlbfs/20210715-062718
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 8096acd7442e613fad0354fc8dfdb2003cceea0b
> config: x86_64-randconfig-r021-20210714 (attached as .config)
> compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project 0e49c54a8cbd3e779e5526a5888c683c01cc3c50)
> reproduce (this is a W=1 build):
>         wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
>         chmod +x ~/bin/make.cross
>         # install x86_64 cross compiling tool for clang build
>         # apt-get install binutils-x86-64-linux-gnu
>         # https://github.com/0day-ci/linux/commit/8b3e70f40b8333a803e642ed5c8cce738985881c
>         git remote add linux-review https://github.com/0day-ci/linux
>         git fetch --no-tags linux-review Peter-Xu/userfaultfd-wp-Support-shmem-and-hugetlbfs/20210715-062718
>         git checkout 8b3e70f40b8333a803e642ed5c8cce738985881c
>         # save the attached .config to linux build tree
>         COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64 
> 
> If you fix the issue, kindly add the following tag as appropriate:
> Reported-by: kernel test robot <lkp@intel.com>
> 
> All errors (new ones prefixed by >>):
> 
> >> mm/memory.c:4167:6: error: implicit declaration of function 'uffd_disable_fault_around' [-Werror,-Wimplicit-function-declaration]
>            if (uffd_disable_fault_around(vmf->vma))
>                ^
>    1 error generated.
> 
> 
> vim +/uffd_disable_fault_around +4167 mm/memory.c
> 
>   4159	
>   4160	/* Return true if we should do read fault-around, false otherwise */
>   4161	static inline bool should_fault_around(struct vm_fault *vmf)
>   4162	{
>   4163		/* No ->map_pages?  No way to fault around... */
>   4164		if (!vmf->vma->vm_ops->map_pages)
>   4165			return false;
>   4166	
> > 4167		if (uffd_disable_fault_around(vmf->vma))
>   4168			return false;
>   4169	
>   4170		return fault_around_bytes >> PAGE_SHIFT > 1;
>   4171	}
>   4172	
> 
> ---
> 0-DAY CI Kernel Test Service, Intel Corporation
> https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

Will squash this into the patch:

---8<---
diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index c4228acd1d80..4382240de7c3 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -283,6 +283,11 @@ static inline bool pte_swp_uffd_wp_special(pte_t pte)
        return false;
 }
 
+static inline bool uffd_disable_fault_around(struct vm_area_struct *vma)
+{
+       return false;
+}
+
 #endif /* CONFIG_USERFAULTFD */
 
 #endif /* _LINUX_USERFAULTFD_K_H */
---8<---
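
The stub follows the header's existing pattern: with CONFIG_USERFAULTFD set
the real helper is compiled in, otherwise callers such as
should_fault_around() see a static inline that the compiler folds away.
A self-contained sketch of that pattern (userspace build, real function
name, kernel types left opaque):

/* stub.c -- try building with and without -DCONFIG_USERFAULTFD */
#include <stdbool.h>
#include <stdio.h>

struct vm_area_struct;			/* opaque stand-in */

#ifdef CONFIG_USERFAULTFD
/* real implementation would live in the feature's code */
bool uffd_disable_fault_around(struct vm_area_struct *vma);
#else
/* feature compiled out: never disable fault-around; calls to
 * this compile down to the constant false */
static inline bool uffd_disable_fault_around(struct vm_area_struct *vma)
{
	return false;
}
#endif

int main(void)
{
#ifndef CONFIG_USERFAULTFD
	printf("%d\n", uffd_disable_fault_around(NULL));	/* prints 0 */
#endif
	return 0;
}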

Thanks,

-- 
Peter Xu


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH] mm/hugetlb: __unmap_hugepage_range() can be static
  2021-07-15 17:05   ` [RFC PATCH] mm/hugetlb: __unmap_hugepage_range() can be static kernel test robot
@ 2021-07-15 18:53     ` Peter Xu
  0 siblings, 0 replies; 34+ messages in thread
From: Peter Xu @ 2021-07-15 18:53 UTC (permalink / raw)
  To: kernel test robot
  Cc: linux-mm, linux-kernel, kbuild-all, Andrew Morton, Mike Kravetz,
	Axel Rasmussen, Miaohe Lin, Kirill A . Shutemov, Hugh Dickins,
	Jason Gunthorpe

On Fri, Jul 16, 2021 at 01:05:24AM +0800, kernel test robot wrote:
> mm/hugetlb.c:4334:6: warning: symbol '__unmap_hugepage_range' was not declared. Should it be static?
> 
> Reported-by: kernel test robot <lkp@intel.com>
> Signed-off-by: kernel test robot <lkp@intel.com>
> ---
>  hugetlb.c |    6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 924553aa8f789ad..4bdd637b0c29a95 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -4331,9 +4331,9 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
>  	return ret;
>  }
>  
> -void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
> -			    unsigned long start, unsigned long end,
> -			    struct page *ref_page)
> +static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
> +				   unsigned long start, unsigned long end,
> +				   struct page *ref_page)
>  {
>  	struct mm_struct *mm = vma->vm_mm;
>  	unsigned long address;

Will squash this change into the patch.  Thanks.

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2021-07-15 19:02 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-14 22:20 [PATCH v4 00/26] userfaultfd-wp: Support shmem and hugetlbfs Peter Xu
2021-07-14 22:20 ` [PATCH v4 01/26] mm/shmem: Unconditionally set pte dirty in mfill_atomic_install_pte Peter Xu
2021-07-14 22:20 ` [PATCH v4 02/26] shmem/userfaultfd: Take care of UFFDIO_COPY_MODE_WP Peter Xu
2021-07-14 22:20 ` [PATCH v4 03/26] mm: Clear vmf->pte after pte_unmap_same() returns Peter Xu
2021-07-14 22:20 ` [PATCH v4 04/26] mm/userfaultfd: Introduce special pte for unmapped file-backed mem Peter Xu
2021-07-14 22:20 ` [PATCH v4 05/26] mm/swap: Introduce the idea of special swap ptes Peter Xu
2021-07-14 22:20 ` [PATCH v4 06/26] shmem/userfaultfd: Handle uffd-wp special pte in page fault handler Peter Xu
2021-07-15  6:20   ` kernel test robot
2021-07-15 18:50     ` Peter Xu
2021-07-14 22:20 ` [PATCH v4 07/26] mm: Drop first_index/last_index in zap_details Peter Xu
2021-07-14 22:20 ` [PATCH v4 08/26] mm: Introduce zap_details.zap_flags Peter Xu
2021-07-14 22:21 ` [PATCH v4 09/26] mm: Introduce ZAP_FLAG_SKIP_SWAP Peter Xu
2021-07-14 22:21 ` [PATCH v4 10/26] shmem/userfaultfd: Persist uffd-wp bit across zapping for file-backed Peter Xu
2021-07-14 22:24 ` [PATCH v4 11/26] shmem/userfaultfd: Allow wr-protect none pte for file-backed mem Peter Xu
2021-07-14 22:24 ` [PATCH v4 12/26] shmem/userfaultfd: Allows file-back mem to be uffd wr-protected on thps Peter Xu
2021-07-14 22:24 ` [PATCH v4 13/26] shmem/userfaultfd: Handle the left-overed special swap ptes Peter Xu
2021-07-14 22:24 ` [PATCH v4 14/26] shmem/userfaultfd: Pass over uffd-wp special swap pte when fork() Peter Xu
2021-07-14 22:24 ` [PATCH v4 15/26] mm/hugetlb: Drop __unmap_hugepage_range definition from hugetlb.h Peter Xu
2021-07-15  5:49   ` kernel test robot
2021-07-15  8:10   ` kernel test robot
2021-07-15 17:05   ` kernel test robot
2021-07-15 17:05   ` [RFC PATCH] mm/hugetlb: __unmap_hugepage_range() can be static kernel test robot
2021-07-15 18:53     ` Peter Xu
2021-07-14 22:24 ` [PATCH v4 16/26] mm/hugetlb: Introduce huge pte version of uffd-wp helpers Peter Xu
2021-07-14 22:24 ` [PATCH v4 17/26] hugetlb/userfaultfd: Hook page faults for uffd write protection Peter Xu
2021-07-14 22:25 ` [PATCH v4 18/26] hugetlb/userfaultfd: Take care of UFFDIO_COPY_MODE_WP Peter Xu
2021-07-14 22:25 ` [PATCH v4 19/26] hugetlb/userfaultfd: Handle UFFDIO_WRITEPROTECT Peter Xu
2021-07-14 22:25 ` [PATCH v4 20/26] mm/hugetlb: Introduce huge version of special swap pte helpers Peter Xu
2021-07-14 22:25 ` [PATCH v4 21/26] hugetlb/userfaultfd: Handle uffd-wp special pte in hugetlb pf handler Peter Xu
2021-07-14 22:25 ` [PATCH v4 22/26] hugetlb/userfaultfd: Allow wr-protect none ptes Peter Xu
2021-07-14 22:25 ` [PATCH v4 23/26] hugetlb/userfaultfd: Only drop uffd-wp special pte if required Peter Xu
2021-07-14 22:25 ` [PATCH v4 24/26] mm/pagemap: Recognize uffd-wp bit for shmem/hugetlbfs Peter Xu
2021-07-14 22:25 ` [PATCH v4 25/26] mm/userfaultfd: Enable write protection for shmem & hugetlbfs Peter Xu
2021-07-14 22:25 ` [PATCH v4 26/26] userfaultfd/selftests: Enable uffd-wp for shmem/hugetlbfs Peter Xu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).