linux-mm.kvack.org archive mirror
* [PATCH 0/9] userfaultfd: add minor fault handling
@ 2021-01-22 21:29 Axel Rasmussen
  2021-01-22 21:29 ` [PATCH v2 1/9] hugetlb: Pass vma into huge_pte_alloc() Axel Rasmussen
                   ` (8 more replies)
  0 siblings, 9 replies; 14+ messages in thread
From: Axel Rasmussen @ 2021-01-22 21:29 UTC (permalink / raw)
  To: Alexander Viro, Alexey Dobriyan, Andrea Arcangeli, Andrew Morton,
	Anshuman Khandual, Catalin Marinas, Chinwen Chang, Huang Ying,
	Ingo Molnar, Jann Horn, Jerome Glisse, Lokesh Gidra,
	Matthew Wilcox (Oracle), Michael Ellerman, Michal Koutný,
	Michel Lespinasse, Mike Kravetz, Mike Rapoport, Nicholas Piggin,
	Peter Xu, Shaohua Li, Shawn Anastasio, Steven Rostedt,
	Steven Price, Vlastimil Babka
  Cc: linux-kernel, linux-fsdevel, linux-mm, Adam Ruprecht,
	Axel Rasmussen, Cannon Matthews, Dr . David Alan Gilbert,
	David Rientjes, Oliver Upton

Changelog
=========

v1->v2:
- Fixed a bug in the hugetlb_mcopy_atomic_pte retry case. We now plumb in the
  enum mcopy_atomic_mode, so we can differentiate between the three cases this
  function needs to handle:
  1) We're doing a COPY op, and need to allocate a page, add to cache, etc.
  2) We're doing a COPY op, but allocation in this function failed previously;
     we're in the retry path. The page was allocated, but not e.g. added to page
     cache, so that still needs to be done.
  3) We're doing a CONTINUE op, we need to look up an existing page instead of
     allocating a new one.
- Rebased onto a newer version of Peter's patches to disable huge PMD sharing,
  which fixes syzbot complaints on some non-x86 architectures.
- Moved __VM_UFFD_FLAGS into userfaultfd_k.h, so inline helpers can use it.
- Renamed UFFD_FEATURE_MINOR_FAULT_HUGETLBFS to UFFD_FEATURE_MINOR_HUGETLBFS,
  for consistency with other existing feature flags.
- Moved the userfaultfd_minor hook in hugetlb.c into the else block, so we don't
  have to explicitly check for !new_page.
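
As a rough illustration of the three hugetlb_mcopy_atomic_pte cases listed in
the first item above, the choice can be modeled as a function of the mode and
of whether a page was already allocated on a previous attempt. This is only a
sketch, not code from the series: the names pick_action, mcopy_mode and
page_action are hypothetical, and the rule that CONTINUE never carries a
pre-allocated page is an assumption of the model.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; only the COPY/CONTINUE distinction is from the text. */
enum mcopy_mode { MODE_COPY, MODE_CONTINUE };

enum page_action {
	ACT_ALLOC_AND_INSERT,	/* case 1: allocate, then add to page cache etc. */
	ACT_INSERT_ONLY,	/* case 2: retry path, page already allocated */
	ACT_LOOKUP_EXISTING,	/* case 3: CONTINUE, look up the existing page */
};

static enum page_action pick_action(enum mcopy_mode mode, bool have_page)
{
	if (mode == MODE_CONTINUE) {
		/* Model assumption: CONTINUE never has a pre-allocated page. */
		assert(!have_page);
		return ACT_LOOKUP_EXISTING;
	}
	return have_page ? ACT_INSERT_ONLY : ACT_ALLOC_AND_INSERT;
}
```

Passing the mode explicitly is what lets case 2 and case 3 be told apart; a
"page already exists" flag alone would not distinguish them.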

RFC->v1:
- Rebased onto Peter Xu's patches for disabling huge PMD sharing for certain
  userfaultfd-registered areas.
- Added commits which update documentation, and add a self test which exercises
  the new feature.
- Fixed reporting CONTINUE as a supported ioctl even for non-MINOR ranges.

Overview
========

This series adds a new userfaultfd registration mode,
UFFDIO_REGISTER_MODE_MINOR. This allows userspace to intercept "minor" faults.
By "minor" fault, I mean the following situation:

Let there exist two mappings (i.e., VMAs) to the same page(s) (shared memory).
One of the mappings is registered with userfaultfd (in minor mode), and the
other is not. Via the non-UFFD mapping, the underlying pages have already been
allocated & filled with some contents. The UFFD mapping has not yet been
faulted in; when it is touched for the first time, this results in what I'm
calling a "minor" fault. As a concrete example, when working with hugetlbfs, we
have huge_pte_none(), but find_lock_page() finds an existing page.
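
The distinction can be sketched as a small decision function. This is a
simplified model, not kernel code: the two booleans stand in for the results
of huge_pte_none() and of the page cache lookup, and the names fault_kind and
classify_fault are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum fault_kind {
	FAULT_OTHER,	/* PTE already populated: neither missing nor minor */
	FAULT_MISSING,	/* no PTE and no page in the page cache */
	FAULT_MINOR,	/* no PTE, but the page already exists */
};

static enum fault_kind classify_fault(bool pte_is_none, bool page_in_cache)
{
	if (!pte_is_none)
		return FAULT_OTHER;
	return page_in_cache ? FAULT_MINOR : FAULT_MISSING;
}

/* Usage: the cases described in the paragraph above. */
static void classify_fault_examples(void)
{
	assert(classify_fault(true, true) == FAULT_MINOR);
	assert(classify_fault(true, false) == FAULT_MISSING);
	assert(classify_fault(false, true) == FAULT_OTHER);
}
```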

We also add a new ioctl to resolve such faults: UFFDIO_CONTINUE. The idea is,
userspace resolves the fault by either a) doing nothing if the contents are
already correct, or b) updating the underlying contents using the second,
non-UFFD mapping (via memcpy/memset or similar, or something fancier like RDMA).
In either case, userspace issues UFFDIO_CONTINUE to tell the kernel
"I have ensured the page contents are correct, carry on setting up the mapping".

Use Case
========

Consider the use case of VM live migration (e.g. under QEMU/KVM):

1. While a VM is still running, we copy the contents of its memory to a
   target machine. The pages are populated on the target by writing to the
   non-UFFD mapping, using the setup described above. The VM is still running
   (and therefore its memory is likely changing), so this may be repeated
   several times, until we decide the target is "up to date enough".

2. We pause the VM on the source, and start executing on the target machine.
   During this gap, the VM's user(s) will *see* a pause, so it is desirable to
   minimize this window.

3. Between the last time any page was copied from the source to the target, and
   when the VM was paused, the contents of that page may have changed - and
   therefore the copy we have on the target machine is out of date. Although we
   can keep track of which pages are out of date, for VMs with large amounts of
   memory, it is "slow" to transfer this information to the target machine. We
   want to resume execution before such a transfer would complete.

4. So, the guest begins executing on the target machine. The first time it
   touches its memory (via the UFFD-registered mapping), userspace wants to
   intercept this fault. Userspace checks whether or not the page is up to date,
   and if not, copies the updated page from the source machine, via the non-UFFD
   mapping. Finally, whether a copy was performed or not, userspace issues a
   UFFDIO_CONTINUE ioctl to tell the kernel "I have ensured the page contents
   are correct, carry on setting up the mapping".

We don't have to do all of the final updates on-demand. The userfaultfd manager
can, in the background, also copy over updated pages once it receives the map of
which pages are up-to-date or not.
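
Step 4 can be sketched in plain C with ordinary buffers standing in for the
real mappings. Everything here is a stand-in chosen for illustration: the
struct migration_state, the tiny PAGE_SZ, and the resolve_minor_fault helper
are hypothetical, and the actual UFFDIO_CONTINUE ioctl is only mentioned in a
comment rather than issued.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SZ 16	/* tiny "pages" keep the sketch small */

struct migration_state {
	const unsigned char *source;	/* stand-in for memory on the source machine */
	unsigned char *alias;		/* stand-in for the non-UFFD mapping */
	bool *stale;			/* per-page map: true if out of date */
};

/*
 * Bring one page up to date if needed. Returns true if a copy was made.
 * Either way, the caller would then issue UFFDIO_CONTINUE for the page.
 */
static bool resolve_minor_fault(struct migration_state *st, size_t page)
{
	assert(st && st->source && st->alias && st->stale);

	if (!st->stale[page])
		return false;	/* contents already correct, nothing to copy */

	memcpy(st->alias + page * PAGE_SZ, st->source + page * PAGE_SZ, PAGE_SZ);
	st->stale[page] = false;
	return true;
}
```

The same helper could be called from a background thread to refresh stale
pages before the guest touches them, which matches the last paragraph above.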

Interaction with Existing APIs
==============================

Because it's possible to combine registration modes (e.g. a single VMA can be
userfaultfd-registered MINOR | MISSING), and because it's up to userspace how to
resolve faults once they are received, I spent some time thinking through how
the existing API interacts with the new feature.

UFFDIO_CONTINUE cannot be used to resolve non-minor faults, as it does not
allocate a new page. If UFFDIO_CONTINUE is used on a non-minor fault:

- For non-shared memory or shmem, -EINVAL is returned.
- For hugetlb, -EFAULT is returned.

UFFDIO_COPY and UFFDIO_ZEROPAGE cannot be used to resolve minor faults. Without
modifications, the existing codepath assumes a new page needs to be allocated.
This is okay, since userspace must have a second non-UFFD-registered mapping
anyway, thus there isn't much reason to want to use these in any case (just
memcpy or memset or similar).

- If UFFDIO_COPY is used on a minor fault, -EEXIST is returned.
- If UFFDIO_ZEROPAGE is used on a minor fault, -EEXIST is returned (or -EINVAL
  in the case of hugetlb, as UFFDIO_ZEROPAGE is unsupported in any case).
- UFFDIO_WRITEPROTECT simply doesn't work with shared memory, and returns
  -ENOENT in that case (regardless of the kind of fault).
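
The combinations above can be summarized as a lookup. This is a model of the
behavior described in this section, not the kernel's implementation; the enum
names and expected_result are hypothetical. UFFDIO_WRITEPROTECT is left out,
and returning 0 for the "works as before" cases is an assumption of the model.
Since only hugetlbfs supports MINOR registration in this series, the model
reports -EINVAL for UFFDIO_CONTINUE on any other memory type.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum uffd_op { OP_COPY, OP_ZEROPAGE, OP_CONTINUE };
enum mem_kind { MEM_ANON, MEM_SHMEM, MEM_HUGETLB };

/* Returns 0 if the op can resolve the fault, else a negative errno. */
static int expected_result(enum uffd_op op, enum mem_kind mem, bool minor_fault)
{
	switch (op) {
	case OP_CONTINUE:
		if (mem != MEM_HUGETLB)
			return -EINVAL;
		return minor_fault ? 0 : -EFAULT;
	case OP_COPY:
		return minor_fault ? -EEXIST : 0;
	case OP_ZEROPAGE:
		if (mem == MEM_HUGETLB)
			return -EINVAL;	/* unsupported for hugetlb in any case */
		return minor_fault ? -EEXIST : 0;
	}
	return -EINVAL;
}

/* Usage: a few rows of the table described above. */
static void expected_result_examples(void)
{
	assert(expected_result(OP_CONTINUE, MEM_HUGETLB, true) == 0);
	assert(expected_result(OP_CONTINUE, MEM_HUGETLB, false) == -EFAULT);
	assert(expected_result(OP_COPY, MEM_HUGETLB, true) == -EEXIST);
}
```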

Dependencies
============

I've included 4 commits from Peter Xu's larger series
(https://lore.kernel.org/patchwork/cover/1366017/) in this series. My changes
depend on his work, to disable huge PMD sharing for MINOR registered userfaultfd
areas. I included the 4 commits directly because a) it lets this series just be
applied and work as-is, and b) they are fairly standalone, and could potentially
be merged even without the rest of the larger series Peter submitted. Thanks
Peter!

Also, although it doesn't affect minor fault handling, I did notice that the
userfaultfd self test sometimes experienced memory corruption
(https://lore.kernel.org/patchwork/cover/1356755/). For anyone testing this
series, it may be useful to apply that series first to fix the selftest
flakiness. That series doesn't have to be merged into mainline / maintainer
branches before mine, though.

Future Work
===========

Currently the patchset only supports hugetlbfs. There is no reason it can't work
with shmem, but I expect hugetlbfs to be much more commonly used since we're
talking about backing guest memory for VMs. I plan to implement shmem support in
a follow-up patch series.

Axel Rasmussen (5):
  userfaultfd: add minor fault registration mode
  userfaultfd: disable huge PMD sharing for MINOR registered VMAs
  userfaultfd: add UFFDIO_CONTINUE ioctl
  userfaultfd: update documentation to describe minor fault handling
  userfaultfd/selftests: add test exercising minor fault handling

Peter Xu (4):
  hugetlb: Pass vma into huge_pte_alloc()
  hugetlb/userfaultfd: Forbid huge pmd sharing when uffd enabled
  mm/hugetlb: Move flush_hugetlb_tlb_range() into hugetlb.h
  hugetlb/userfaultfd: Unshare all pmds for hugetlbfs when register wp

 Documentation/admin-guide/mm/userfaultfd.rst | 105 ++++++----
 arch/arm64/mm/hugetlbpage.c                  |   5 +-
 arch/ia64/mm/hugetlbpage.c                   |   3 +-
 arch/mips/mm/hugetlbpage.c                   |   4 +-
 arch/parisc/mm/hugetlbpage.c                 |   2 +-
 arch/powerpc/mm/hugetlbpage.c                |   3 +-
 arch/s390/mm/hugetlbpage.c                   |   2 +-
 arch/sh/mm/hugetlbpage.c                     |   2 +-
 arch/sparc/mm/hugetlbpage.c                  |   2 +-
 fs/proc/task_mmu.c                           |   1 +
 fs/userfaultfd.c                             | 190 ++++++++++++++++---
 include/linux/hugetlb.h                      |  26 ++-
 include/linux/mm.h                           |   1 +
 include/linux/mmu_notifier.h                 |   1 +
 include/linux/userfaultfd_k.h                |  48 ++++-
 include/trace/events/mmflags.h               |   1 +
 include/uapi/linux/userfaultfd.h             |  36 +++-
 mm/hugetlb.c                                 |  75 +++++---
 mm/userfaultfd.c                             |  71 ++++---
 tools/testing/selftests/vm/userfaultfd.c     | 147 +++++++++++++-
 20 files changed, 585 insertions(+), 140 deletions(-)

--
2.30.0.280.ga3ce27912f-goog




* [PATCH v2 1/9] hugetlb: Pass vma into huge_pte_alloc()
  2021-01-22 21:29 [PATCH 0/9] userfaultfd: add minor fault handling Axel Rasmussen
@ 2021-01-22 21:29 ` Axel Rasmussen
  2021-01-22 21:29 ` [PATCH v2 2/9] hugetlb/userfaultfd: Forbid huge pmd sharing when uffd enabled Axel Rasmussen
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 14+ messages in thread
From: Axel Rasmussen @ 2021-01-22 21:29 UTC (permalink / raw)

From: Peter Xu <peterx@redhat.com>

This is preparatory work so that the per-architecture huge_pte_alloc() can
behave differently depending on VMA attributes.

Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
---
 arch/arm64/mm/hugetlbpage.c   | 2 +-
 arch/ia64/mm/hugetlbpage.c    | 3 ++-
 arch/mips/mm/hugetlbpage.c    | 4 ++--
 arch/parisc/mm/hugetlbpage.c  | 2 +-
 arch/powerpc/mm/hugetlbpage.c | 3 ++-
 arch/s390/mm/hugetlbpage.c    | 2 +-
 arch/sh/mm/hugetlbpage.c      | 2 +-
 arch/sparc/mm/hugetlbpage.c   | 2 +-
 include/linux/hugetlb.h       | 2 +-
 mm/hugetlb.c                  | 6 +++---
 mm/userfaultfd.c              | 2 +-
 11 files changed, 16 insertions(+), 14 deletions(-)

diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 55ecf6de9ff7..5b32ec888698 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -252,7 +252,7 @@ void set_huge_swap_pte_at(struct mm_struct *mm, unsigned long addr,
 		set_pte(ptep, pte);
 }
 
-pte_t *huge_pte_alloc(struct mm_struct *mm,
+pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
 		      unsigned long addr, unsigned long sz)
 {
 	pgd_t *pgdp;
diff --git a/arch/ia64/mm/hugetlbpage.c b/arch/ia64/mm/hugetlbpage.c
index b331f94d20ac..f993cb36c062 100644
--- a/arch/ia64/mm/hugetlbpage.c
+++ b/arch/ia64/mm/hugetlbpage.c
@@ -25,7 +25,8 @@ unsigned int hpage_shift = HPAGE_SHIFT_DEFAULT;
 EXPORT_SYMBOL(hpage_shift);
 
 pte_t *
-huge_pte_alloc(struct mm_struct *mm, unsigned long addr, unsigned long sz)
+huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
+	       unsigned long addr, unsigned long sz)
 {
 	unsigned long taddr = htlbpage_to_page(addr);
 	pgd_t *pgd;
diff --git a/arch/mips/mm/hugetlbpage.c b/arch/mips/mm/hugetlbpage.c
index b9f76f433617..871a100fb361 100644
--- a/arch/mips/mm/hugetlbpage.c
+++ b/arch/mips/mm/hugetlbpage.c
@@ -21,8 +21,8 @@
 #include <asm/tlb.h>
 #include <asm/tlbflush.h>
 
-pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr,
-		      unsigned long sz)
+pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
+		      unsigned long addr, unsigned long sz)
 {
 	pgd_t *pgd;
 	p4d_t *p4d;
diff --git a/arch/parisc/mm/hugetlbpage.c b/arch/parisc/mm/hugetlbpage.c
index d7ba014a7fbb..e141441bfa64 100644
--- a/arch/parisc/mm/hugetlbpage.c
+++ b/arch/parisc/mm/hugetlbpage.c
@@ -44,7 +44,7 @@ hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
 }
 
 
-pte_t *huge_pte_alloc(struct mm_struct *mm,
+pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
 			unsigned long addr, unsigned long sz)
 {
 	pgd_t *pgd;
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 8b3cc4d688e8..d57276b8791c 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -106,7 +106,8 @@ static int __hugepte_alloc(struct mm_struct *mm, hugepd_t *hpdp,
  * At this point we do the placement change only for BOOK3S 64. This would
  * possibly work on other subarchs.
  */
-pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr, unsigned long sz)
+pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
+		      unsigned long addr, unsigned long sz)
 {
 	pgd_t *pg;
 	p4d_t *p4;
diff --git a/arch/s390/mm/hugetlbpage.c b/arch/s390/mm/hugetlbpage.c
index 3b5a4d25ca9b..da36d13ffc16 100644
--- a/arch/s390/mm/hugetlbpage.c
+++ b/arch/s390/mm/hugetlbpage.c
@@ -189,7 +189,7 @@ pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
 	return pte;
 }
 
-pte_t *huge_pte_alloc(struct mm_struct *mm,
+pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
 			unsigned long addr, unsigned long sz)
 {
 	pgd_t *pgdp;
diff --git a/arch/sh/mm/hugetlbpage.c b/arch/sh/mm/hugetlbpage.c
index 220d7bc43d2b..999ab5916e69 100644
--- a/arch/sh/mm/hugetlbpage.c
+++ b/arch/sh/mm/hugetlbpage.c
@@ -21,7 +21,7 @@
 #include <asm/tlbflush.h>
 #include <asm/cacheflush.h>
 
-pte_t *huge_pte_alloc(struct mm_struct *mm,
+pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
 			unsigned long addr, unsigned long sz)
 {
 	pgd_t *pgd;
diff --git a/arch/sparc/mm/hugetlbpage.c b/arch/sparc/mm/hugetlbpage.c
index ad4b42f04988..04d8790f6c32 100644
--- a/arch/sparc/mm/hugetlbpage.c
+++ b/arch/sparc/mm/hugetlbpage.c
@@ -279,7 +279,7 @@ unsigned long pud_leaf_size(pud_t pud) { return 1UL << tte_to_shift(*(pte_t *)&p
 unsigned long pmd_leaf_size(pmd_t pmd) { return 1UL << tte_to_shift(*(pte_t *)&pmd); }
 unsigned long pte_leaf_size(pte_t pte) { return 1UL << tte_to_shift(pte); }
 
-pte_t *huge_pte_alloc(struct mm_struct *mm,
+pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
 			unsigned long addr, unsigned long sz)
 {
 	pgd_t *pgd;
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index ebca2ef02212..1e0abb609976 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -161,7 +161,7 @@ extern struct list_head huge_boot_pages;
 
 /* arch callbacks */
 
-pte_t *huge_pte_alloc(struct mm_struct *mm,
+pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
 			unsigned long addr, unsigned long sz);
 pte_t *huge_pte_offset(struct mm_struct *mm,
 		       unsigned long addr, unsigned long sz);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 18f6ee317900..07b23c81b1db 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3766,7 +3766,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 		src_pte = huge_pte_offset(src, addr, sz);
 		if (!src_pte)
 			continue;
-		dst_pte = huge_pte_alloc(dst, addr, sz);
+		dst_pte = huge_pte_alloc(dst, vma, addr, sz);
 		if (!dst_pte) {
 			ret = -ENOMEM;
 			break;
@@ -4503,7 +4503,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	 */
 	mapping = vma->vm_file->f_mapping;
 	i_mmap_lock_read(mapping);
-	ptep = huge_pte_alloc(mm, haddr, huge_page_size(h));
+	ptep = huge_pte_alloc(mm, vma, haddr, huge_page_size(h));
 	if (!ptep) {
 		i_mmap_unlock_read(mapping);
 		return VM_FAULT_OOM;
@@ -5392,7 +5392,7 @@ void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma,
 #endif /* CONFIG_ARCH_WANT_HUGE_PMD_SHARE */
 
 #ifdef CONFIG_ARCH_WANT_GENERAL_HUGETLB
-pte_t *huge_pte_alloc(struct mm_struct *mm,
+pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
 			unsigned long addr, unsigned long sz)
 {
 	pgd_t *pgd;
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 7423808640ef..b2ce61c1b50d 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -290,7 +290,7 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 		mutex_lock(&hugetlb_fault_mutex_table[hash]);
 
 		err = -ENOMEM;
-		dst_pte = huge_pte_alloc(dst_mm, dst_addr, vma_hpagesize);
+		dst_pte = huge_pte_alloc(dst_mm, dst_vma, dst_addr, vma_hpagesize);
 		if (!dst_pte) {
 			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
 			i_mmap_unlock_read(mapping);
-- 
2.30.0.280.ga3ce27912f-goog




* [PATCH v2 2/9] hugetlb/userfaultfd: Forbid huge pmd sharing when uffd enabled
  2021-01-22 21:29 [PATCH 0/9] userfaultfd: add minor fault handling Axel Rasmussen
  2021-01-22 21:29 ` [PATCH v2 1/9] hugetlb: Pass vma into huge_pte_alloc() Axel Rasmussen
@ 2021-01-22 21:29 ` Axel Rasmussen
  2021-01-22 21:29 ` [PATCH v2 3/9] mm/hugetlb: Move flush_hugetlb_tlb_range() into hugetlb.h Axel Rasmussen
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 14+ messages in thread
From: Axel Rasmussen @ 2021-01-22 21:29 UTC (permalink / raw)

From: Peter Xu <peterx@redhat.com>

Huge pmd sharing can cause problems for userfaultfd.  userfaultfd runs its
logic based on special bits in page table entries, but huge pmd sharing can
share page table entries between different address ranges.  That can cause
issues in either of these cases:

  - When sharing huge pmd page tables for a uffd write protected range, the
    newly mapped huge pmd range will also be write protected unexpectedly, or,

  - When we try to write protect a range that uses shared huge pmds, we'll
    first do huge_pmd_unshare() in hugetlb_change_protection(), however that
    also means the UFFDIO_WRITEPROTECT could be silently skipped for the
    shared region, which could lead to data loss.
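
To see why the first case goes wrong, consider a toy model in which two
mappings point at the same page table. This is not kernel code: toy_pte,
toy_mapping and toy_write_protect are hypothetical names, and a single bool
stands in for the per-entry uffd-wp bit.

```c
#include <assert.h>
#include <stdbool.h>

#define NPTES 4

struct toy_pte {
	bool uffd_wp;	/* stand-in for the per-entry uffd-wp bit */
};

struct toy_mapping {
	struct toy_pte *ptes;	/* may be shared with another mapping */
};

static void toy_write_protect(struct toy_mapping *m, int idx)
{
	m->ptes[idx].uffd_wp = true;
}

static void toy_sharing_example(void)
{
	struct toy_pte shared[NPTES] = { { false } };
	struct toy_pte private_tbl[NPTES] = { { false } };
	struct toy_mapping a = { shared }, b = { shared }, c = { private_tbl };

	toy_write_protect(&a, 1);

	assert(b.ptes[1].uffd_wp);	/* b is protected although only a asked */
	assert(!c.ptes[1].uffd_wp);	/* an unshared table is unaffected */
}
```

Because the bit lives in the entry rather than in the mapping, the only way to
keep the two mappings independent is to give each its own table.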

While at it, a few other things are done as well:

  - Move want_pmd_share() from mm/hugetlb.c into linux/hugetlb.h, because
    that's definitely something that arch code would like to use too

  - ARM64 currently checks directly against CONFIG_ARCH_WANT_HUGE_PMD_SHARE
    when trying to share huge pmd.  Switch to the want_pmd_share() helper.

Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
---
 arch/arm64/mm/hugetlbpage.c   |  3 +--
 include/linux/hugetlb.h       | 15 +++++++++++++++
 include/linux/userfaultfd_k.h |  9 +++++++++
 mm/hugetlb.c                  |  5 ++---
 4 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 5b32ec888698..1a8ce0facfe8 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -284,8 +284,7 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
 		 */
 		ptep = pte_alloc_map(mm, pmdp, addr);
 	} else if (sz == PMD_SIZE) {
-		if (IS_ENABLED(CONFIG_ARCH_WANT_HUGE_PMD_SHARE) &&
-		    pud_none(READ_ONCE(*pudp)))
+		if (want_pmd_share(vma) && pud_none(READ_ONCE(*pudp)))
 			ptep = huge_pmd_share(mm, addr, pudp);
 		else
 			ptep = (pte_t *)pmd_alloc(mm, pudp, addr);
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 1e0abb609976..4508136c8376 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -11,6 +11,7 @@
 #include <linux/kref.h>
 #include <linux/pgtable.h>
 #include <linux/gfp.h>
+#include <linux/userfaultfd_k.h>
 
 struct ctl_table;
 struct user_struct;
@@ -947,4 +948,18 @@ static inline __init void hugetlb_cma_check(void)
 }
 #endif
 
+static inline bool want_pmd_share(struct vm_area_struct *vma)
+{
+#ifdef CONFIG_USERFAULTFD
+	if (uffd_disable_huge_pmd_share(vma))
+		return false;
+#endif
+
+#ifdef CONFIG_ARCH_WANT_HUGE_PMD_SHARE
+	return true;
+#else
+	return false;
+#endif
+}
+
 #endif /* _LINUX_HUGETLB_H */
diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index a8e5f3ea9bb2..c63ccdae3eab 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -52,6 +52,15 @@ static inline bool is_mergeable_vm_userfaultfd_ctx(struct vm_area_struct *vma,
 	return vma->vm_userfaultfd_ctx.ctx == vm_ctx.ctx;
 }
 
+/*
+ * Never enable huge pmd sharing on uffd-wp registered vmas, because uffd-wp
+ * protect information is per pgtable entry.
+ */
+static inline bool uffd_disable_huge_pmd_share(struct vm_area_struct *vma)
+{
+	return vma->vm_flags & VM_UFFD_WP;
+}
+
 static inline bool userfaultfd_missing(struct vm_area_struct *vma)
 {
 	return vma->vm_flags & VM_UFFD_MISSING;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 07b23c81b1db..d46f50a99ff1 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5371,7 +5371,7 @@ int huge_pmd_unshare(struct mm_struct *mm, struct vm_area_struct *vma,
 	*addr = ALIGN(*addr, HPAGE_SIZE * PTRS_PER_PTE) - HPAGE_SIZE;
 	return 1;
 }
-#define want_pmd_share()	(1)
+
 #else /* !CONFIG_ARCH_WANT_HUGE_PMD_SHARE */
 pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
 {
@@ -5388,7 +5388,6 @@ void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma,
 				unsigned long *start, unsigned long *end)
 {
 }
-#define want_pmd_share()	(0)
 #endif /* CONFIG_ARCH_WANT_HUGE_PMD_SHARE */
 
 #ifdef CONFIG_ARCH_WANT_GENERAL_HUGETLB
@@ -5410,7 +5409,7 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
 			pte = (pte_t *)pud;
 		} else {
 			BUG_ON(sz != PMD_SIZE);
-			if (want_pmd_share() && pud_none(*pud))
+			if (want_pmd_share(vma) && pud_none(*pud))
 				pte = huge_pmd_share(mm, addr, pud);
 			else
 				pte = (pte_t *)pmd_alloc(mm, pud, addr);
-- 
2.30.0.280.ga3ce27912f-goog




* [PATCH v2 3/9] mm/hugetlb: Move flush_hugetlb_tlb_range() into hugetlb.h
  2021-01-22 21:29 [PATCH 0/9] userfaultfd: add minor fault handling Axel Rasmussen
  2021-01-22 21:29 ` [PATCH v2 1/9] hugetlb: Pass vma into huge_pte_alloc() Axel Rasmussen
  2021-01-22 21:29 ` [PATCH v2 2/9] hugetlb/userfaultfd: Forbid huge pmd sharing when uffd enabled Axel Rasmussen
@ 2021-01-22 21:29 ` Axel Rasmussen
  2021-01-22 21:29 ` [PATCH v2 4/9] hugetlb/userfaultfd: Unshare all pmds for hugetlbfs when register wp Axel Rasmussen
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 14+ messages in thread
From: Axel Rasmussen @ 2021-01-22 21:29 UTC (permalink / raw)

From: Peter Xu <peterx@redhat.com>

Prepare for it to be called outside of mm/hugetlb.c.

Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
---
 include/linux/hugetlb.h | 8 ++++++++
 mm/hugetlb.c            | 8 --------
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 4508136c8376..f94a35296618 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -962,4 +962,12 @@ static inline bool want_pmd_share(struct vm_area_struct *vma)
 #endif
 }
 
+#ifndef __HAVE_ARCH_FLUSH_HUGETLB_TLB_RANGE
+/*
+ * ARCHes with special requirements for evicting HUGETLB backing TLB entries can
+ * implement this.
+ */
+#define flush_hugetlb_tlb_range(vma, addr, end)	flush_tlb_range(vma, addr, end)
+#endif
+
 #endif /* _LINUX_HUGETLB_H */
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index d46f50a99ff1..30a087dda57d 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4924,14 +4924,6 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 	return i ? i : err;
 }
 
-#ifndef __HAVE_ARCH_FLUSH_HUGETLB_TLB_RANGE
-/*
- * ARCHes with special requirements for evicting HUGETLB backing TLB entries can
- * implement this.
- */
-#define flush_hugetlb_tlb_range(vma, addr, end)	flush_tlb_range(vma, addr, end)
-#endif
-
 unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 		unsigned long address, unsigned long end, pgprot_t newprot)
 {
-- 
2.30.0.280.ga3ce27912f-goog




* [PATCH v2 4/9] hugetlb/userfaultfd: Unshare all pmds for hugetlbfs when register wp
  2021-01-22 21:29 [PATCH 0/9] userfaultfd: add minor fault handling Axel Rasmussen
                   ` (2 preceding siblings ...)
  2021-01-22 21:29 ` [PATCH v2 3/9] mm/hugetlb: Move flush_hugetlb_tlb_range() into hugetlb.h Axel Rasmussen
@ 2021-01-22 21:29 ` Axel Rasmussen
  2021-01-22 21:29 ` [PATCH v2 5/9] userfaultfd: add minor fault registration mode Axel Rasmussen
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 14+ messages in thread
From: Axel Rasmussen @ 2021-01-22 21:29 UTC (permalink / raw)

From: Peter Xu <peterx@redhat.com>

Huge pmd sharing for hugetlbfs is racy with userfaultfd-wp because
userfaultfd-wp is always based on pgtable entries, so they cannot be shared.

Walk the hugetlb range and unshare any such mappings right before
UFFDIO_REGISTER succeeds and returns to userspace.

This pairs with want_pmd_share() in the hugetlb code so that huge pmd sharing
is completely disabled for userfaultfd-wp registered ranges.

Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
---
 fs/userfaultfd.c             | 45 ++++++++++++++++++++++++++++++++++++
 include/linux/mmu_notifier.h |  1 +
 2 files changed, 46 insertions(+)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 894cc28142e7..2c6706ac2504 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -15,6 +15,7 @@
 #include <linux/sched/signal.h>
 #include <linux/sched/mm.h>
 #include <linux/mm.h>
+#include <linux/mmu_notifier.h>
 #include <linux/poll.h>
 #include <linux/slab.h>
 #include <linux/seq_file.h>
@@ -1190,6 +1191,47 @@ static ssize_t userfaultfd_read(struct file *file, char __user *buf,
 	}
 }
 
+/*
+ * This function will unconditionally remove all the shared pmd pgtable entries
+ * within the specific vma for a hugetlbfs memory range.
+ */
+static void hugetlb_unshare_all_pmds(struct vm_area_struct *vma)
+{
+#ifdef CONFIG_HUGETLB_PAGE
+	struct hstate *h = hstate_vma(vma);
+	unsigned long sz = huge_page_size(h);
+	struct mm_struct *mm = vma->vm_mm;
+	struct mmu_notifier_range range;
+	unsigned long address;
+	spinlock_t *ptl;
+	pte_t *ptep;
+
+	/*
+	 * No need to call adjust_range_if_pmd_sharing_possible(), because
+	 * we're going to operate on the whole vma
+	 */
+	mmu_notifier_range_init(&range, MMU_NOTIFY_HUGETLB_UNSHARE,
+				0, vma, mm, vma->vm_start, vma->vm_end);
+	mmu_notifier_invalidate_range_start(&range);
+	i_mmap_lock_write(vma->vm_file->f_mapping);
+	for (address = vma->vm_start; address < vma->vm_end; address += sz) {
+		ptep = huge_pte_offset(mm, address, sz);
+		if (!ptep)
+			continue;
+		ptl = huge_pte_lock(h, mm, ptep);
+		huge_pmd_unshare(mm, vma, &address, ptep);
+		spin_unlock(ptl);
+	}
+	flush_hugetlb_tlb_range(vma, vma->vm_start, vma->vm_end);
+	i_mmap_unlock_write(vma->vm_file->f_mapping);
+	/*
+	 * No need to call mmu_notifier_invalidate_range(), see
+	 * Documentation/vm/mmu_notifier.rst.
+	 */
+	mmu_notifier_invalidate_range_end(&range);
+#endif
+}
+
 static void __wake_userfault(struct userfaultfd_ctx *ctx,
 			     struct userfaultfd_wake_range *range)
 {
@@ -1448,6 +1490,9 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
 		vma->vm_flags = new_flags;
 		vma->vm_userfaultfd_ctx.ctx = ctx;
 
+		if (is_vm_hugetlb_page(vma) && uffd_disable_huge_pmd_share(vma))
+			hugetlb_unshare_all_pmds(vma);
+
 	skip:
 		prev = vma;
 		start = vma->vm_end;
diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index b8200782dede..ff50c8528113 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -51,6 +51,7 @@ enum mmu_notifier_event {
 	MMU_NOTIFY_SOFT_DIRTY,
 	MMU_NOTIFY_RELEASE,
 	MMU_NOTIFY_MIGRATE,
+	MMU_NOTIFY_HUGETLB_UNSHARE,
 };
 
 #define MMU_NOTIFIER_RANGE_BLOCKABLE (1 << 0)
-- 
2.30.0.280.ga3ce27912f-goog




* [PATCH v2 5/9] userfaultfd: add minor fault registration mode
  2021-01-22 21:29 [PATCH 0/9] userfaultfd: add minor fault handling Axel Rasmussen
                   ` (3 preceding siblings ...)
  2021-01-22 21:29 ` [PATCH v2 4/9] hugetlb/userfaultfd: Unshare all pmds for hugetlbfs when register wp Axel Rasmussen
@ 2021-01-22 21:29 ` Axel Rasmussen
  2021-01-22 21:29 ` [PATCH v2 6/9] userfaultfd: disable huge PMD sharing for MINOR registered VMAs Axel Rasmussen
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 14+ messages in thread
From: Axel Rasmussen @ 2021-01-22 21:29 UTC (permalink / raw)

This feature allows userspace to intercept "minor" faults. By "minor"
faults, I mean the following situation:

Let there exist two mappings (i.e., VMAs) to the same page(s) (shared
memory). One of the mappings is registered with userfaultfd (in minor
mode), and the other is not. Via the non-UFFD mapping, the underlying
pages have already been allocated and filled with some contents. The
UFFD mapping has not yet been faulted in; when it is touched for the
first time, this results in what I'm calling a "minor" fault. As a
concrete example, when working with hugetlbfs, huge_pte_none() is true
for the faulting address, but find_lock_page() finds an existing page
in the page cache.
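
To make the "two mappings of the same pages" setup concrete, here is a
minimal userspace sketch. It is an illustration under assumptions, not
part of this series: it uses an ordinary temporary file as a stand-in
for a hugetlbfs file, performs no userfaultfd registration at all, and
the helper names map_twice() and write_is_visible() are made up for
this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Map the same shared file twice. In the minor-fault use case, one of
 * these views would be registered with userfaultfd and the other would
 * not; this sketch does no registration.
 * Returns 0 on success, -1 on failure.
 */
int map_twice(size_t len, char **plain_view, char **second_view)
{
	FILE *f = tmpfile();	/* stand-in for a hugetlbfs file */
	int fd;

	if (!f)
		return -1;
	fd = fileno(f);
	if (ftruncate(fd, (off_t)len) != 0) {
		fclose(f);
		return -1;
	}

	*plain_view = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	*second_view = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	fclose(f);	/* existing mappings stay valid after close */

	if (*plain_view == MAP_FAILED || *second_view == MAP_FAILED) {
		if (*plain_view != MAP_FAILED)
			munmap(*plain_view, len);
		if (*second_view != MAP_FAILED)
			munmap(*second_view, len);
		return -1;
	}
	return 0;
}

/* Write through one view and check that the other view sees the bytes. */
int write_is_visible(char *writer, const char *reader, const char *msg)
{
	size_t n = strlen(msg) + 1;

	memcpy(writer, msg, n);
	return memcmp(reader, msg, n) == 0;
}
```

With a MINOR-registered second view, the first touch of that view is
the access that would raise the minor fault, since the page was already
populated through the plain view.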

This commit adds the new registration mode, and sets the relevant flag
on the VMAs being registered. In the hugetlb fault path, if we find
that we have huge_pte_none(), but find_lock_page() does indeed find an
existing page, then we have a "minor" fault, and if the VMA has the
userfaultfd registration flag, we call into userfaultfd to handle it.
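
On the receiving side, userspace can tell the fault kinds apart from
the pagefault flags in the message. The following is a rough sketch of
that decoding, assuming the flag bit positions used in this patch
(WRITE = bit 0, WP = bit 1, MINOR = bit 2). The DEMO_* names and the
two helpers are local to this example and are not uapi definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirrors of the pagefault flag bits; names are hypothetical. */
#define DEMO_PAGEFAULT_FLAG_WRITE	(1u << 0)
#define DEMO_PAGEFAULT_FLAG_WP		(1u << 1)
#define DEMO_PAGEFAULT_FLAG_MINOR	(1u << 2)

/* Why the userfault occurred: neither WP nor MINOR means MISSING. */
const char *fault_kind(uint64_t flags)
{
	if (flags & DEMO_PAGEFAULT_FLAG_WP)
		return "wp";
	if (flags & DEMO_PAGEFAULT_FLAG_MINOR)
		return "minor";
	return "missing";
}

/* Separately from the kind, was the faulting access a write? */
int fault_is_write(uint64_t flags)
{
	return (flags & DEMO_PAGEFAULT_FLAG_WRITE) != 0;
}
```

The write bit is independent of the fault kind, so a handler would
typically look at both.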

Why add a new registration mode, as opposed to adding a feature to
MISSING registration, like UFFD_FEATURE_SIGBUS?

- The semantics are significantly different. UFFDIO_COPY and
  UFFDIO_ZEROPAGE do not make sense for these minor faults; userspace
  would instead update the page contents (e.g. with memset() or
  memcpy()) via the non-UFFD mapping. Unlike MISSING registration,
  MINOR registration only makes sense for shared memory (hugetlbfs or
  shmem [to be supported in future commits]).

- Doing so would make handle_userfault()'s "reason" argument confusing.
  We'd pass in "MISSING" even if the pages weren't really missing.

Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
---
 fs/proc/task_mmu.c               |  1 +
 fs/userfaultfd.c                 | 78 +++++++++++++++++++-------------
 include/linux/mm.h               |  1 +
 include/linux/userfaultfd_k.h    | 15 +++++-
 include/trace/events/mmflags.h   |  1 +
 include/uapi/linux/userfaultfd.h | 15 +++++-
 mm/hugetlb.c                     | 32 +++++++++++++
 7 files changed, 109 insertions(+), 34 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 602e3a52884d..94e951ea3e03 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -651,6 +651,7 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
 		[ilog2(VM_MTE)]		= "mt",
 		[ilog2(VM_MTE_ALLOWED)]	= "",
 #endif
+		[ilog2(VM_UFFD_MINOR)]	= "ui",
 #ifdef CONFIG_ARCH_HAS_PKEYS
 		/* These come out via ProtectionKey: */
 		[ilog2(VM_PKEY_BIT0)]	= "",
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 2c6706ac2504..968aca3e3ee9 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -197,24 +197,21 @@ static inline struct uffd_msg userfault_msg(unsigned long address,
 	msg_init(&msg);
 	msg.event = UFFD_EVENT_PAGEFAULT;
 	msg.arg.pagefault.address = address;
+	/*
+	 * These flags indicate why the userfault occurred:
+	 * - UFFD_PAGEFAULT_FLAG_WP indicates a write protect fault.
+	 * - UFFD_PAGEFAULT_FLAG_MINOR indicates a minor fault.
+	 * - Neither of these flags being set indicates a MISSING fault.
+	 *
+	 * Separately, UFFD_PAGEFAULT_FLAG_WRITE indicates it was a write
+	 * fault. Otherwise, it was a read fault.
+	 */
 	if (flags & FAULT_FLAG_WRITE)
-		/*
-		 * If UFFD_FEATURE_PAGEFAULT_FLAG_WP was set in the
-		 * uffdio_api.features and UFFD_PAGEFAULT_FLAG_WRITE
-		 * was not set in a UFFD_EVENT_PAGEFAULT, it means it
-		 * was a read fault, otherwise if set it means it's
-		 * a write fault.
-		 */
 		msg.arg.pagefault.flags |= UFFD_PAGEFAULT_FLAG_WRITE;
 	if (reason & VM_UFFD_WP)
-		/*
-		 * If UFFD_FEATURE_PAGEFAULT_FLAG_WP was set in the
-		 * uffdio_api.features and UFFD_PAGEFAULT_FLAG_WP was
-		 * not set in a UFFD_EVENT_PAGEFAULT, it means it was
-		 * a missing fault, otherwise if set it means it's a
-		 * write protect fault.
-		 */
 		msg.arg.pagefault.flags |= UFFD_PAGEFAULT_FLAG_WP;
+	if (reason & VM_UFFD_MINOR)
+		msg.arg.pagefault.flags |= UFFD_PAGEFAULT_FLAG_MINOR;
 	if (features & UFFD_FEATURE_THREAD_ID)
 		msg.arg.pagefault.feat.ptid = task_pid_vnr(current);
 	return msg;
@@ -401,8 +398,10 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
 
 	BUG_ON(ctx->mm != mm);
 
-	VM_BUG_ON(reason & ~(VM_UFFD_MISSING|VM_UFFD_WP));
-	VM_BUG_ON(!(reason & VM_UFFD_MISSING) ^ !!(reason & VM_UFFD_WP));
+	/* Any unrecognized flag is a bug. */
+	VM_BUG_ON(reason & ~__VM_UFFD_FLAGS);
+	/* 0 or > 1 flags set is a bug; we expect exactly 1. */
+	VM_BUG_ON(!reason || !!(reason & (reason - 1)));
 
 	if (ctx->features & UFFD_FEATURE_SIGBUS)
 		goto out;
@@ -612,7 +611,7 @@ static void userfaultfd_event_wait_completion(struct userfaultfd_ctx *ctx,
 		for (vma = mm->mmap; vma; vma = vma->vm_next)
 			if (vma->vm_userfaultfd_ctx.ctx == release_new_ctx) {
 				vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX;
-				vma->vm_flags &= ~(VM_UFFD_WP | VM_UFFD_MISSING);
+				vma->vm_flags &= ~__VM_UFFD_FLAGS;
 			}
 		mmap_write_unlock(mm);
 
@@ -644,7 +643,7 @@ int dup_userfaultfd(struct vm_area_struct *vma, struct list_head *fcs)
 	octx = vma->vm_userfaultfd_ctx.ctx;
 	if (!octx || !(octx->features & UFFD_FEATURE_EVENT_FORK)) {
 		vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX;
-		vma->vm_flags &= ~(VM_UFFD_WP | VM_UFFD_MISSING);
+		vma->vm_flags &= ~__VM_UFFD_FLAGS;
 		return 0;
 	}
 
@@ -726,7 +725,7 @@ void mremap_userfaultfd_prep(struct vm_area_struct *vma,
 	} else {
 		/* Drop uffd context if remap feature not enabled */
 		vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX;
-		vma->vm_flags &= ~(VM_UFFD_WP | VM_UFFD_MISSING);
+		vma->vm_flags &= ~__VM_UFFD_FLAGS;
 	}
 }
 
@@ -867,12 +866,12 @@ static int userfaultfd_release(struct inode *inode, struct file *file)
 	for (vma = mm->mmap; vma; vma = vma->vm_next) {
 		cond_resched();
 		BUG_ON(!!vma->vm_userfaultfd_ctx.ctx ^
-		       !!(vma->vm_flags & (VM_UFFD_MISSING | VM_UFFD_WP)));
+		       !!(vma->vm_flags & __VM_UFFD_FLAGS));
 		if (vma->vm_userfaultfd_ctx.ctx != ctx) {
 			prev = vma;
 			continue;
 		}
-		new_flags = vma->vm_flags & ~(VM_UFFD_MISSING | VM_UFFD_WP);
+		new_flags = vma->vm_flags & ~__VM_UFFD_FLAGS;
 		prev = vma_merge(mm, prev, vma->vm_start, vma->vm_end,
 				 new_flags, vma->anon_vma,
 				 vma->vm_file, vma->vm_pgoff,
@@ -1302,9 +1301,26 @@ static inline bool vma_can_userfault(struct vm_area_struct *vma,
 				     unsigned long vm_flags)
 {
 	/* FIXME: add WP support to hugetlbfs and shmem */
-	return vma_is_anonymous(vma) ||
-		((is_vm_hugetlb_page(vma) || vma_is_shmem(vma)) &&
-		 !(vm_flags & VM_UFFD_WP));
+	if (vm_flags & VM_UFFD_WP) {
+		if (is_vm_hugetlb_page(vma) || vma_is_shmem(vma))
+			return false;
+	}
+
+	if (vm_flags & VM_UFFD_MINOR) {
+		/*
+		 * The use case for minor registration (intercepting minor
+		 * faults) is to handle the case where a page is present, but
+		 * needs to be modified before it can be used. This requires
+		 * two mappings: one with UFFD registration, and one without.
+		 * So, it only makes sense to do this with shared memory.
+		 */
+		/* FIXME: Add minor fault interception for shmem. */
+		if (!(is_vm_hugetlb_page(vma) && (vma->vm_flags & VM_SHARED)))
+			return false;
+	}
+
+	return vma_is_anonymous(vma) || is_vm_hugetlb_page(vma) ||
+	       vma_is_shmem(vma);
 }
 
 static int userfaultfd_register(struct userfaultfd_ctx *ctx,
@@ -1330,14 +1346,15 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
 	ret = -EINVAL;
 	if (!uffdio_register.mode)
 		goto out;
-	if (uffdio_register.mode & ~(UFFDIO_REGISTER_MODE_MISSING|
-				     UFFDIO_REGISTER_MODE_WP))
+	if (uffdio_register.mode & ~UFFD_API_REGISTER_MODES)
 		goto out;
 	vm_flags = 0;
 	if (uffdio_register.mode & UFFDIO_REGISTER_MODE_MISSING)
 		vm_flags |= VM_UFFD_MISSING;
 	if (uffdio_register.mode & UFFDIO_REGISTER_MODE_WP)
 		vm_flags |= VM_UFFD_WP;
+	if (uffdio_register.mode & UFFDIO_REGISTER_MODE_MINOR)
+		vm_flags |= VM_UFFD_MINOR;
 
 	ret = validate_range(mm, &uffdio_register.range.start,
 			     uffdio_register.range.len);
@@ -1381,7 +1398,7 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
 		cond_resched();
 
 		BUG_ON(!!cur->vm_userfaultfd_ctx.ctx ^
-		       !!(cur->vm_flags & (VM_UFFD_MISSING | VM_UFFD_WP)));
+		       !!(cur->vm_flags & __VM_UFFD_FLAGS));
 
 		/* check not compatible vmas */
 		ret = -EINVAL;
@@ -1461,8 +1478,7 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
 			start = vma->vm_start;
 		vma_end = min(end, vma->vm_end);
 
-		new_flags = (vma->vm_flags &
-			     ~(VM_UFFD_MISSING|VM_UFFD_WP)) | vm_flags;
+		new_flags = (vma->vm_flags & ~__VM_UFFD_FLAGS) | vm_flags;
 		prev = vma_merge(mm, prev, start, vma_end, new_flags,
 				 vma->anon_vma, vma->vm_file, vma->vm_pgoff,
 				 vma_policy(vma),
@@ -1584,7 +1600,7 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
 		cond_resched();
 
 		BUG_ON(!!cur->vm_userfaultfd_ctx.ctx ^
-		       !!(cur->vm_flags & (VM_UFFD_MISSING | VM_UFFD_WP)));
+		       !!(cur->vm_flags & __VM_UFFD_FLAGS));
 
 		/*
 		 * Check not compatible vmas, not strictly required
@@ -1635,7 +1651,7 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
 			wake_userfault(vma->vm_userfaultfd_ctx.ctx, &range);
 		}
 
-		new_flags = vma->vm_flags & ~(VM_UFFD_MISSING | VM_UFFD_WP);
+		new_flags = vma->vm_flags & ~__VM_UFFD_FLAGS;
 		prev = vma_merge(mm, prev, start, vma_end, new_flags,
 				 vma->anon_vma, vma->vm_file, vma->vm_pgoff,
 				 vma_policy(vma),
diff --git a/include/linux/mm.h b/include/linux/mm.h
index ecdf8a8cd6ae..1d7041bd3148 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -276,6 +276,7 @@ extern unsigned int kobjsize(const void *objp);
 #define VM_PFNMAP	0x00000400	/* Page-ranges managed without "struct page", just pure PFN */
 #define VM_DENYWRITE	0x00000800	/* ETXTBSY on write attempts.. */
 #define VM_UFFD_WP	0x00001000	/* wrprotect pages tracking */
+#define VM_UFFD_MINOR	0x00002000	/* minor fault interception */
 
 #define VM_LOCKED	0x00002000
 #define VM_IO           0x00004000	/* Memory mapped I/O or similar */
diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index c63ccdae3eab..0390e5ac63b3 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -17,6 +17,9 @@
 #include <linux/mm.h>
 #include <asm-generic/pgtable_uffd.h>
 
+/* The set of all possible UFFD-related VM flags. */
+#define __VM_UFFD_FLAGS (VM_UFFD_MISSING | VM_UFFD_WP | VM_UFFD_MINOR)
+
 /*
  * CAREFUL: Check include/uapi/asm-generic/fcntl.h when defining
  * new flags, since they might collide with O_* ones. We want
@@ -71,6 +74,11 @@ static inline bool userfaultfd_wp(struct vm_area_struct *vma)
 	return vma->vm_flags & VM_UFFD_WP;
 }
 
+static inline bool userfaultfd_minor(struct vm_area_struct *vma)
+{
+	return vma->vm_flags & VM_UFFD_MINOR;
+}
+
 static inline bool userfaultfd_pte_wp(struct vm_area_struct *vma,
 				      pte_t pte)
 {
@@ -85,7 +93,7 @@ static inline bool userfaultfd_huge_pmd_wp(struct vm_area_struct *vma,
 
 static inline bool userfaultfd_armed(struct vm_area_struct *vma)
 {
-	return vma->vm_flags & (VM_UFFD_MISSING | VM_UFFD_WP);
+	return vma->vm_flags & __VM_UFFD_FLAGS;
 }
 
 extern int dup_userfaultfd(struct vm_area_struct *, struct list_head *);
@@ -132,6 +140,11 @@ static inline bool userfaultfd_wp(struct vm_area_struct *vma)
 	return false;
 }
 
+static inline bool userfaultfd_minor(struct vm_area_struct *vma)
+{
+	return false;
+}
+
 static inline bool userfaultfd_pte_wp(struct vm_area_struct *vma,
 				      pte_t pte)
 {
diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
index 67018d367b9f..2d583ffd4100 100644
--- a/include/trace/events/mmflags.h
+++ b/include/trace/events/mmflags.h
@@ -151,6 +151,7 @@ IF_HAVE_PG_ARCH_2(PG_arch_2,		"arch_2"	)
 	{VM_PFNMAP,			"pfnmap"	},		\
 	{VM_DENYWRITE,			"denywrite"	},		\
 	{VM_UFFD_WP,			"uffd_wp"	},		\
+	{VM_UFFD_MINOR,			"uffd_minor"	},		\
 	{VM_LOCKED,			"locked"	},		\
 	{VM_IO,				"io"		},		\
 	{VM_SEQ_READ,			"seqread"	},		\
diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
index 5f2d88212f7c..f24dd4fcbad9 100644
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -19,15 +19,19 @@
  * means the userland is reading).
  */
 #define UFFD_API ((__u64)0xAA)
+#define UFFD_API_REGISTER_MODES (UFFDIO_REGISTER_MODE_MISSING |	\
+				 UFFDIO_REGISTER_MODE_WP |	\
+				 UFFDIO_REGISTER_MODE_MINOR)
 #define UFFD_API_FEATURES (UFFD_FEATURE_PAGEFAULT_FLAG_WP |	\
 			   UFFD_FEATURE_EVENT_FORK |		\
 			   UFFD_FEATURE_EVENT_REMAP |		\
-			   UFFD_FEATURE_EVENT_REMOVE |	\
+			   UFFD_FEATURE_EVENT_REMOVE |		\
 			   UFFD_FEATURE_EVENT_UNMAP |		\
 			   UFFD_FEATURE_MISSING_HUGETLBFS |	\
 			   UFFD_FEATURE_MISSING_SHMEM |		\
 			   UFFD_FEATURE_SIGBUS |		\
-			   UFFD_FEATURE_THREAD_ID)
+			   UFFD_FEATURE_THREAD_ID |		\
+			   UFFD_FEATURE_MINOR_HUGETLBFS)
 #define UFFD_API_IOCTLS				\
 	((__u64)1 << _UFFDIO_REGISTER |		\
 	 (__u64)1 << _UFFDIO_UNREGISTER |	\
@@ -127,6 +131,7 @@ struct uffd_msg {
 /* flags for UFFD_EVENT_PAGEFAULT */
 #define UFFD_PAGEFAULT_FLAG_WRITE	(1<<0)	/* If this was a write fault */
 #define UFFD_PAGEFAULT_FLAG_WP		(1<<1)	/* If reason is VM_UFFD_WP */
+#define UFFD_PAGEFAULT_FLAG_MINOR	(1<<2)	/* If reason is VM_UFFD_MINOR */
 
 struct uffdio_api {
 	/* userland asks for an API number and the features to enable */
@@ -171,6 +176,10 @@ struct uffdio_api {
 	 *
 	 * UFFD_FEATURE_THREAD_ID pid of the page faulted task_struct will
 	 * be returned, if feature is not requested 0 will be returned.
+	 *
+	 * UFFD_FEATURE_MINOR_HUGETLBFS indicates that minor faults
+	 * can be intercepted (via REGISTER_MODE_MINOR) for
+	 * hugetlbfs-backed pages.
 	 */
 #define UFFD_FEATURE_PAGEFAULT_FLAG_WP		(1<<0)
 #define UFFD_FEATURE_EVENT_FORK			(1<<1)
@@ -181,6 +190,7 @@ struct uffdio_api {
 #define UFFD_FEATURE_EVENT_UNMAP		(1<<6)
 #define UFFD_FEATURE_SIGBUS			(1<<7)
 #define UFFD_FEATURE_THREAD_ID			(1<<8)
+#define UFFD_FEATURE_MINOR_HUGETLBFS		(1<<9)
 	__u64 features;
 
 	__u64 ioctls;
@@ -195,6 +205,7 @@ struct uffdio_register {
 	struct uffdio_range range;
 #define UFFDIO_REGISTER_MODE_MISSING	((__u64)1<<0)
 #define UFFDIO_REGISTER_MODE_WP		((__u64)1<<1)
+#define UFFDIO_REGISTER_MODE_MINOR	((__u64)1<<2)
 	__u64 mode;
 
 	/*
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 30a087dda57d..6f9d8349f818 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4375,6 +4375,38 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 				VM_FAULT_SET_HINDEX(hstate_index(h));
 			goto backout_unlocked;
 		}
+
+		/* Check for page in userfault range. */
+		if (userfaultfd_minor(vma)) {
+			u32 hash;
+			struct vm_fault vmf = {
+				.vma = vma,
+				.address = haddr,
+				.flags = flags,
+				/*
+				 * Hard to debug if it ends up being used by a
+				 * callee that assumes something about the
+				 * other uninitialized fields... same as in
+				 * memory.c
+				 */
+			};
+
+			unlock_page(page);
+
+			/*
+			 * hugetlb_fault_mutex and i_mmap_rwsem must be dropped
+			 * before handling userfault.  Reacquire after handling
+			 * fault to make calling code simpler.
+			 */
+
+			hash = hugetlb_fault_mutex_hash(mapping, idx);
+			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
+			i_mmap_unlock_read(mapping);
+			ret = handle_userfault(&vmf, VM_UFFD_MINOR);
+			i_mmap_lock_read(mapping);
+			mutex_lock(&hugetlb_fault_mutex_table[hash]);
+			goto out;
+		}
 	}
 
 	/*
-- 
2.30.0.280.ga3ce27912f-goog




* [PATCH v2 6/9] userfaultfd: disable huge PMD sharing for MINOR registered VMAs
  2021-01-22 21:29 [PATCH 0/9] userfaultfd: add minor fault handling Axel Rasmussen
                   ` (4 preceding siblings ...)
  2021-01-22 21:29 ` [PATCH v2 5/9] userfaultfd: add minor fault registration mode Axel Rasmussen
@ 2021-01-22 21:29 ` Axel Rasmussen
  2021-01-22 21:29 ` [PATCH v2 7/9] userfaultfd: add UFFDIO_CONTINUE ioctl Axel Rasmussen
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 14+ messages in thread
From: Axel Rasmussen @ 2021-01-22 21:29 UTC (permalink / raw)

As the comment says: for the MINOR fault use case, although the page
might be present and populated in the other (non-UFFD-registered) half
of the shared mapping, it may be out of date, and we explicitly want
userspace to get a minor fault so it can check and potentially update
the page's contents.

Huge PMD sharing would prevent these faults from occurring for
suitably aligned areas, so disable it upon UFFD registration.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
---
 include/linux/userfaultfd_k.h | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index 0390e5ac63b3..fb9abaeb4194 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -56,12 +56,18 @@ static inline bool is_mergeable_vm_userfaultfd_ctx(struct vm_area_struct *vma,
 }
 
 /*
- * Never enable huge pmd sharing on uffd-wp registered vmas, because uffd-wp
- * protect information is per pgtable entry.
+ * Never enable huge pmd sharing on some uffd registered vmas:
+ *
+ * - VM_UFFD_WP VMAs, because write protect information is per pgtable entry.
+ *
+ * - VM_UFFD_MINOR VMAs, because we explicitly want minor faults to occur even
+ *   when the other half of a shared mapping is populated (even though the page
+ *   is there, in our use case it may be out of date, so userspace needs to
+ *   check for this and possibly update it).
  */
 static inline bool uffd_disable_huge_pmd_share(struct vm_area_struct *vma)
 {
-	return vma->vm_flags & VM_UFFD_WP;
+	return vma->vm_flags & (VM_UFFD_WP | VM_UFFD_MINOR);
 }
 
 static inline bool userfaultfd_missing(struct vm_area_struct *vma)
-- 
2.30.0.280.ga3ce27912f-goog




* [PATCH v2 7/9] userfaultfd: add UFFDIO_CONTINUE ioctl
  2021-01-22 21:29 [PATCH 0/9] userfaultfd: add minor fault handling Axel Rasmussen
                   ` (5 preceding siblings ...)
  2021-01-22 21:29 ` [PATCH v2 6/9] userfaultfd: disable huge PMD sharing for MINOR registered VMAs Axel Rasmussen
@ 2021-01-22 21:29 ` Axel Rasmussen
  2021-01-25  7:53   ` kernel test robot
                     ` (2 more replies)
  2021-01-22 21:29 ` [PATCH v2 8/9] userfaultfd: update documentation to describe minor fault handling Axel Rasmussen
  2021-01-22 21:29 ` [PATCH v2 9/9] userfaultfd/selftests: add test exercising " Axel Rasmussen
  8 siblings, 3 replies; 14+ messages in thread
From: Axel Rasmussen @ 2021-01-22 21:29 UTC (permalink / raw)

This ioctl is how userspace ought to resolve "minor" userfaults. The
idea is that userspace is notified when a minor fault occurs. It may
or may not change the contents of the page using its second, non-UFFD
mapping. Then, it calls UFFDIO_CONTINUE to tell the kernel "I have
ensured the page contents are correct, carry on setting up the mapping".
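
As a rough illustration of how a handler might drive this ioctl, here
is a sketch of a retry loop. It relies on the behaviour visible in
userfaultfd_continue() below: the call reports success only when the
whole range was mapped, and otherwise fails with EAGAIN while "mapped"
holds the number of bytes handled so far. Everything named demo_* or
fake_*, plus continue_range() and the callback type, is hypothetical;
in real use the callback would wrap ioctl(uffd, UFFDIO_CONTINUE, ...)
and return 0 or -errno.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Flattened stand-in for struct uffdio_continue (names are made up). */
struct demo_continue {
	uint64_t start;
	uint64_t len;
	uint64_t mode;
	int64_t mapped;		/* written by the callee */
};

/* Returns 0 on success or a negative errno value. */
typedef int (*continue_fn)(void *ctx, struct demo_continue *req);

/* Retry on -EAGAIN, skipping whatever part was already mapped. */
int continue_range(continue_fn do_continue, void *ctx,
		   uint64_t start, uint64_t len, int max_tries)
{
	while (len > 0 && max_tries-- > 0) {
		struct demo_continue req = {
			.start = start, .len = len, .mode = 0, .mapped = 0,
		};
		int ret = do_continue(ctx, &req);

		if (ret == 0)
			return 0;
		if (ret != -EAGAIN)
			return ret;
		if (req.mapped > 0) {
			if ((uint64_t)req.mapped >= len)
				return 0;
			start += (uint64_t)req.mapped;
			len -= (uint64_t)req.mapped;
		}
	}
	return len ? -EAGAIN : 0;
}

/* Test double: maps at most 'chunk' bytes per call. */
struct fake_state {
	uint64_t chunk;
	int calls;
};

int fake_continue(void *ctx, struct demo_continue *req)
{
	struct fake_state *st = ctx;
	uint64_t n = req->len < st->chunk ? req->len : st->chunk;

	st->calls++;
	req->mapped = (int64_t)n;
	return n == req->len ? 0 : -EAGAIN;
}
```

Initializing "mapped" to zero before each call matters here, because
some early failure paths return before the field is written.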

Note that it doesn't make much sense to use UFFDIO_{COPY,ZEROPAGE} for
MINOR registered VMAs. ZEROPAGE maps the VMA to the zero page; but in
the minor fault case, we already have some pre-existing underlying page.
Likewise, UFFDIO_COPY isn't useful if we have a second non-UFFD mapping.
We'd just use memcpy() or similar instead.

It turns out hugetlb_mcopy_atomic_pte() already does something very
close to what we want, if an existing page is provided via
`struct page **pagep`. We already special-case the behavior a bit for
the UFFDIO_ZEROPAGE case, so
just extend that design: add an enum for the three modes of operation,
and make the small adjustments needed for the MCOPY_ATOMIC_CONTINUE
case. (Basically, look up the existing page, and avoid adding the
existing page to the page cache or calling set_page_huge_active() on
it.)
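
The per-mode differences can be summarised as a small decision table.
The sketch below is a simplified model of the branches in
hugetlb_mcopy_atomic_pte() after this patch, not kernel code: the
DEMO_* enums and the three helper functions are invented for
illustration, and locking and error handling are left out.

```c
#include <assert.h>
#include <stdbool.h>

enum demo_mode { DEMO_NORMAL, DEMO_ZEROPAGE, DEMO_CONTINUE };

enum demo_page_source {
	DEMO_ALLOC_NEW,		/* allocate a fresh huge page */
	DEMO_REUSE_RETRY_PAGE,	/* page handed back in via *pagep */
	DEMO_LOOKUP_EXISTING,	/* find the page already in the cache */
};

/*
 * Where the page comes from. ZEROPAGE is rejected for hugetlb before
 * this point is reached, so it is modelled like NORMAL here.
 */
enum demo_page_source page_source(enum demo_mode mode, bool have_retry_page)
{
	if (mode == DEMO_CONTINUE)
		return DEMO_LOOKUP_EXISTING;
	return have_retry_page ? DEMO_REUSE_RETRY_PAGE : DEMO_ALLOC_NEW;
}

/* Only shared, newly allocated pages are added to the page cache. */
bool adds_to_page_cache(enum demo_mode mode, bool vm_shared)
{
	return vm_shared && mode != DEMO_CONTINUE;
}

/* An existing page is not marked active a second time. */
bool marks_huge_active(enum demo_mode mode)
{
	return mode != DEMO_CONTINUE;
}
```

Keeping CONTINUE as its own enum value, rather than a second boolean
next to "zeropage", avoids having to reason about flag combinations
that can never occur.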

Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
---
 fs/userfaultfd.c                 | 67 +++++++++++++++++++++++++++++++
 include/linux/hugetlb.h          |  1 +
 include/linux/userfaultfd_k.h    | 18 +++++++++
 include/uapi/linux/userfaultfd.h | 21 +++++++++-
 mm/hugetlb.c                     | 24 ++++++-----
 mm/userfaultfd.c                 | 69 +++++++++++++++++++++-----------
 6 files changed, 166 insertions(+), 34 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 968aca3e3ee9..80a3fca389b8 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1530,6 +1530,10 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
 		if (!(uffdio_register.mode & UFFDIO_REGISTER_MODE_WP))
 			ioctls_out &= ~((__u64)1 << _UFFDIO_WRITEPROTECT);
 
+		/* CONTINUE ioctl is only supported for MINOR ranges. */
+		if (!(uffdio_register.mode & UFFDIO_REGISTER_MODE_MINOR))
+			ioctls_out &= ~((__u64)1 << _UFFDIO_CONTINUE);
+
 		/*
 		 * Now that we scanned all vmas we can already tell
 		 * userland which ioctls methods are guaranteed to
@@ -1883,6 +1887,66 @@ static int userfaultfd_writeprotect(struct userfaultfd_ctx *ctx,
 	return ret;
 }
 
+static int userfaultfd_continue(struct userfaultfd_ctx *ctx, unsigned long arg)
+{
+	__s64 ret;
+	struct uffdio_continue uffdio_continue;
+	struct uffdio_continue __user *user_uffdio_continue;
+	struct userfaultfd_wake_range range;
+
+	user_uffdio_continue = (struct uffdio_continue __user *)arg;
+
+	ret = -EAGAIN;
+	if (READ_ONCE(ctx->mmap_changing))
+		goto out;
+
+	ret = -EFAULT;
+	if (copy_from_user(&uffdio_continue, user_uffdio_continue,
+			   /* don't copy the output fields */
+			   sizeof(uffdio_continue) - (sizeof(__s64))))
+		goto out;
+
+	ret = validate_range(ctx->mm, &uffdio_continue.range.start,
+			     uffdio_continue.range.len);
+	if (ret)
+		goto out;
+
+	ret = -EINVAL;
+	/* double check for wraparound just in case. */
+	if (uffdio_continue.range.start + uffdio_continue.range.len <=
+	    uffdio_continue.range.start) {
+		goto out;
+	}
+	if (uffdio_continue.mode & ~UFFDIO_CONTINUE_MODE_DONTWAKE)
+		goto out;
+
+	if (mmget_not_zero(ctx->mm)) {
+		ret = mcopy_continue(ctx->mm, uffdio_continue.range.start,
+				     uffdio_continue.range.len,
+				     &ctx->mmap_changing);
+		mmput(ctx->mm);
+	} else {
+		return -ESRCH;
+	}
+
+	if (unlikely(put_user(ret, &user_uffdio_continue->mapped)))
+		return -EFAULT;
+	if (ret < 0)
+		goto out;
+
+	/* len == 0 would wake all */
+	BUG_ON(!ret);
+	range.len = ret;
+	if (!(uffdio_continue.mode & UFFDIO_CONTINUE_MODE_DONTWAKE)) {
+		range.start = uffdio_continue.range.start;
+		wake_userfault(ctx, &range);
+	}
+	ret = range.len == uffdio_continue.range.len ? 0 : -EAGAIN;
+
+out:
+	return ret;
+}
+
 static inline unsigned int uffd_ctx_features(__u64 user_features)
 {
 	/*
@@ -1967,6 +2031,9 @@ static long userfaultfd_ioctl(struct file *file, unsigned cmd,
 	case UFFDIO_WRITEPROTECT:
 		ret = userfaultfd_writeprotect(ctx, arg);
 		break;
+	case UFFDIO_CONTINUE:
+		ret = userfaultfd_continue(ctx, arg);
+		break;
 	}
 	return ret;
 }
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index f94a35296618..f9fd7df1d586 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -139,6 +139,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, pte_t *dst_pte,
 				struct vm_area_struct *dst_vma,
 				unsigned long dst_addr,
 				unsigned long src_addr,
+				enum mcopy_atomic_mode mode,
 				struct page **pagep);
 int hugetlb_reserve_pages(struct inode *inode, long from, long to,
 						struct vm_area_struct *vma,
diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index fb9abaeb4194..2fcb686211e8 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -37,6 +37,22 @@ extern int sysctl_unprivileged_userfaultfd;
 
 extern vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason);
 
+/*
+ * The mode of operation for __mcopy_atomic and its helpers.
+ *
+ * This is almost an implementation detail (mcopy_atomic below doesn't take this
+ * as a parameter), but it's exposed here because memory-kind-specific
+ * implementations (e.g. hugetlbfs) need to know the mode of operation.
+ */
+enum mcopy_atomic_mode {
+	/* A normal copy_from_user into the destination range. */
+	MCOPY_ATOMIC_NORMAL,
+	/* Don't copy; map the destination range to the zero page. */
+	MCOPY_ATOMIC_ZEROPAGE,
+	/* Just set up the dst_vma, without modifying the underlying page(s). */
+	MCOPY_ATOMIC_CONTINUE,
+};
+
 extern ssize_t mcopy_atomic(struct mm_struct *dst_mm, unsigned long dst_start,
 			    unsigned long src_start, unsigned long len,
 			    bool *mmap_changing, __u64 mode);
@@ -44,6 +60,8 @@ extern ssize_t mfill_zeropage(struct mm_struct *dst_mm,
 			      unsigned long dst_start,
 			      unsigned long len,
 			      bool *mmap_changing);
+extern ssize_t mcopy_continue(struct mm_struct *dst_mm, unsigned long dst_start,
+			      unsigned long len, bool *mmap_changing);
 extern int mwriteprotect_range(struct mm_struct *dst_mm,
 			       unsigned long start, unsigned long len,
 			       bool enable_wp, bool *mmap_changing);
diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
index f24dd4fcbad9..bafbeb1a2624 100644
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -40,10 +40,12 @@
 	((__u64)1 << _UFFDIO_WAKE |		\
 	 (__u64)1 << _UFFDIO_COPY |		\
 	 (__u64)1 << _UFFDIO_ZEROPAGE |		\
-	 (__u64)1 << _UFFDIO_WRITEPROTECT)
+	 (__u64)1 << _UFFDIO_WRITEPROTECT |	\
+	 (__u64)1 << _UFFDIO_CONTINUE)
 #define UFFD_API_RANGE_IOCTLS_BASIC		\
 	((__u64)1 << _UFFDIO_WAKE |		\
-	 (__u64)1 << _UFFDIO_COPY)
+	 (__u64)1 << _UFFDIO_COPY |		\
+	 (__u64)1 << _UFFDIO_CONTINUE)
 
 /*
  * Valid ioctl command number range with this API is from 0x00 to
@@ -59,6 +61,7 @@
 #define _UFFDIO_COPY			(0x03)
 #define _UFFDIO_ZEROPAGE		(0x04)
 #define _UFFDIO_WRITEPROTECT		(0x06)
+#define _UFFDIO_CONTINUE		(0x07)
 #define _UFFDIO_API			(0x3F)
 
 /* userfaultfd ioctl ids */
@@ -77,6 +80,8 @@
 				      struct uffdio_zeropage)
 #define UFFDIO_WRITEPROTECT	_IOWR(UFFDIO, _UFFDIO_WRITEPROTECT, \
 				      struct uffdio_writeprotect)
+#define UFFDIO_CONTINUE		_IOR(UFFDIO, _UFFDIO_CONTINUE,	\
+				     struct uffdio_continue)
 
 /* read() structure */
 struct uffd_msg {
@@ -268,6 +273,18 @@ struct uffdio_writeprotect {
 	__u64 mode;
 };
 
+struct uffdio_continue {
+	struct uffdio_range range;
+#define UFFDIO_CONTINUE_MODE_DONTWAKE		((__u64)1<<0)
+	__u64 mode;
+
+	/*
+	 * Fields below here are written by the ioctl and must be at the end:
+	 * the copy_from_user will not read past here.
+	 */
+	__s64 mapped;
+};
+
 /*
  * Flags for the userfaultfd(2) system call itself.
  */
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 6f9d8349f818..740a090f34d1 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4656,6 +4656,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 			    struct vm_area_struct *dst_vma,
 			    unsigned long dst_addr,
 			    unsigned long src_addr,
+			    enum mcopy_atomic_mode mode,
 			    struct page **pagep)
 {
 	struct address_space *mapping;
@@ -4668,7 +4669,10 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 	int ret;
 	struct page *page;
 
-	if (!*pagep) {
+	mapping = dst_vma->vm_file->f_mapping;
+	idx = vma_hugecache_offset(h, dst_vma, dst_addr);
+
+	if (!*pagep && mode != MCOPY_ATOMIC_CONTINUE) {
 		ret = -ENOMEM;
 		page = alloc_huge_page(dst_vma, dst_addr, 0);
 		if (IS_ERR(page))
@@ -4685,6 +4689,12 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 			/* don't free the page */
 			goto out;
 		}
+	} else if (mode == MCOPY_ATOMIC_CONTINUE) {
+		ret = -EFAULT;
+		page = find_lock_page(mapping, idx);
+		*pagep = NULL;
+		if (!page)
+			goto out;
 	} else {
 		page = *pagep;
 		*pagep = NULL;
@@ -4697,13 +4707,8 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 	 */
 	__SetPageUptodate(page);
 
-	mapping = dst_vma->vm_file->f_mapping;
-	idx = vma_hugecache_offset(h, dst_vma, dst_addr);
-
-	/*
-	 * If shared, add to page cache
-	 */
-	if (vm_shared) {
+	/* Add shared, newly allocated pages to the page cache. */
+	if (vm_shared && mode != MCOPY_ATOMIC_CONTINUE) {
 		size = i_size_read(mapping->host) >> huge_page_shift(h);
 		ret = -EFAULT;
 		if (idx >= size)
@@ -4763,7 +4768,8 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 	update_mmu_cache(dst_vma, dst_addr, dst_pte);
 
 	spin_unlock(ptl);
-	set_page_huge_active(page);
+	if (mode != MCOPY_ATOMIC_CONTINUE)
+		set_page_huge_active(page);
 	if (vm_shared)
 		unlock_page(page);
 	ret = 0;
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index b2ce61c1b50d..a762b9cefaea 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -207,7 +207,7 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 					      unsigned long dst_start,
 					      unsigned long src_start,
 					      unsigned long len,
-					      bool zeropage)
+					      enum mcopy_atomic_mode mode)
 {
 	int vm_alloc_shared = dst_vma->vm_flags & VM_SHARED;
 	int vm_shared = dst_vma->vm_flags & VM_SHARED;
@@ -227,7 +227,7 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 	 * by THP.  Since we can not reliably insert a zero page, this
 	 * feature is not supported.
 	 */
-	if (zeropage) {
+	if (mode == MCOPY_ATOMIC_ZEROPAGE) {
 		mmap_read_unlock(dst_mm);
 		return -EINVAL;
 	}
@@ -273,8 +273,6 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 	}
 
 	while (src_addr < src_start + len) {
-		pte_t dst_pteval;
-
 		BUG_ON(dst_addr >= dst_start + len);
 
 		/*
@@ -297,16 +295,17 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 			goto out_unlock;
 		}
 
-		err = -EEXIST;
-		dst_pteval = huge_ptep_get(dst_pte);
-		if (!huge_pte_none(dst_pteval)) {
-			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
-			i_mmap_unlock_read(mapping);
-			goto out_unlock;
+		if (mode != MCOPY_ATOMIC_CONTINUE) {
+			if (!huge_pte_none(huge_ptep_get(dst_pte))) {
+				err = -EEXIST;
+				mutex_unlock(&hugetlb_fault_mutex_table[hash]);
+				i_mmap_unlock_read(mapping);
+				goto out_unlock;
+			}
 		}
 
 		err = hugetlb_mcopy_atomic_pte(dst_mm, dst_pte, dst_vma,
-						dst_addr, src_addr, &page);
+					       dst_addr, src_addr, mode, &page);
 
 		mutex_unlock(&hugetlb_fault_mutex_table[hash]);
 		i_mmap_unlock_read(mapping);
@@ -408,7 +407,7 @@ extern ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 				      unsigned long dst_start,
 				      unsigned long src_start,
 				      unsigned long len,
-				      bool zeropage);
+				      enum mcopy_atomic_mode mode);
 #endif /* CONFIG_HUGETLB_PAGE */
 
 static __always_inline ssize_t mfill_atomic_pte(struct mm_struct *dst_mm,
@@ -417,7 +416,7 @@ static __always_inline ssize_t mfill_atomic_pte(struct mm_struct *dst_mm,
 						unsigned long dst_addr,
 						unsigned long src_addr,
 						struct page **page,
-						bool zeropage,
+						enum mcopy_atomic_mode mode,
 						bool wp_copy)
 {
 	ssize_t err;
@@ -433,22 +432,38 @@ static __always_inline ssize_t mfill_atomic_pte(struct mm_struct *dst_mm,
 	 * and not in the radix tree.
 	 */
 	if (!(dst_vma->vm_flags & VM_SHARED)) {
-		if (!zeropage)
+		switch (mode) {
+		case MCOPY_ATOMIC_NORMAL:
 			err = mcopy_atomic_pte(dst_mm, dst_pmd, dst_vma,
 					       dst_addr, src_addr, page,
 					       wp_copy);
-		else
+			break;
+		case MCOPY_ATOMIC_ZEROPAGE:
 			err = mfill_zeropage_pte(dst_mm, dst_pmd,
 						 dst_vma, dst_addr);
+			break;
+		/* It only makes sense to CONTINUE for shared memory. */
+		case MCOPY_ATOMIC_CONTINUE:
+			err = -EINVAL;
+			break;
+		}
 	} else {
 		VM_WARN_ON_ONCE(wp_copy);
-		if (!zeropage)
+		switch (mode) {
+		case MCOPY_ATOMIC_NORMAL:
 			err = shmem_mcopy_atomic_pte(dst_mm, dst_pmd,
 						     dst_vma, dst_addr,
 						     src_addr, page);
-		else
+			break;
+		case MCOPY_ATOMIC_ZEROPAGE:
 			err = shmem_mfill_zeropage_pte(dst_mm, dst_pmd,
 						       dst_vma, dst_addr);
+			break;
+		case MCOPY_ATOMIC_CONTINUE:
+			/* FIXME: Add minor fault interception for shmem. */
+			err = -EINVAL;
+			break;
+		}
 	}
 
 	return err;
@@ -458,7 +473,7 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm,
 					      unsigned long dst_start,
 					      unsigned long src_start,
 					      unsigned long len,
-					      bool zeropage,
+					      enum mcopy_atomic_mode mcopy_mode,
 					      bool *mmap_changing,
 					      __u64 mode)
 {
@@ -527,7 +542,7 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm,
 	 */
 	if (is_vm_hugetlb_page(dst_vma))
 		return  __mcopy_atomic_hugetlb(dst_mm, dst_vma, dst_start,
-						src_start, len, zeropage);
+						src_start, len, mcopy_mode);
 
 	if (!vma_is_anonymous(dst_vma) && !vma_is_shmem(dst_vma))
 		goto out_unlock;
@@ -577,7 +592,7 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm,
 		BUG_ON(pmd_trans_huge(*dst_pmd));
 
 		err = mfill_atomic_pte(dst_mm, dst_pmd, dst_vma, dst_addr,
-				       src_addr, &page, zeropage, wp_copy);
+				       src_addr, &page, mcopy_mode, wp_copy);
 		cond_resched();
 
 		if (unlikely(err == -ENOENT)) {
@@ -626,14 +641,22 @@ ssize_t mcopy_atomic(struct mm_struct *dst_mm, unsigned long dst_start,
 		     unsigned long src_start, unsigned long len,
 		     bool *mmap_changing, __u64 mode)
 {
-	return __mcopy_atomic(dst_mm, dst_start, src_start, len, false,
-			      mmap_changing, mode);
+	return __mcopy_atomic(dst_mm, dst_start, src_start, len,
+			      MCOPY_ATOMIC_NORMAL, mmap_changing, mode);
 }
 
 ssize_t mfill_zeropage(struct mm_struct *dst_mm, unsigned long start,
 		       unsigned long len, bool *mmap_changing)
 {
-	return __mcopy_atomic(dst_mm, start, 0, len, true, mmap_changing, 0);
+	return __mcopy_atomic(dst_mm, start, 0, len, MCOPY_ATOMIC_ZEROPAGE,
+			      mmap_changing, 0);
+}
+
+ssize_t mcopy_continue(struct mm_struct *dst_mm, unsigned long start,
+		       unsigned long len, bool *mmap_changing)
+{
+	return __mcopy_atomic(dst_mm, start, 0, len, MCOPY_ATOMIC_CONTINUE,
+			      mmap_changing, 0);
 }
 
 int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start,
-- 
2.30.0.280.ga3ce27912f-goog
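
The diff above rejects some backing/mode combinations with -EINVAL before any per-backing helper does real work. A minimal sketch of those up-front checks follows, as a plain userspace model rather than the kernel code. All `sketch_*` names are hypothetical, and the three-way "backing" split is a simplification of the `is_vm_hugetlb_page()` and `VM_SHARED` tests in the patch.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical mirror of enum mcopy_atomic_mode from the patch. */
enum sketch_mode { SKETCH_NORMAL, SKETCH_ZEROPAGE, SKETCH_CONTINUE };

/* Simplified view of the destination VMA, for illustration only. */
enum sketch_backing { SKETCH_PRIVATE, SKETCH_SHARED_SHMEM, SKETCH_HUGETLB };

/*
 * Returns 0 when the request is handed on to a per-backing helper and
 * -EINVAL when the diff rejects it up front.
 */
static int sketch_mode_check(enum sketch_backing backing,
			     enum sketch_mode mode)
{
	switch (backing) {
	case SKETCH_HUGETLB:
		/* __mcopy_atomic_hugetlb(): zero pages are not supported. */
		return mode == SKETCH_ZEROPAGE ? -EINVAL : 0;
	case SKETCH_PRIVATE:
		/* CONTINUE only makes sense for shared memory. */
		return mode == SKETCH_CONTINUE ? -EINVAL : 0;
	case SKETCH_SHARED_SHMEM:
		/* shmem minor fault interception is still a FIXME. */
		return mode == SKETCH_CONTINUE ? -EINVAL : 0;
	}
	return -EINVAL;
}

/* Exercises one accepted and one rejected combination per backing. */
static void sketch_mode_self_check(void)
{
	assert(sketch_mode_check(SKETCH_HUGETLB, SKETCH_CONTINUE) == 0);
	assert(sketch_mode_check(SKETCH_HUGETLB, SKETCH_ZEROPAGE) == -EINVAL);
	assert(sketch_mode_check(SKETCH_PRIVATE, SKETCH_NORMAL) == 0);
	assert(sketch_mode_check(SKETCH_PRIVATE, SKETCH_CONTINUE) == -EINVAL);
	assert(sketch_mode_check(SKETCH_SHARED_SHMEM, SKETCH_ZEROPAGE) == 0);
	assert(sketch_mode_check(SKETCH_SHARED_SHMEM, SKETCH_CONTINUE) == -EINVAL);
}
```

A return of 0 here only means the request reaches the helper; the helper itself can still fail, for example with -EEXIST or -ENOENT.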




* [PATCH v2 8/9] userfaultfd: update documentation to describe minor fault handling
  2021-01-22 21:29 [PATCH 0/9] userfaultfd: add minor fault handling Axel Rasmussen
                   ` (6 preceding siblings ...)
  2021-01-22 21:29 ` [PATCH v2 7/9] userfaultfd: add UFFDIO_CONTINUE ioctl Axel Rasmussen
@ 2021-01-22 21:29 ` Axel Rasmussen
  2021-01-22 21:29 ` [PATCH v2 9/9] userfaultfd/selftests: add test exercising " Axel Rasmussen
  8 siblings, 0 replies; 14+ messages in thread
From: Axel Rasmussen @ 2021-01-22 21:29 UTC (permalink / raw)
  To: Alexander Viro, Alexey Dobriyan, Andrea Arcangeli, Andrew Morton,
	Anshuman Khandual, Catalin Marinas, Chinwen Chang, Huang Ying,
	Ingo Molnar, Jann Horn, Jerome Glisse, Lokesh Gidra,
	Matthew Wilcox (Oracle), Michael Ellerman, Michal Koutný,
	Michel Lespinasse, Mike Kravetz, Mike Rapoport, Nicholas Piggin,
	Peter Xu, Shaohua Li, Shawn Anastasio, Steven Rostedt,
	Steven Price, Vlastimil Babka
  Cc: linux-kernel, linux-fsdevel, linux-mm, Adam Ruprecht,
	Axel Rasmussen, Cannon Matthews, Dr . David Alan Gilbert,
	David Rientjes, Oliver Upton

Reword and reorganize the text into lists, so that new features, modes,
and ioctls can simply be appended.

Describe how UFFDIO_REGISTER_MODE_MINOR and UFFDIO_CONTINUE can be used
to intercept and resolve minor faults. Make it clear that COPY and
ZEROPAGE are used for MISSING faults, whereas CONTINUE is used for MINOR
faults.

Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
---
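The pairing described above (COPY or ZEROPAGE for MISSING faults, CONTINUE for MINOR faults) can be sketched as a small dispatch on the fault flags. This is an illustrative userspace model, not code from the patch. The `SKETCH_*` names and bit values are hypothetical stand-ins for the `UFFD_PAGEFAULT_FLAG_*` bits, and the check order follows the selftest handler in patch 9/9.

```c
#include <assert.h>

/*
 * Hypothetical stand-ins for the UFFD_PAGEFAULT_FLAG_* bits found in
 * uffd_msg.arg.pagefault.flags. The values are illustrative only.
 */
#define SKETCH_PAGEFAULT_FLAG_WP	(1UL << 1)
#define SKETCH_PAGEFAULT_FLAG_MINOR	(1UL << 2)

/* Which ioctl a monitor would reach for to resolve a given fault. */
enum sketch_resolution {
	SKETCH_RESOLVE_WRITEPROTECT,	 /* UFFDIO_WRITEPROTECT */
	SKETCH_RESOLVE_CONTINUE,	 /* UFFDIO_CONTINUE */
	SKETCH_RESOLVE_COPY_OR_ZEROPAGE, /* UFFDIO_COPY or UFFDIO_ZEROPAGE */
};

static enum sketch_resolution sketch_pick_resolution(unsigned long flags)
{
	if (flags & SKETCH_PAGEFAULT_FLAG_WP)
		return SKETCH_RESOLVE_WRITEPROTECT;
	if (flags & SKETCH_PAGEFAULT_FLAG_MINOR)
		return SKETCH_RESOLVE_CONTINUE;
	/* Neither bit set: treat it as a missing fault. */
	return SKETCH_RESOLVE_COPY_OR_ZEROPAGE;
}

/* Exercises each branch once. */
static void sketch_resolution_self_check(void)
{
	assert(sketch_pick_resolution(0) ==
	       SKETCH_RESOLVE_COPY_OR_ZEROPAGE);
	assert(sketch_pick_resolution(SKETCH_PAGEFAULT_FLAG_MINOR) ==
	       SKETCH_RESOLVE_CONTINUE);
	assert(sketch_pick_resolution(SKETCH_PAGEFAULT_FLAG_WP) ==
	       SKETCH_RESOLVE_WRITEPROTECT);
}
```

Real code would use the constants from `<linux/userfaultfd.h>` rather than these placeholders.
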
 Documentation/admin-guide/mm/userfaultfd.rst | 105 +++++++++++--------
 1 file changed, 64 insertions(+), 41 deletions(-)

diff --git a/Documentation/admin-guide/mm/userfaultfd.rst b/Documentation/admin-guide/mm/userfaultfd.rst
index 65eefa66c0ba..10c69458c794 100644
--- a/Documentation/admin-guide/mm/userfaultfd.rst
+++ b/Documentation/admin-guide/mm/userfaultfd.rst
@@ -63,36 +63,36 @@ the generic ioctl available.
 
 The ``uffdio_api.features`` bitmask returned by the ``UFFDIO_API`` ioctl
 defines what memory types are supported by the ``userfaultfd`` and what
-events, except page fault notifications, may be generated.
-
-If the kernel supports registering ``userfaultfd`` ranges on hugetlbfs
-virtual memory areas, ``UFFD_FEATURE_MISSING_HUGETLBFS`` will be set in
-``uffdio_api.features``. Similarly, ``UFFD_FEATURE_MISSING_SHMEM`` will be
-set if the kernel supports registering ``userfaultfd`` ranges on shared
-memory (covering all shmem APIs, i.e. tmpfs, ``IPCSHM``, ``/dev/zero``,
-``MAP_SHARED``, ``memfd_create``, etc).
-
-The userland application that wants to use ``userfaultfd`` with hugetlbfs
-or shared memory need to set the corresponding flag in
-``uffdio_api.features`` to enable those features.
-
-If the userland desires to receive notifications for events other than
-page faults, it has to verify that ``uffdio_api.features`` has appropriate
-``UFFD_FEATURE_EVENT_*`` bits set. These events are described in more
-detail below in `Non-cooperative userfaultfd`_ section.
-
-Once the ``userfaultfd`` has been enabled the ``UFFDIO_REGISTER`` ioctl should
-be invoked (if present in the returned ``uffdio_api.ioctls`` bitmask) to
-register a memory range in the ``userfaultfd`` by setting the
+events, except page fault notifications, may be generated:
+
+- The ``UFFD_FEATURE_EVENT_*`` flags indicate that various events other
+  than page faults are supported. These events are described in more
+  detail below in the `Non-cooperative userfaultfd`_ section.
+
+- ``UFFD_FEATURE_MISSING_HUGETLBFS`` and ``UFFD_FEATURE_MISSING_SHMEM``
+  indicate that the kernel supports ``UFFDIO_REGISTER_MODE_MISSING``
+  registrations for hugetlbfs and shared memory (covering all shmem APIs,
+  i.e. tmpfs, ``IPCSHM``, ``/dev/zero``, ``MAP_SHARED``, ``memfd_create``,
+  etc) virtual memory areas, respectively.
+
+- ``UFFD_FEATURE_MINOR_HUGETLBFS`` indicates that the kernel supports
+  ``UFFDIO_REGISTER_MODE_MINOR`` registration for hugetlbfs virtual memory
+  areas.
+
+The userland application should set the feature flags it intends to use
+when invoking the ``UFFDIO_API`` ioctl, to request that those features be
+enabled if supported.
+
+Once the ``userfaultfd`` API has been enabled the ``UFFDIO_REGISTER``
+ioctl should be invoked (if present in the returned ``uffdio_api.ioctls``
+bitmask) to register a memory range in the ``userfaultfd`` by setting the
 uffdio_register structure accordingly. The ``uffdio_register.mode``
 bitmask will specify to the kernel which kind of faults to track for
-the range (``UFFDIO_REGISTER_MODE_MISSING`` would track missing
-pages). The ``UFFDIO_REGISTER`` ioctl will return the
+the range. The ``UFFDIO_REGISTER`` ioctl will return the
 ``uffdio_register.ioctls`` bitmask of ioctls that are suitable to resolve
 userfaults on the range registered. Not all ioctls will necessarily be
-supported for all memory types depending on the underlying virtual
-memory backend (anonymous memory vs tmpfs vs real filebacked
-mappings).
+supported for all memory types (e.g. anonymous memory vs. shmem vs.
+hugetlbfs), or all types of intercepted faults.
 
 Userland can use the ``uffdio_register.ioctls`` to manage the virtual
 address space in the background (to add or potentially also remove
@@ -100,21 +100,44 @@ memory from the ``userfaultfd`` registered range). This means a userfault
 could be triggering just before userland maps in the background the
 user-faulted page.
 
-The primary ioctl to resolve userfaults is ``UFFDIO_COPY``. That
-atomically copies a page into the userfault registered range and wakes
-up the blocked userfaults
-(unless ``uffdio_copy.mode & UFFDIO_COPY_MODE_DONTWAKE`` is set).
-Other ioctl works similarly to ``UFFDIO_COPY``. They're atomic as in
-guaranteeing that nothing can see an half copied page since it'll
-keep userfaulting until the copy has finished.
+Resolving Userfaults
+--------------------
+
+There are three basic ways to resolve userfaults:
+
+- ``UFFDIO_COPY`` atomically copies some existing page contents from
+  userspace.
+
+- ``UFFDIO_ZEROPAGE`` atomically zeros the new page.
+
+- ``UFFDIO_CONTINUE`` maps an existing, previously-populated page.
+
+These operations are atomic in the sense that they guarantee nothing can
+see a half-populated page, since readers will keep userfaulting until the
+operation has finished.
+
+By default, these wake up userfaults blocked on the range in question.
+They support a ``UFFDIO_*_MODE_DONTWAKE`` ``mode`` flag, which indicates
+that waking will be done separately at some later time.
+
+Which of these are used depends on the kind of fault:
+
+- For ``UFFDIO_REGISTER_MODE_MISSING`` faults, a new page has to be
+  provided. This can be done with either ``UFFDIO_COPY`` or
+  ``UFFDIO_ZEROPAGE``. The default (non-userfaultfd) behavior would be to
+  provide a zero page, but in userfaultfd this is left up to userspace.
+
+- For ``UFFDIO_REGISTER_MODE_MINOR`` faults, a page already exists.
+  Userspace needs to ensure its contents are correct (if they need to be
+  modified, by writing directly to the non-userfaultfd-registered side
+  of shared memory), and then issue ``UFFDIO_CONTINUE`` to resolve the
+  fault.
 
 Notes:
 
-- If you requested ``UFFDIO_REGISTER_MODE_MISSING`` when registering then
-  you must provide some kind of page in your thread after reading from
-  the uffd.  You must provide either ``UFFDIO_COPY`` or ``UFFDIO_ZEROPAGE``.
-  The normal behavior of the OS automatically providing a zero page on
-  an anonymous mmaping is not in place.
+- You can tell which kind of fault occurred by examining
+  ``pagefault.flags`` within the ``uffd_msg``, checking for the
+  ``UFFD_PAGEFAULT_FLAG_*`` flags.
 
 - None of the page-delivering ioctls default to the range that you
   registered with.  You must fill in all fields for the appropriate
@@ -122,9 +145,9 @@ Notes:
 
 - You get the address of the access that triggered the missing page
   event out of a struct uffd_msg that you read in the thread from the
-  uffd.  You can supply as many pages as you want with ``UFFDIO_COPY`` or
-  ``UFFDIO_ZEROPAGE``.  Keep in mind that unless you used DONTWAKE then
-  the first of any of those IOCTLs wakes up the faulting thread.
+  uffd.  You can supply as many pages as you want with these IOCTLs.
+  Keep in mind that unless you used DONTWAKE then the first of any of
+  those IOCTLs wakes up the faulting thread.
 
 - Be sure to test for all errors including
   (``pollfd[0].revents & POLLERR``).  This can happen, e.g. when ranges
-- 
2.30.0.280.ga3ce27912f-goog




* [PATCH v2 9/9] userfaultfd/selftests: add test exercising minor fault handling
  2021-01-22 21:29 [PATCH 0/9] userfaultfd: add minor fault handling Axel Rasmussen
                   ` (7 preceding siblings ...)
  2021-01-22 21:29 ` [PATCH v2 8/9] userfaultfd: update documentation to describe minor fault handling Axel Rasmussen
@ 2021-01-22 21:29 ` Axel Rasmussen
  8 siblings, 0 replies; 14+ messages in thread
From: Axel Rasmussen @ 2021-01-22 21:29 UTC (permalink / raw)
  To: Alexander Viro, Alexey Dobriyan, Andrea Arcangeli, Andrew Morton,
	Anshuman Khandual, Catalin Marinas, Chinwen Chang, Huang Ying,
	Ingo Molnar, Jann Horn, Jerome Glisse, Lokesh Gidra,
	Matthew Wilcox (Oracle), Michael Ellerman, Michal Koutný,
	Michel Lespinasse, Mike Kravetz, Mike Rapoport, Nicholas Piggin,
	Peter Xu, Shaohua Li, Shawn Anastasio, Steven Rostedt,
	Steven Price, Vlastimil Babka
  Cc: linux-kernel, linux-fsdevel, linux-mm, Adam Ruprecht,
	Axel Rasmussen, Cannon Matthews, Dr . David Alan Gilbert,
	David Rientjes, Oliver Upton

Fix a dormant bug in userfaultfd_events_test(), where we did
`return faulting_process(0)` instead of `exit(faulting_process(0))`.
This caused the forked process to keep running, trying to execute any
further test cases after the events test in parallel with the "real"
process.

Add a simple test case which exercises minor faults. In short, it does
the following:

1. Set up an area (area_dst) and a second shared mapping to the same
   underlying pages (area_dst_alias).

2. Register one of these areas with userfaultfd, in minor fault mode.

3. Start a second thread to handle any minor faults.

4. Populate the underlying pages with the non-UFFD-registered side of
   the mapping. Basically, memset() each page with some arbitrary
   contents.

5. Then, using the UFFD-registered mapping, read all of the page
   contents, asserting that the contents match expectations (we expect
   the minor fault handling thread can modify the page contents before
   resolving the fault).

The minor fault handling thread, upon receiving an event, flips all the
bits (~) in that page, just to prove that it can modify it in some
arbitrary way. Then it issues a UFFDIO_CONTINUE ioctl, to set up the
mapping and resolve the fault. The reading thread should wake up and see
this modification.

Currently the minor fault test is only enabled in hugetlb_shared mode,
as this is the only configuration the kernel feature supports.

Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
---
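The byte arithmetic behind steps 4 and 5 above can be sketched in isolation. This is a userspace model that assumes a 64-byte buffer in place of a huge page. All `sketch_*` names are hypothetical, and no userfaultfd calls are made; the "fault handler" is simply a function call.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Fill value written into page p through the unregistered mapping. */
static uint8_t sketch_fill_byte(unsigned long p)
{
	return (uint8_t)(p % ((uint8_t)-1));	/* p modulo 255 */
}

/* Value the reader expects once the handler has flipped every bit. */
static uint8_t sketch_expected_byte(unsigned long p)
{
	return (uint8_t)~sketch_fill_byte(p);
}

/* Stand-in for the handler's bit flip over one page-sized buffer. */
static void sketch_flip_page(uint8_t *page, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		page[i] = (uint8_t)~page[i];
}

/* Models one page: populate, flip as the handler would, then verify. */
static int sketch_page_round_trip(unsigned long p)
{
	uint8_t page[64];		/* tiny stand-in for a huge page */
	uint8_t expected[sizeof(page)];

	memset(page, sketch_fill_byte(p), sizeof(page));
	sketch_flip_page(page, sizeof(page));
	memset(expected, sketch_expected_byte(p), sizeof(expected));
	return memcmp(page, expected, sizeof(page)) == 0;
}

/* Exercises the wrap-around at p == 255 and a full round trip. */
static void sketch_bytes_self_check(void)
{
	assert(sketch_fill_byte(255) == 0);
	assert(sketch_expected_byte(0) == 0xFF);
	assert(sketch_page_round_trip(7));
}
```

Because the fill value is taken modulo 255, it ranges over 0..254, so the flipped value the reader expects is never 0x00.
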
 tools/testing/selftests/vm/userfaultfd.c | 147 ++++++++++++++++++++++-
 1 file changed, 143 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c
index 92b8ec423201..73a72a3c4189 100644
--- a/tools/testing/selftests/vm/userfaultfd.c
+++ b/tools/testing/selftests/vm/userfaultfd.c
@@ -81,6 +81,8 @@ static volatile bool test_uffdio_copy_eexist = true;
 static volatile bool test_uffdio_zeropage_eexist = true;
 /* Whether to test uffd write-protection */
 static bool test_uffdio_wp = false;
+/* Whether to test uffd minor faults */
+static bool test_uffdio_minor = false;
 
 static bool map_shared;
 static int huge_fd;
@@ -96,6 +98,7 @@ struct uffd_stats {
 	int cpu;
 	unsigned long missing_faults;
 	unsigned long wp_faults;
+	unsigned long minor_faults;
 };
 
 /* pthread_mutex_t starts at page offset 0 */
@@ -153,17 +156,19 @@ static void uffd_stats_reset(struct uffd_stats *uffd_stats,
 		uffd_stats[i].cpu = i;
 		uffd_stats[i].missing_faults = 0;
 		uffd_stats[i].wp_faults = 0;
+		uffd_stats[i].minor_faults = 0;
 	}
 }
 
 static void uffd_stats_report(struct uffd_stats *stats, int n_cpus)
 {
 	int i;
-	unsigned long long miss_total = 0, wp_total = 0;
+	unsigned long long miss_total = 0, wp_total = 0, minor_total = 0;
 
 	for (i = 0; i < n_cpus; i++) {
 		miss_total += stats[i].missing_faults;
 		wp_total += stats[i].wp_faults;
+		minor_total += stats[i].minor_faults;
 	}
 
 	printf("userfaults: %llu missing (", miss_total);
@@ -172,6 +177,9 @@ static void uffd_stats_report(struct uffd_stats *stats, int n_cpus)
 	printf("\b), %llu wp (", wp_total);
 	for (i = 0; i < n_cpus; i++)
 		printf("%lu+", stats[i].wp_faults);
+	printf("\b), %llu minor (", minor_total);
+	for (i = 0; i < n_cpus; i++)
+		printf("%lu+", stats[i].minor_faults);
 	printf("\b)\n");
 }
 
@@ -328,7 +336,7 @@ static struct uffd_test_ops shmem_uffd_test_ops = {
 };
 
 static struct uffd_test_ops hugetlb_uffd_test_ops = {
-	.expected_ioctls = UFFD_API_RANGE_IOCTLS_BASIC,
+	.expected_ioctls = UFFD_API_RANGE_IOCTLS_BASIC & ~(1 << _UFFDIO_CONTINUE),
 	.allocate_area	= hugetlb_allocate_area,
 	.release_pages	= hugetlb_release_pages,
 	.alias_mapping = hugetlb_alias_mapping,
@@ -362,6 +370,22 @@ static void wp_range(int ufd, __u64 start, __u64 len, bool wp)
 	}
 }
 
+static void continue_range(int ufd, __u64 start, __u64 len)
+{
+	struct uffdio_continue req;
+
+	req.range.start = start;
+	req.range.len = len;
+	req.mode = 0;
+
+	if (ioctl(ufd, UFFDIO_CONTINUE, &req)) {
+		fprintf(stderr,
+			"UFFDIO_CONTINUE failed for address 0x%" PRIx64 "\n",
+			(uint64_t)start);
+		exit(1);
+	}
+}
+
 static void *locking_thread(void *arg)
 {
 	unsigned long cpu = (unsigned long) arg;
@@ -569,8 +593,32 @@ static void uffd_handle_page_fault(struct uffd_msg *msg,
 	}
 
 	if (msg->arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_WP) {
+		/* Write protect page faults */
 		wp_range(uffd, msg->arg.pagefault.address, page_size, false);
 		stats->wp_faults++;
+	} else if (msg->arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_MINOR) {
+		uint8_t *area;
+		int b;
+
+		/*
+		 * Minor page faults
+		 *
+		 * To prove we can modify the original range for testing
+		 * purposes, we're going to bit flip this range before
+		 * continuing.
+		 *
+		 * Note that this requires all minor page fault tests operate on
+		 * area_dst (non-UFFD-registered) and area_dst_alias
+		 * (UFFD-registered).
+		 */
+
+		area = (uint8_t *)(area_dst +
+				   ((char *)msg->arg.pagefault.address -
+				    area_dst_alias));
+		for (b = 0; b < page_size; ++b)
+			area[b] = ~area[b];
+		continue_range(uffd, msg->arg.pagefault.address, page_size);
+		stats->minor_faults++;
 	} else {
 		/* Missing page faults */
 		if (bounces & BOUNCE_VERIFY &&
@@ -1112,7 +1160,7 @@ static int userfaultfd_events_test(void)
 	}
 
 	if (!pid)
-		return faulting_process(0);
+		exit(faulting_process(0));
 
 	waitpid(pid, &err, 0);
 	if (err) {
@@ -1215,6 +1263,95 @@ static int userfaultfd_sig_test(void)
 	return userfaults != 0;
 }
 
+static int userfaultfd_minor_test(void)
+{
+	struct uffdio_register uffdio_register;
+	unsigned long expected_ioctls;
+	unsigned long p;
+	pthread_t uffd_mon;
+	uint8_t expected_byte;
+	void *expected_page;
+	char c;
+	struct uffd_stats stats = { 0 };
+
+	if (!test_uffdio_minor)
+		return 0;
+
+	printf("testing minor faults: ");
+	fflush(stdout);
+
+	if (uffd_test_ops->release_pages(area_dst))
+		return 1;
+
+	if (userfaultfd_open(0))
+		return 1;
+
+	uffdio_register.range.start = (unsigned long)area_dst_alias;
+	uffdio_register.range.len = nr_pages * page_size;
+	uffdio_register.mode = UFFDIO_REGISTER_MODE_MINOR;
+	if (ioctl(uffd, UFFDIO_REGISTER, &uffdio_register)) {
+		fprintf(stderr, "register failure\n");
+		exit(1);
+	}
+
+	expected_ioctls = uffd_test_ops->expected_ioctls;
+	expected_ioctls |= 1 << _UFFDIO_CONTINUE;
+	if ((uffdio_register.ioctls & expected_ioctls) != expected_ioctls) {
+		fprintf(stderr, "unexpected missing ioctl(s)\n");
+		exit(1);
+	}
+
+	/*
+	 * After registering with UFFD, populate the non-UFFD-registered side of
+	 * the shared mapping. This should *not* trigger any UFFD minor faults.
+	 */
+	for (p = 0; p < nr_pages; ++p) {
+		memset(area_dst + (p * page_size), p % ((uint8_t)-1),
+		       page_size);
+	}
+
+	if (pthread_create(&uffd_mon, &attr, uffd_poll_thread, &stats)) {
+		perror("uffd_poll_thread create");
+		exit(1);
+	}
+
+	/*
+	 * Read each of the pages back using the UFFD-registered mapping. We
+	 * expect that the first time we touch a page, it will result in a minor
+	 * fault. uffd_poll_thread will resolve the fault by bit-flipping the
+	 * page's contents, and then issuing a CONTINUE ioctl.
+	 */
+
+	if (posix_memalign(&expected_page, page_size, page_size)) {
+		fprintf(stderr, "out of memory\n");
+		return 1;
+	}
+
+	for (p = 0; p < nr_pages; ++p) {
+		expected_byte = ~((uint8_t)(p % ((uint8_t)-1)));
+		memset(expected_page, expected_byte, page_size);
+		if (my_bcmp(expected_page, area_dst_alias + (p * page_size),
+			    page_size)) {
+			fprintf(stderr,
+				"unexpected page contents after minor fault\n");
+			exit(1);
+		}
+	}
+
+	if (write(pipefd[1], &c, sizeof(c)) != sizeof(c)) {
+		perror("pipe write");
+		exit(1);
+	}
+	if (pthread_join(uffd_mon, NULL))
+		return 1;
+
+	close(uffd);
+
+	uffd_stats_report(&stats, 1);
+
+	return stats.minor_faults != nr_pages;
+}
+
 static int userfaultfd_stress(void)
 {
 	void *area;
@@ -1413,7 +1550,7 @@ static int userfaultfd_stress(void)
 
 	close(uffd);
 	return userfaultfd_zeropage_test() || userfaultfd_sig_test()
-		|| userfaultfd_events_test();
+		|| userfaultfd_events_test() || userfaultfd_minor_test();
 }
 
 /*
@@ -1454,6 +1591,8 @@ static void set_test_type(const char *type)
 		map_shared = true;
 		test_type = TEST_HUGETLB;
 		uffd_test_ops = &hugetlb_uffd_test_ops;
+		/* Minor faults require shared hugetlb; only enable here. */
+		test_uffdio_minor = true;
 	} else if (!strcmp(type, "shmem")) {
 		map_shared = true;
 		test_type = TEST_SHMEM;
-- 
2.30.0.280.ga3ce27912f-goog




* Re: [PATCH v2 7/9] userfaultfd: add UFFDIO_CONTINUE ioctl
  2021-01-22 21:29 ` [PATCH v2 7/9] userfaultfd: add UFFDIO_CONTINUE ioctl Axel Rasmussen
@ 2021-01-25  7:53   ` kernel test robot
  2021-01-25  8:31   ` kernel test robot
  2021-01-25 13:37   ` kernel test robot
  2 siblings, 0 replies; 14+ messages in thread
From: kernel test robot @ 2021-01-25  7:53 UTC (permalink / raw)
  To: Axel Rasmussen, Alexander Viro, Alexey Dobriyan,
	Andrea Arcangeli, Andrew Morton, Anshuman Khandual,
	Catalin Marinas, Chinwen Chang, Huang Ying, Ingo Molnar
  Cc: kbuild-all, Linux Memory Management List


Hi Axel,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on arm64/for-next/core]
[also build test ERROR on powerpc/next s390/features tip/perf/core linus/master v5.11-rc5 next-20210122]
[cannot apply to hp-parisc/for-next hnaz-linux-mm/master ia64/next sparc-next/master sparc/master]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Axel-Rasmussen/userfaultfd-add-minor-fault-handling/20210125-104035
base:   https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-next/core
config: powerpc-randconfig-r006-20210125 (attached as .config)
compiler: powerpc-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/b8fb53c3a341b9b853aa3286286c807088311dbd
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Axel-Rasmussen/userfaultfd-add-minor-fault-handling/20210125-104035
        git checkout b8fb53c3a341b9b853aa3286286c807088311dbd
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=powerpc 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All error/warnings (new ones prefixed by >>):

   In file included from fs/proc/task_mmu.c:4:
>> include/linux/hugetlb.h:142:10: warning: 'enum mcopy_atomic_mode' declared inside parameter list will not be visible outside of this definition or declaration
     142 |     enum mcopy_atomic_mode mode,
         |          ^~~~~~~~~~~~~~~~~
--
   In file included from fs/proc/meminfo.c:6:
>> include/linux/hugetlb.h:142:10: warning: 'enum mcopy_atomic_mode' declared inside parameter list will not be visible outside of this definition or declaration
     142 |     enum mcopy_atomic_mode mode,
         |          ^~~~~~~~~~~~~~~~~
   fs/proc/meminfo.c:22:28: warning: no previous prototype for 'arch_report_meminfo' [-Wmissing-prototypes]
      22 | void __attribute__((weak)) arch_report_meminfo(struct seq_file *m)
         |                            ^~~~~~~~~~~~~~~~~~~
--
   In file included from kernel/events/core.c:31:
>> include/linux/hugetlb.h:142:10: warning: 'enum mcopy_atomic_mode' declared inside parameter list will not be visible outside of this definition or declaration
     142 |     enum mcopy_atomic_mode mode,
         |          ^~~~~~~~~~~~~~~~~
   kernel/events/core.c:6535:6: warning: no previous prototype for 'perf_pmu_snapshot_aux' [-Wmissing-prototypes]
    6535 | long perf_pmu_snapshot_aux(struct perf_buffer *rb,
         |      ^~~~~~~~~~~~~~~~~~~~~
--
   In file included from kernel/fork.c:51:
>> include/linux/hugetlb.h:142:10: warning: 'enum mcopy_atomic_mode' declared inside parameter list will not be visible outside of this definition or declaration
     142 |     enum mcopy_atomic_mode mode,
         |          ^~~~~~~~~~~~~~~~~
   kernel/fork.c:161:13: warning: no previous prototype for 'arch_release_task_struct' [-Wmissing-prototypes]
     161 | void __weak arch_release_task_struct(struct task_struct *tsk)
         |             ^~~~~~~~~~~~~~~~~~~~~~~~
   kernel/fork.c:746:20: warning: no previous prototype for 'arch_task_cache_init' [-Wmissing-prototypes]
     746 | void __init __weak arch_task_cache_init(void) { }
         |                    ^~~~~~~~~~~~~~~~~~~~
--
   In file included from arch/powerpc/mm/pgtable.c:25:
>> include/linux/hugetlb.h:142:10: warning: 'enum mcopy_atomic_mode' declared inside parameter list will not be visible outside of this definition or declaration
     142 |     enum mcopy_atomic_mode mode,
         |          ^~~~~~~~~~~~~~~~~
   arch/powerpc/mm/pgtable.c:337:8: warning: no previous prototype for '__find_linux_pte' [-Wmissing-prototypes]
     337 | pte_t *__find_linux_pte(pgd_t *pgdir, unsigned long ea,
         |        ^~~~~~~~~~~~~~~~
--
   In file included from include/linux/migrate.h:8,
                    from mm/page_alloc.c:61:
>> include/linux/hugetlb.h:142:10: warning: 'enum mcopy_atomic_mode' declared inside parameter list will not be visible outside of this definition or declaration
     142 |     enum mcopy_atomic_mode mode,
         |          ^~~~~~~~~~~~~~~~~
   mm/page_alloc.c:3597:15: warning: no previous prototype for 'should_fail_alloc_page' [-Wmissing-prototypes]
    3597 | noinline bool should_fail_alloc_page(gfp_t gfp_mask, unsigned int order)
         |               ^~~~~~~~~~~~~~~~~~~~~~
   mm/page_alloc.c:6258:23: warning: no previous prototype for 'memmap_init' [-Wmissing-prototypes]
    6258 | void __meminit __weak memmap_init(unsigned long size, int nid,
         |                       ^~~~~~~~~~~
--
   In file included from mm/hugetlb.c:39:
>> include/linux/hugetlb.h:142:10: warning: 'enum mcopy_atomic_mode' declared inside parameter list will not be visible outside of this definition or declaration
     142 |     enum mcopy_atomic_mode mode,
         |          ^~~~~~~~~~~~~~~~~
>> mm/hugetlb.c:4659:13: warning: 'enum mcopy_atomic_mode' declared inside parameter list will not be visible outside of this definition or declaration
    4659 |        enum mcopy_atomic_mode mode,
         |             ^~~~~~~~~~~~~~~~~
>> mm/hugetlb.c:4659:31: error: parameter 6 ('mode') has incomplete type
    4659 |        enum mcopy_atomic_mode mode,
         |        ~~~~~~~~~~~~~~~~~~~~~~~^~~~
   mm/hugetlb.c: In function 'hugetlb_mcopy_atomic_pte':
>> mm/hugetlb.c:4675:25: error: 'MCOPY_ATOMIC_CONTINUE' undeclared (first use in this function)
    4675 |  if (!*pagep && mode != MCOPY_ATOMIC_CONTINUE) {
         |                         ^~~~~~~~~~~~~~~~~~~~~
   mm/hugetlb.c:4675:25: note: each undeclared identifier is reported only once for each function it appears in
--
   In file included from mm/util.c:16:
>> include/linux/hugetlb.h:142:10: warning: 'enum mcopy_atomic_mode' declared inside parameter list will not be visible outside of this definition or declaration
     142 |     enum mcopy_atomic_mode mode,
         |          ^~~~~~~~~~~~~~~~~
   mm/util.c: In function 'page_mapping':
   mm/util.c:700:15: warning: variable 'entry' set but not used [-Wunused-but-set-variable]
     700 |   swp_entry_t entry;
         |               ^~~~~
--
   In file included from include/linux/migrate.h:8,
                    from mm/compaction.c:13:
>> include/linux/hugetlb.h:142:10: warning: 'enum mcopy_atomic_mode' declared inside parameter list will not be visible outside of this definition or declaration
     142 |     enum mcopy_atomic_mode mode,
         |          ^~~~~~~~~~~~~~~~~
   mm/compaction.c:56:27: warning: 'HPAGE_FRAG_CHECK_INTERVAL_MSEC' defined but not used [-Wunused-const-variable=]
      56 | static const unsigned int HPAGE_FRAG_CHECK_INTERVAL_MSEC = 500;
         |                           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
--
   In file included from include/linux/migrate.h:8,
                    from kernel/sched/sched.h:53,
                    from kernel/sched/fair.c:23:
>> include/linux/hugetlb.h:142:10: warning: 'enum mcopy_atomic_mode' declared inside parameter list will not be visible outside of this definition or declaration
     142 |     enum mcopy_atomic_mode mode,
         |          ^~~~~~~~~~~~~~~~~
   kernel/sched/fair.c:5388:6: warning: no previous prototype for 'init_cfs_bandwidth' [-Wmissing-prototypes]
    5388 | void init_cfs_bandwidth(struct cfs_bandwidth *cfs_b) {}
         |      ^~~~~~~~~~~~~~~~~~
   kernel/sched/fair.c:11195:6: warning: no previous prototype for 'free_fair_sched_group' [-Wmissing-prototypes]
   11195 | void free_fair_sched_group(struct task_group *tg) { }
         |      ^~~~~~~~~~~~~~~~~~~~~
   kernel/sched/fair.c:11197:5: warning: no previous prototype for 'alloc_fair_sched_group' [-Wmissing-prototypes]
   11197 | int alloc_fair_sched_group(struct task_group *tg, struct task_group *parent)
         |     ^~~~~~~~~~~~~~~~~~~~~~
   kernel/sched/fair.c:11202:6: warning: no previous prototype for 'online_fair_sched_group' [-Wmissing-prototypes]
   11202 | void online_fair_sched_group(struct task_group *tg) { }
         |      ^~~~~~~~~~~~~~~~~~~~~~~
   kernel/sched/fair.c:11204:6: warning: no previous prototype for 'unregister_fair_sched_group' [-Wmissing-prototypes]
   11204 | void unregister_fair_sched_group(struct task_group *tg) { }
         |      ^~~~~~~~~~~~~~~~~~~~~~~~~~~
--
   In file included from include/linux/migrate.h:8,
                    from kernel/sched/sched.h:53,
                    from kernel/sched/rt.c:6:
>> include/linux/hugetlb.h:142:10: warning: 'enum mcopy_atomic_mode' declared inside parameter list will not be visible outside of this definition or declaration
     142 |     enum mcopy_atomic_mode mode,
         |          ^~~~~~~~~~~~~~~~~
   kernel/sched/rt.c:253:6: warning: no previous prototype for 'free_rt_sched_group' [-Wmissing-prototypes]
     253 | void free_rt_sched_group(struct task_group *tg) { }
         |      ^~~~~~~~~~~~~~~~~~~
   kernel/sched/rt.c:255:5: warning: no previous prototype for 'alloc_rt_sched_group' [-Wmissing-prototypes]
     255 | int alloc_rt_sched_group(struct task_group *tg, struct task_group *parent)
         |     ^~~~~~~~~~~~~~~~~~~~
   kernel/sched/rt.c:669:6: warning: no previous prototype for 'sched_rt_bandwidth_account' [-Wmissing-prototypes]
     669 | bool sched_rt_bandwidth_account(struct rt_rq *rt_rq)
         |      ^~~~~~~~~~~~~~~~~~~~~~~~~~
..
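
The repeated warning has a compact explanation: GCC emits it when an enum tag is first named inside a prototype's parameter list, so the type is scoped to that single declaration and later uses cannot match it. A minimal sketch of the usual remedy follows, assuming the fix is simply to make the enum definition visible before the prototype. This excerpt does not show how the series actually resolved it, and all `sketch_*` names are hypothetical.

```c
#include <assert.h>

/*
 * Define the enum before any prototype that names it. ISO C has no
 * forward declaration for enum types, so the full definition (or a
 * header providing it) must come first.
 */
enum sketch_mcopy_mode {
	SKETCH_MCOPY_NORMAL,
	SKETCH_MCOPY_ZEROPAGE,
	SKETCH_MCOPY_CONTINUE,
};

/* The enum is complete at this point, so the prototype is well formed. */
int sketch_may_allocate_page(enum sketch_mcopy_mode mode);

int sketch_may_allocate_page(enum sketch_mcopy_mode mode)
{
	/* Loosely follows the listing: CONTINUE looks up an existing page. */
	return mode != SKETCH_MCOPY_CONTINUE;
}

/* Exercises both outcomes. */
static void sketch_enum_self_check(void)
{
	assert(sketch_may_allocate_page(SKETCH_MCOPY_NORMAL) == 1);
	assert(sketch_may_allocate_page(SKETCH_MCOPY_CONTINUE) == 0);
}
```

Moving the enum definition below the prototype in this sketch reproduces the "declared inside parameter list" warning with GCC.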


vim +4659 mm/hugetlb.c

  4649	
  4650	/*
  4651	 * Used by userfaultfd UFFDIO_COPY.  Based on mcopy_atomic_pte with
  4652	 * modifications for huge pages.
  4653	 */
  4654	int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
  4655				    pte_t *dst_pte,
  4656				    struct vm_area_struct *dst_vma,
  4657				    unsigned long dst_addr,
  4658				    unsigned long src_addr,
> 4659				    enum mcopy_atomic_mode mode,
  4660				    struct page **pagep)
  4661	{
  4662		struct address_space *mapping;
  4663		pgoff_t idx;
  4664		unsigned long size;
  4665		int vm_shared = dst_vma->vm_flags & VM_SHARED;
  4666		struct hstate *h = hstate_vma(dst_vma);
  4667		pte_t _dst_pte;
  4668		spinlock_t *ptl;
  4669		int ret;
  4670		struct page *page;
  4671	
  4672		mapping = dst_vma->vm_file->f_mapping;
  4673		idx = vma_hugecache_offset(h, dst_vma, dst_addr);
  4674	
> 4675		if (!*pagep && mode != MCOPY_ATOMIC_CONTINUE) {
  4676			ret = -ENOMEM;
  4677			page = alloc_huge_page(dst_vma, dst_addr, 0);
  4678			if (IS_ERR(page))
  4679				goto out;
  4680	
  4681			ret = copy_huge_page_from_user(page,
  4682							(const void __user *) src_addr,
  4683							pages_per_huge_page(h), false);
  4684	
  4685			/* fallback to copy_from_user outside mmap_lock */
  4686			if (unlikely(ret)) {
  4687				ret = -ENOENT;
  4688				*pagep = page;
  4689				/* don't free the page */
  4690				goto out;
  4691			}
  4692		} else if (mode == MCOPY_ATOMIC_CONTINUE) {
  4693			ret = -EFAULT;
  4694			page = find_lock_page(mapping, idx);
  4695			*pagep = NULL;
  4696			if (!page)
  4697				goto out;
  4698		} else {
  4699			page = *pagep;
  4700			*pagep = NULL;
  4701		}
  4702	
  4703		/*
  4704		 * The memory barrier inside __SetPageUptodate makes sure that
  4705		 * preceding stores to the page contents become visible before
  4706		 * the set_pte_at() write.
  4707		 */
  4708		__SetPageUptodate(page);
  4709	
  4710		/* Add shared, newly allocated pages to the page cache. */
  4711		if (vm_shared && mode != MCOPY_ATOMIC_CONTINUE) {
  4712			size = i_size_read(mapping->host) >> huge_page_shift(h);
  4713			ret = -EFAULT;
  4714			if (idx >= size)
  4715				goto out_release_nounlock;
  4716	
  4717			/*
  4718			 * Serialization between remove_inode_hugepages() and
  4719			 * huge_add_to_page_cache() below happens through the
  4720			 * hugetlb_fault_mutex_table that here must be hold by
  4721			 * the caller.
  4722			 */
  4723			ret = huge_add_to_page_cache(page, mapping, idx);
  4724			if (ret)
  4725				goto out_release_nounlock;
  4726		}
  4727	
  4728		ptl = huge_pte_lockptr(h, dst_mm, dst_pte);
  4729		spin_lock(ptl);
  4730	
  4731		/*
  4732		 * Recheck the i_size after holding PT lock to make sure not
  4733		 * to leave any page mapped (as page_mapped()) beyond the end
  4734		 * of the i_size (remove_inode_hugepages() is strict about
  4735		 * enforcing that). If we bail out here, we'll also leave a
  4736		 * page in the radix tree in the vm_shared case beyond the end
  4737		 * of the i_size, but remove_inode_hugepages() will take care
  4738		 * of it as soon as we drop the hugetlb_fault_mutex_table.
  4739		 */
  4740		size = i_size_read(mapping->host) >> huge_page_shift(h);
  4741		ret = -EFAULT;
  4742		if (idx >= size)
  4743			goto out_release_unlock;
  4744	
  4745		ret = -EEXIST;
  4746		if (!huge_pte_none(huge_ptep_get(dst_pte)))
  4747			goto out_release_unlock;
  4748	
  4749		if (vm_shared) {
  4750			page_dup_rmap(page, true);
  4751		} else {
  4752			ClearPagePrivate(page);
  4753			hugepage_add_new_anon_rmap(page, dst_vma, dst_addr);
  4754		}
  4755	
  4756		_dst_pte = make_huge_pte(dst_vma, page, dst_vma->vm_flags & VM_WRITE);
  4757		if (dst_vma->vm_flags & VM_WRITE)
  4758			_dst_pte = huge_pte_mkdirty(_dst_pte);
  4759		_dst_pte = pte_mkyoung(_dst_pte);
  4760	
  4761		set_huge_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte);
  4762	
  4763		(void)huge_ptep_set_access_flags(dst_vma, dst_addr, dst_pte, _dst_pte,
  4764						dst_vma->vm_flags & VM_WRITE);
  4765		hugetlb_count_add(pages_per_huge_page(h), dst_mm);
  4766	
  4767		/* No need to invalidate - it was non-present before */
  4768		update_mmu_cache(dst_vma, dst_addr, dst_pte);
  4769	
  4770		spin_unlock(ptl);
  4771		if (mode != MCOPY_ATOMIC_CONTINUE)
  4772			set_page_huge_active(page);
  4773		if (vm_shared)
  4774			unlock_page(page);
  4775		ret = 0;
  4776	out:
  4777		return ret;
  4778	out_release_unlock:
  4779		spin_unlock(ptl);
  4780		if (vm_shared)
  4781			unlock_page(page);
  4782	out_release_nounlock:
  4783		put_page(page);
  4784		goto out;
  4785	}
  4786	
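
The branch ladder at lines 4675-4701 of the listing above picks one of three
paths, depending on the mode and on whether *pagep already holds a page. The
sketch below models only that selection logic as ordinary userspace C. Every
name in it (demo_mode, demo_path, demo_select_path, held_page) is hypothetical
and chosen for illustration; only the shape of the conditions is taken from
the listing, and none of the locking or page-cache work is represented.

```c
#include <stddef.h>

/* Hypothetical stand-in for the mode argument; not the kernel's enum. */
enum demo_mode {
	DEMO_MODE_NORMAL,
	DEMO_MODE_CONTINUE,
};

/* Hypothetical labels for the three paths the ladder can take. */
enum demo_path {
	DEMO_PATH_ALLOCATE,	/* no page held, not CONTINUE: allocate and copy */
	DEMO_PATH_LOOKUP,	/* CONTINUE: look up an existing page */
	DEMO_PATH_RETRY,	/* page held from an earlier attempt: reuse it */
};

/*
 * Same condition order as the if / else-if / else in the listing.
 * held_page plays the role of *pagep.
 */
enum demo_path demo_select_path(enum demo_mode mode, const void *held_page)
{
	if (held_page == NULL && mode != DEMO_MODE_CONTINUE)
		return DEMO_PATH_ALLOCATE;
	if (mode == DEMO_MODE_CONTINUE)
		return DEMO_PATH_LOOKUP;
	return DEMO_PATH_RETRY;
}
```

Note that in this ordering a CONTINUE request takes the lookup path whether or
not a page is held, which matches the listing, where *pagep is reset to NULL
in that branch.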

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 30207 bytes --]


* Re: [PATCH v2 7/9] userfaultfd: add UFFDIO_CONTINUE ioctl
  2021-01-22 21:29 ` [PATCH v2 7/9] userfaultfd: add UFFDIO_CONTINUE ioctl Axel Rasmussen
  2021-01-25  7:53   ` kernel test robot
@ 2021-01-25  8:31   ` kernel test robot
  2021-01-25 13:37   ` kernel test robot
  2 siblings, 0 replies; 14+ messages in thread
From: kernel test robot @ 2021-01-25  8:31 UTC (permalink / raw)
  To: Axel Rasmussen, Alexander Viro, Alexey Dobriyan,
	Andrea Arcangeli, Andrew Morton, Anshuman Khandual,
	Catalin Marinas, Chinwen Chang, Huang Ying, Ingo Molnar
  Cc: kbuild-all, clang-built-linux, Linux Memory Management List

[-- Attachment #1: Type: text/plain, Size: 17675 bytes --]

Hi Axel,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on arm64/for-next/core]
[also build test WARNING on powerpc/next s390/features tip/perf/core linus/master v5.11-rc5 next-20210122]
[cannot apply to hp-parisc/for-next hnaz-linux-mm/master ia64/next sparc-next/master sparc/master]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Axel-Rasmussen/userfaultfd-add-minor-fault-handling/20210125-104035
base:   https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-next/core
config: x86_64-randconfig-a013-20210125 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 12d0753aca22896fda2cf76781b0ee0524d55065)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install x86_64 cross compiling tool for clang build
        # apt-get install binutils-x86-64-linux-gnu
        # https://github.com/0day-ci/linux/commit/b8fb53c3a341b9b853aa3286286c807088311dbd
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Axel-Rasmussen/userfaultfd-add-minor-fault-handling/20210125-104035
        git checkout b8fb53c3a341b9b853aa3286286c807088311dbd
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64 

If you fix the issue, kindly add the following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   In file included from kernel/sched/core.c:13:
   In file included from kernel/sched/sched.h:53:
   In file included from include/linux/migrate.h:8:
>> include/linux/hugetlb.h:142:10: warning: declaration of 'enum mcopy_atomic_mode' will not be visible outside of this function [-Wvisibility]
                                   enum mcopy_atomic_mode mode,
                                        ^
   kernel/sched/core.c:2884:20: warning: unused function 'rq_has_pinned_tasks' [-Wunused-function]
   static inline bool rq_has_pinned_tasks(struct rq *rq)
                      ^
   kernel/sched/core.c:4687:20: warning: unused function 'sched_tick_start' [-Wunused-function]
   static inline void sched_tick_start(int cpu) { }
                      ^
   kernel/sched/core.c:4688:20: warning: unused function 'sched_tick_stop' [-Wunused-function]
   static inline void sched_tick_stop(int cpu) { }
                      ^
   4 warnings generated.
--
   In file included from kernel/sched/loadavg.c:9:
   In file included from kernel/sched/sched.h:53:
   In file included from include/linux/migrate.h:8:
>> include/linux/hugetlb.h:142:10: warning: declaration of 'enum mcopy_atomic_mode' will not be visible outside of this function [-Wvisibility]
                                   enum mcopy_atomic_mode mode,
                                        ^
   1 warning generated.
--
   In file included from kernel/sched/fair.c:23:
   In file included from kernel/sched/sched.h:53:
   In file included from include/linux/migrate.h:8:
>> include/linux/hugetlb.h:142:10: warning: declaration of 'enum mcopy_atomic_mode' will not be visible outside of this function [-Wvisibility]
                                   enum mcopy_atomic_mode mode,
                                        ^
   kernel/sched/fair.c:5388:6: warning: no previous prototype for function 'init_cfs_bandwidth' [-Wmissing-prototypes]
   void init_cfs_bandwidth(struct cfs_bandwidth *cfs_b) {}
        ^
   kernel/sched/fair.c:5388:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   void init_cfs_bandwidth(struct cfs_bandwidth *cfs_b) {}
   ^
   static 
   kernel/sched/fair.c:11195:6: warning: no previous prototype for function 'free_fair_sched_group' [-Wmissing-prototypes]
   void free_fair_sched_group(struct task_group *tg) { }
        ^
   kernel/sched/fair.c:11195:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   void free_fair_sched_group(struct task_group *tg) { }
   ^
   static 
   kernel/sched/fair.c:11197:5: warning: no previous prototype for function 'alloc_fair_sched_group' [-Wmissing-prototypes]
   int alloc_fair_sched_group(struct task_group *tg, struct task_group *parent)
       ^
   kernel/sched/fair.c:11197:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   int alloc_fair_sched_group(struct task_group *tg, struct task_group *parent)
   ^
   static 
   kernel/sched/fair.c:11202:6: warning: no previous prototype for function 'online_fair_sched_group' [-Wmissing-prototypes]
   void online_fair_sched_group(struct task_group *tg) { }
        ^
   kernel/sched/fair.c:11202:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   void online_fair_sched_group(struct task_group *tg) { }
   ^
   static 
   kernel/sched/fair.c:11204:6: warning: no previous prototype for function 'unregister_fair_sched_group' [-Wmissing-prototypes]
   void unregister_fair_sched_group(struct task_group *tg) { }
        ^
   kernel/sched/fair.c:11204:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   void unregister_fair_sched_group(struct task_group *tg) { }
   ^
   static 
   kernel/sched/fair.c:486:20: warning: unused function 'list_del_leaf_cfs_rq' [-Wunused-function]
   static inline void list_del_leaf_cfs_rq(struct cfs_rq *cfs_rq)
                      ^
   kernel/sched/fair.c:2985:20: warning: unused function 'account_numa_enqueue' [-Wunused-function]
   static inline void account_numa_enqueue(struct rq *rq, struct task_struct *p)
                      ^
   kernel/sched/fair.c:2989:20: warning: unused function 'account_numa_dequeue' [-Wunused-function]
   static inline void account_numa_dequeue(struct rq *rq, struct task_struct *p)
                      ^
   kernel/sched/fair.c:2993:20: warning: unused function 'update_scan_period' [-Wunused-function]
   static inline void update_scan_period(struct task_struct *p, int new_cpu)
                      ^
   kernel/sched/fair.c:4083:20: warning: unused function 'remove_entity_load_avg' [-Wunused-function]
   static inline void remove_entity_load_avg(struct sched_entity *se) {}
                      ^
   kernel/sched/fair.c:5369:20: warning: unused function 'sync_throttle' [-Wunused-function]
   static inline void sync_throttle(struct task_group *tg, int cpu) {}
                      ^
   kernel/sched/fair.c:5382:19: warning: unused function 'throttled_lb_pair' [-Wunused-function]
   static inline int throttled_lb_pair(struct task_group *tg,
                     ^
   kernel/sched/fair.c:5394:37: warning: unused function 'tg_cfs_bandwidth' [-Wunused-function]
   static inline struct cfs_bandwidth *tg_cfs_bandwidth(struct task_group *tg)
                                       ^
   kernel/sched/fair.c:5398:20: warning: unused function 'destroy_cfs_bandwidth' [-Wunused-function]
   static inline void destroy_cfs_bandwidth(struct cfs_bandwidth *cfs_b) {}
                      ^
   kernel/sched/fair.c:5399:20: warning: unused function 'update_runtime_enabled' [-Wunused-function]
   static inline void update_runtime_enabled(struct rq *rq) {}
                      ^
   kernel/sched/fair.c:5400:20: warning: unused function 'unthrottle_offline_cfs_rqs' [-Wunused-function]
   static inline void unthrottle_offline_cfs_rqs(struct rq *rq) {}
                      ^
   17 warnings generated.
--
   In file included from kernel/sched/rt.c:6:
   In file included from kernel/sched/sched.h:53:
   In file included from include/linux/migrate.h:8:
>> include/linux/hugetlb.h:142:10: warning: declaration of 'enum mcopy_atomic_mode' will not be visible outside of this function [-Wvisibility]
                                   enum mcopy_atomic_mode mode,
                                        ^
   kernel/sched/rt.c:253:6: warning: no previous prototype for function 'free_rt_sched_group' [-Wmissing-prototypes]
   void free_rt_sched_group(struct task_group *tg) { }
        ^
   kernel/sched/rt.c:253:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   void free_rt_sched_group(struct task_group *tg) { }
   ^
   static 
   kernel/sched/rt.c:255:5: warning: no previous prototype for function 'alloc_rt_sched_group' [-Wmissing-prototypes]
   int alloc_rt_sched_group(struct task_group *tg, struct task_group *parent)
       ^
   kernel/sched/rt.c:255:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   int alloc_rt_sched_group(struct task_group *tg, struct task_group *parent)
   ^
   static 
   kernel/sched/rt.c:669:6: warning: no previous prototype for function 'sched_rt_bandwidth_account' [-Wmissing-prototypes]
   bool sched_rt_bandwidth_account(struct rt_rq *rt_rq)
        ^
   kernel/sched/rt.c:669:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   bool sched_rt_bandwidth_account(struct rt_rq *rt_rq)
   ^
   static 
   kernel/sched/rt.c:421:20: warning: unused function 'need_pull_rt_task' [-Wunused-function]
   static inline bool need_pull_rt_task(struct rq *rq, struct task_struct *prev)
                      ^
   kernel/sched/rt.c:426:20: warning: unused function 'pull_rt_task' [-Wunused-function]
   static inline void pull_rt_task(struct rq *this_rq)
                      ^
   kernel/sched/rt.c:476:20: warning: unused function 'rt_task_fits_capacity' [-Wunused-function]
   static inline bool rt_task_fits_capacity(struct task_struct *p, int cpu)
                      ^
   kernel/sched/rt.c:1113:6: warning: unused function 'inc_rt_prio_smp' [-Wunused-function]
   void inc_rt_prio_smp(struct rt_rq *rt_rq, int prio, int prev_prio) {}
        ^
   kernel/sched/rt.c:1115:6: warning: unused function 'dec_rt_prio_smp' [-Wunused-function]
   void dec_rt_prio_smp(struct rt_rq *rt_rq, int prio, int prev_prio) {}
        ^
   9 warnings generated.
--
   In file included from kernel/sched/deadline.c:18:
   In file included from kernel/sched/sched.h:53:
   In file included from include/linux/migrate.h:8:
>> include/linux/hugetlb.h:142:10: warning: declaration of 'enum mcopy_atomic_mode' will not be visible outside of this function [-Wvisibility]
                                   enum mcopy_atomic_mode mode,
                                        ^
   kernel/sched/deadline.c:700:20: warning: unused function 'need_pull_dl_task' [-Wunused-function]
   static inline bool need_pull_dl_task(struct rq *rq, struct task_struct *prev)
                      ^
   kernel/sched/deadline.c:705:20: warning: unused function 'pull_dl_task' [-Wunused-function]
   static inline void pull_dl_task(struct rq *rq)
                      ^
   3 warnings generated.
--
   In file included from mm/page_alloc.c:61:
   In file included from include/linux/migrate.h:8:
>> include/linux/hugetlb.h:142:10: warning: declaration of 'enum mcopy_atomic_mode' will not be visible outside of this function [-Wvisibility]
                                   enum mcopy_atomic_mode mode,
                                        ^
   mm/page_alloc.c:3597:15: warning: no previous prototype for function 'should_fail_alloc_page' [-Wmissing-prototypes]
   noinline bool should_fail_alloc_page(gfp_t gfp_mask, unsigned int order)
                 ^
   mm/page_alloc.c:3597:10: note: declare 'static' if the function is not intended to be used outside of this translation unit
   noinline bool should_fail_alloc_page(gfp_t gfp_mask, unsigned int order)
            ^
            static 
   mm/page_alloc.c:6258:23: warning: no previous prototype for function 'memmap_init' [-Wmissing-prototypes]
   void __meminit __weak memmap_init(unsigned long size, int nid,
                         ^
   mm/page_alloc.c:6258:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   void __meminit __weak memmap_init(unsigned long size, int nid,
   ^
   static 
   3 warnings generated.
--
   In file included from mm/hugetlb.c:39:
>> include/linux/hugetlb.h:142:10: warning: declaration of 'enum mcopy_atomic_mode' will not be visible outside of this function [-Wvisibility]
                                   enum mcopy_atomic_mode mode,
                                        ^
>> mm/hugetlb.c:4659:13: warning: declaration of 'enum mcopy_atomic_mode' will not be visible outside of this function [-Wvisibility]
                               enum mcopy_atomic_mode mode,
                                    ^
   mm/hugetlb.c:4654:5: error: conflicting types for 'hugetlb_mcopy_atomic_pte'
   int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
       ^
   include/linux/hugetlb.h:138:5: note: previous declaration is here
   int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, pte_t *dst_pte,
       ^
   mm/hugetlb.c:4659:31: error: variable has incomplete type 'enum mcopy_atomic_mode'
                               enum mcopy_atomic_mode mode,
                                                      ^
   mm/hugetlb.c:4659:13: note: forward declaration of 'enum mcopy_atomic_mode'
                               enum mcopy_atomic_mode mode,
                                    ^
   mm/hugetlb.c:4675:25: error: use of undeclared identifier 'MCOPY_ATOMIC_CONTINUE'
           if (!*pagep && mode != MCOPY_ATOMIC_CONTINUE) {
                                  ^
   mm/hugetlb.c:4692:21: error: use of undeclared identifier 'MCOPY_ATOMIC_CONTINUE'
           } else if (mode == MCOPY_ATOMIC_CONTINUE) {
                              ^
   mm/hugetlb.c:4711:27: error: use of undeclared identifier 'MCOPY_ATOMIC_CONTINUE'
           if (vm_shared && mode != MCOPY_ATOMIC_CONTINUE) {
                                    ^
   mm/hugetlb.c:4771:14: error: use of undeclared identifier 'MCOPY_ATOMIC_CONTINUE'
           if (mode != MCOPY_ATOMIC_CONTINUE)
                       ^
   2 warnings and 6 errors generated.
--
   In file included from mm/z3fold.c:33:
   In file included from include/linux/migrate.h:8:
>> include/linux/hugetlb.h:142:10: warning: declaration of 'enum mcopy_atomic_mode' will not be visible outside of this function [-Wvisibility]
                                   enum mcopy_atomic_mode mode,
                                        ^
   mm/z3fold.c:287:37: warning: unused function 'handle_to_z3fold_header' [-Wunused-function]
   static inline struct z3fold_header *handle_to_z3fold_header(unsigned long h)
                                       ^
   2 warnings generated.


vim +142 include/linux/hugetlb.h

   108	
   109	void reset_vma_resv_huge_pages(struct vm_area_struct *vma);
   110	int hugetlb_sysctl_handler(struct ctl_table *, int, void *, size_t *, loff_t *);
   111	int hugetlb_overcommit_handler(struct ctl_table *, int, void *, size_t *,
   112			loff_t *);
   113	int hugetlb_treat_movable_handler(struct ctl_table *, int, void *, size_t *,
   114			loff_t *);
   115	int hugetlb_mempolicy_sysctl_handler(struct ctl_table *, int, void *, size_t *,
   116			loff_t *);
   117	
   118	int copy_hugetlb_page_range(struct mm_struct *, struct mm_struct *, struct vm_area_struct *);
   119	long follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *,
   120				 struct page **, struct vm_area_struct **,
   121				 unsigned long *, unsigned long *, long, unsigned int,
   122				 int *);
   123	void unmap_hugepage_range(struct vm_area_struct *,
   124				  unsigned long, unsigned long, struct page *);
   125	void __unmap_hugepage_range_final(struct mmu_gather *tlb,
   126				  struct vm_area_struct *vma,
   127				  unsigned long start, unsigned long end,
   128				  struct page *ref_page);
   129	void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
   130					unsigned long start, unsigned long end,
   131					struct page *ref_page);
   132	void hugetlb_report_meminfo(struct seq_file *);
   133	int hugetlb_report_node_meminfo(char *buf, int len, int nid);
   134	void hugetlb_show_meminfo(void);
   135	unsigned long hugetlb_total_pages(void);
   136	vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
   137				unsigned long address, unsigned int flags);
   138	int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, pte_t *dst_pte,
   139					struct vm_area_struct *dst_vma,
   140					unsigned long dst_addr,
   141					unsigned long src_addr,
 > 142					enum mcopy_atomic_mode mode,
   143					struct page **pagep);
   144	int hugetlb_reserve_pages(struct inode *inode, long from, long to,
   145							struct vm_area_struct *vma,
   146							vm_flags_t vm_flags);
   147	long hugetlb_unreserve_pages(struct inode *inode, long start, long end,
   148							long freed);
   149	bool isolate_huge_page(struct page *page, struct list_head *list);
   150	void putback_active_hugepage(struct page *page);
   151	void move_hugetlb_state(struct page *oldpage, struct page *newpage, int reason);
   152	void free_huge_page(struct page *page);
   153	void hugetlb_fix_reserve_counts(struct inode *inode);
   154	extern struct mutex *hugetlb_fault_mutex_table;
   155	u32 hugetlb_fault_mutex_hash(struct address_space *mapping, pgoff_t idx);
   156	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 43402 bytes --]


* Re: [PATCH v2 7/9] userfaultfd: add UFFDIO_CONTINUE ioctl
  2021-01-22 21:29 ` [PATCH v2 7/9] userfaultfd: add UFFDIO_CONTINUE ioctl Axel Rasmussen
  2021-01-25  7:53   ` kernel test robot
  2021-01-25  8:31   ` kernel test robot
@ 2021-01-25 13:37   ` kernel test robot
  2021-01-25 17:50     ` Axel Rasmussen
  2 siblings, 1 reply; 14+ messages in thread
From: kernel test robot @ 2021-01-25 13:37 UTC (permalink / raw)
  To: Axel Rasmussen, Alexander Viro, Alexey Dobriyan,
	Andrea Arcangeli, Andrew Morton, Anshuman Khandual,
	Catalin Marinas, Chinwen Chang, Huang Ying, Ingo Molnar
  Cc: kbuild-all, clang-built-linux, Linux Memory Management List

[-- Attachment #1: Type: text/plain, Size: 14701 bytes --]

Hi Axel,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on arm64/for-next/core]
[also build test ERROR on powerpc/next s390/features tip/perf/core linus/master v5.11-rc5 next-20210122]
[cannot apply to hp-parisc/for-next hnaz-linux-mm/master ia64/next sparc-next/master sparc/master]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Axel-Rasmussen/userfaultfd-add-minor-fault-handling/20210125-104035
base:   https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-next/core
config: x86_64-randconfig-a013-20210125 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 12d0753aca22896fda2cf76781b0ee0524d55065)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install x86_64 cross compiling tool for clang build
        # apt-get install binutils-x86-64-linux-gnu
        # https://github.com/0day-ci/linux/commit/b8fb53c3a341b9b853aa3286286c807088311dbd
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Axel-Rasmussen/userfaultfd-add-minor-fault-handling/20210125-104035
        git checkout b8fb53c3a341b9b853aa3286286c807088311dbd
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64 

If you fix the issue, kindly add the following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   In file included from mm/hugetlb.c:39:
   include/linux/hugetlb.h:142:10: warning: declaration of 'enum mcopy_atomic_mode' will not be visible outside of this function [-Wvisibility]
                                   enum mcopy_atomic_mode mode,
                                        ^
   mm/hugetlb.c:4659:13: warning: declaration of 'enum mcopy_atomic_mode' will not be visible outside of this function [-Wvisibility]
                               enum mcopy_atomic_mode mode,
                                    ^
>> mm/hugetlb.c:4654:5: error: conflicting types for 'hugetlb_mcopy_atomic_pte'
   int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
       ^
   include/linux/hugetlb.h:138:5: note: previous declaration is here
   int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, pte_t *dst_pte,
       ^
>> mm/hugetlb.c:4659:31: error: variable has incomplete type 'enum mcopy_atomic_mode'
                               enum mcopy_atomic_mode mode,
                                                      ^
   mm/hugetlb.c:4659:13: note: forward declaration of 'enum mcopy_atomic_mode'
                               enum mcopy_atomic_mode mode,
                                    ^
>> mm/hugetlb.c:4675:25: error: use of undeclared identifier 'MCOPY_ATOMIC_CONTINUE'
           if (!*pagep && mode != MCOPY_ATOMIC_CONTINUE) {
                                  ^
   mm/hugetlb.c:4692:21: error: use of undeclared identifier 'MCOPY_ATOMIC_CONTINUE'
           } else if (mode == MCOPY_ATOMIC_CONTINUE) {
                              ^
   mm/hugetlb.c:4711:27: error: use of undeclared identifier 'MCOPY_ATOMIC_CONTINUE'
           if (vm_shared && mode != MCOPY_ATOMIC_CONTINUE) {
                                    ^
   mm/hugetlb.c:4771:14: error: use of undeclared identifier 'MCOPY_ATOMIC_CONTINUE'
           if (mode != MCOPY_ATOMIC_CONTINUE)
                       ^
   2 warnings and 6 errors generated.
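
The diagnostics above all follow from one C scoping rule: an enum tag first
seen inside a parameter list is scoped to that one declaration. The prototype
in the header and the definition in the .c file then each name a distinct,
incomplete type, which would explain both the "conflicting types" error and
the undeclared enumerator. The sketch below shows the working arrangement,
with the failing one left as a comment. All names (demo_mode, demo_fill) are
hypothetical, and this is not a claim about how the series itself was fixed.

```c
/*
 * Failing pattern, shown only as a comment because it does not build
 * cleanly:
 *
 *   int demo_fill(enum demo_mode mode);   // tag first appears here
 *
 * With no earlier definition in scope, the tag is local to that
 * prototype, so a later use of "enum demo_mode" names a different type
 * and its enumerators are never declared.
 */

/* Working pattern: the full definition precedes every use of the tag. */
enum demo_mode {
	DEMO_MODE_NORMAL,
	DEMO_MODE_CONTINUE,
};

/* Prototype and definition now refer to the same complete type. */
int demo_fill(enum demo_mode mode);

int demo_fill(enum demo_mode mode)
{
	/* Report whether the caller asked for the CONTINUE behaviour. */
	return mode == DEMO_MODE_CONTINUE ? 1 : 0;
}
```

Standard C provides no forward declaration for enum types (GCC accepts one
only as an extension, and the type stays incomplete), so the definition
itself has to be visible wherever the prototype is compiled, in every build
configuration.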


vim +/hugetlb_mcopy_atomic_pte +4654 mm/hugetlb.c

86e5216f8d8aa25 Adam Litke        2006-01-06  4649  
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4650  /*
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4651   * Used by userfaultfd UFFDIO_COPY.  Based on mcopy_atomic_pte with
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4652   * modifications for huge pages.
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4653   */
8fb5debc5fcd450 Mike Kravetz      2017-02-22 @4654  int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4655  			    pte_t *dst_pte,
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4656  			    struct vm_area_struct *dst_vma,
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4657  			    unsigned long dst_addr,
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4658  			    unsigned long src_addr,
b8fb53c3a341b9b Axel Rasmussen    2021-01-22 @4659  			    enum mcopy_atomic_mode mode,
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4660  			    struct page **pagep)
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4661  {
1e3921471354244 Andrea Arcangeli  2017-11-02  4662  	struct address_space *mapping;
1e3921471354244 Andrea Arcangeli  2017-11-02  4663  	pgoff_t idx;
1e3921471354244 Andrea Arcangeli  2017-11-02  4664  	unsigned long size;
1c9e8def43a3452 Mike Kravetz      2017-02-22  4665  	int vm_shared = dst_vma->vm_flags & VM_SHARED;
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4666  	struct hstate *h = hstate_vma(dst_vma);
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4667  	pte_t _dst_pte;
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4668  	spinlock_t *ptl;
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4669  	int ret;
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4670  	struct page *page;
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4671  
b8fb53c3a341b9b Axel Rasmussen    2021-01-22  4672  	mapping = dst_vma->vm_file->f_mapping;
b8fb53c3a341b9b Axel Rasmussen    2021-01-22  4673  	idx = vma_hugecache_offset(h, dst_vma, dst_addr);
b8fb53c3a341b9b Axel Rasmussen    2021-01-22  4674  
b8fb53c3a341b9b Axel Rasmussen    2021-01-22 @4675  	if (!*pagep && mode != MCOPY_ATOMIC_CONTINUE) {
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4676  		ret = -ENOMEM;
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4677  		page = alloc_huge_page(dst_vma, dst_addr, 0);
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4678  		if (IS_ERR(page))
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4679  			goto out;
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4680  
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4681  		ret = copy_huge_page_from_user(page,
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4682  						(const void __user *) src_addr,
810a56b943e265b Mike Kravetz      2017-02-22  4683  						pages_per_huge_page(h), false);
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4684  
c1e8d7c6a7a682e Michel Lespinasse 2020-06-08  4685  		/* fallback to copy_from_user outside mmap_lock */
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4686  		if (unlikely(ret)) {
9e368259ad98835 Andrea Arcangeli  2018-11-30  4687  			ret = -ENOENT;
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4688  			*pagep = page;
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4689  			/* don't free the page */
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4690  			goto out;
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4691  		}
b8fb53c3a341b9b Axel Rasmussen    2021-01-22  4692  	} else if (mode == MCOPY_ATOMIC_CONTINUE) {
b8fb53c3a341b9b Axel Rasmussen    2021-01-22  4693  		ret = -EFAULT;
b8fb53c3a341b9b Axel Rasmussen    2021-01-22  4694  		page = find_lock_page(mapping, idx);
b8fb53c3a341b9b Axel Rasmussen    2021-01-22  4695  		*pagep = NULL;
b8fb53c3a341b9b Axel Rasmussen    2021-01-22  4696  		if (!page)
b8fb53c3a341b9b Axel Rasmussen    2021-01-22  4697  			goto out;
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4698  	} else {
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4699  		page = *pagep;
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4700  		*pagep = NULL;
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4701  	}
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4702  
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4703  	/*
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4704  	 * The memory barrier inside __SetPageUptodate makes sure that
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4705  	 * preceding stores to the page contents become visible before
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4706  	 * the set_pte_at() write.
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4707  	 */
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4708  	__SetPageUptodate(page);
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4709  
b8fb53c3a341b9b Axel Rasmussen    2021-01-22  4710  	/* Add shared, newly allocated pages to the page cache. */
b8fb53c3a341b9b Axel Rasmussen    2021-01-22  4711  	if (vm_shared && mode != MCOPY_ATOMIC_CONTINUE) {
1e3921471354244 Andrea Arcangeli  2017-11-02  4712  		size = i_size_read(mapping->host) >> huge_page_shift(h);
1e3921471354244 Andrea Arcangeli  2017-11-02  4713  		ret = -EFAULT;
1e3921471354244 Andrea Arcangeli  2017-11-02  4714  		if (idx >= size)
1e3921471354244 Andrea Arcangeli  2017-11-02  4715  			goto out_release_nounlock;
1c9e8def43a3452 Mike Kravetz      2017-02-22  4716  
1e3921471354244 Andrea Arcangeli  2017-11-02  4717  		/*
1e3921471354244 Andrea Arcangeli  2017-11-02  4718  		 * Serialization between remove_inode_hugepages() and
1e3921471354244 Andrea Arcangeli  2017-11-02  4719  		 * huge_add_to_page_cache() below happens through the
1e3921471354244 Andrea Arcangeli  2017-11-02  4720  		 * hugetlb_fault_mutex_table that here must be hold by
1e3921471354244 Andrea Arcangeli  2017-11-02  4721  		 * the caller.
1e3921471354244 Andrea Arcangeli  2017-11-02  4722  		 */
1c9e8def43a3452 Mike Kravetz      2017-02-22  4723  		ret = huge_add_to_page_cache(page, mapping, idx);
1c9e8def43a3452 Mike Kravetz      2017-02-22  4724  		if (ret)
1c9e8def43a3452 Mike Kravetz      2017-02-22  4725  			goto out_release_nounlock;
1c9e8def43a3452 Mike Kravetz      2017-02-22  4726  	}
1c9e8def43a3452 Mike Kravetz      2017-02-22  4727  
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4728  	ptl = huge_pte_lockptr(h, dst_mm, dst_pte);
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4729  	spin_lock(ptl);
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4730  
1e3921471354244 Andrea Arcangeli  2017-11-02  4731  	/*
1e3921471354244 Andrea Arcangeli  2017-11-02  4732  	 * Recheck the i_size after holding PT lock to make sure not
1e3921471354244 Andrea Arcangeli  2017-11-02  4733  	 * to leave any page mapped (as page_mapped()) beyond the end
1e3921471354244 Andrea Arcangeli  2017-11-02  4734  	 * of the i_size (remove_inode_hugepages() is strict about
1e3921471354244 Andrea Arcangeli  2017-11-02  4735  	 * enforcing that). If we bail out here, we'll also leave a
1e3921471354244 Andrea Arcangeli  2017-11-02  4736  	 * page in the radix tree in the vm_shared case beyond the end
1e3921471354244 Andrea Arcangeli  2017-11-02  4737  	 * of the i_size, but remove_inode_hugepages() will take care
1e3921471354244 Andrea Arcangeli  2017-11-02  4738  	 * of it as soon as we drop the hugetlb_fault_mutex_table.
1e3921471354244 Andrea Arcangeli  2017-11-02  4739  	 */
1e3921471354244 Andrea Arcangeli  2017-11-02  4740  	size = i_size_read(mapping->host) >> huge_page_shift(h);
1e3921471354244 Andrea Arcangeli  2017-11-02  4741  	ret = -EFAULT;
1e3921471354244 Andrea Arcangeli  2017-11-02  4742  	if (idx >= size)
1e3921471354244 Andrea Arcangeli  2017-11-02  4743  		goto out_release_unlock;
1e3921471354244 Andrea Arcangeli  2017-11-02  4744  
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4745  	ret = -EEXIST;
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4746  	if (!huge_pte_none(huge_ptep_get(dst_pte)))
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4747  		goto out_release_unlock;
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4748  
1c9e8def43a3452 Mike Kravetz      2017-02-22  4749  	if (vm_shared) {
1c9e8def43a3452 Mike Kravetz      2017-02-22  4750  		page_dup_rmap(page, true);
1c9e8def43a3452 Mike Kravetz      2017-02-22  4751  	} else {
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4752  		ClearPagePrivate(page);
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4753  		hugepage_add_new_anon_rmap(page, dst_vma, dst_addr);
1c9e8def43a3452 Mike Kravetz      2017-02-22  4754  	}
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4755  
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4756  	_dst_pte = make_huge_pte(dst_vma, page, dst_vma->vm_flags & VM_WRITE);
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4757  	if (dst_vma->vm_flags & VM_WRITE)
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4758  		_dst_pte = huge_pte_mkdirty(_dst_pte);
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4759  	_dst_pte = pte_mkyoung(_dst_pte);
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4760  
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4761  	set_huge_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte);
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4762  
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4763  	(void)huge_ptep_set_access_flags(dst_vma, dst_addr, dst_pte, _dst_pte,
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4764  					dst_vma->vm_flags & VM_WRITE);
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4765  	hugetlb_count_add(pages_per_huge_page(h), dst_mm);
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4766  
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4767  	/* No need to invalidate - it was non-present before */
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4768  	update_mmu_cache(dst_vma, dst_addr, dst_pte);
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4769  
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4770  	spin_unlock(ptl);
b8fb53c3a341b9b Axel Rasmussen    2021-01-22  4771  	if (mode != MCOPY_ATOMIC_CONTINUE)
cb6acd01e2e43fd Mike Kravetz      2019-02-28  4772  		set_page_huge_active(page);
1c9e8def43a3452 Mike Kravetz      2017-02-22  4773  	if (vm_shared)
1c9e8def43a3452 Mike Kravetz      2017-02-22  4774  		unlock_page(page);
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4775  	ret = 0;
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4776  out:
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4777  	return ret;
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4778  out_release_unlock:
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4779  	spin_unlock(ptl);
1c9e8def43a3452 Mike Kravetz      2017-02-22  4780  	if (vm_shared)
1c9e8def43a3452 Mike Kravetz      2017-02-22  4781  		unlock_page(page);
5af10dfd0afc559 Andrea Arcangeli  2017-08-10  4782  out_release_nounlock:
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4783  	put_page(page);
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4784  	goto out;
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4785  }
8fb5debc5fcd450 Mike Kravetz      2017-02-22  4786  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 43402 bytes --]


* Re: [PATCH v2 7/9] userfaultfd: add UFFDIO_CONTINUE ioctl
  2021-01-25 13:37   ` kernel test robot
@ 2021-01-25 17:50     ` Axel Rasmussen
  0 siblings, 0 replies; 14+ messages in thread
From: Axel Rasmussen @ 2021-01-25 17:50 UTC (permalink / raw)
  Cc: Alexander Viro, Alexey Dobriyan, Andrea Arcangeli, Andrew Morton,
	Anshuman Khandual, Catalin Marinas, Chinwen Chang, Huang Ying,
	Ingo Molnar, kbuild-all, clang-built-linux,
	Linux Memory Management List

This build error seems to be caused by a missing #ifdef
CONFIG_USERFAULTFD. I'll send a v3 with this fix, after waiting for
other feedback on the v2 version.
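
For illustration only (this is a sketch of the general pattern, not the
actual v3 change): the errors quoted below are what one would expect if
enum mcopy_atomic_mode is only defined when CONFIG_USERFAULTFD is set,
while the code using it is built unconditionally. Guarding the user of the
enum with the same #ifdef avoids the incomplete type. In the sketch,
demo_mcopy(), demo_entry(), and MCOPY_ATOMIC_NORMAL are assumed names;
CONFIG_USERFAULTFD is modeled with a plain #define.

```c
#include <errno.h>

/*
 * Stand-in for the Kconfig symbol; comment this out to model a
 * CONFIG_USERFAULTFD=n build (an assumption about the failing randconfig).
 */
#define CONFIG_USERFAULTFD 1

#ifdef CONFIG_USERFAULTFD
/* In this sketch the enum exists only when the feature is configured in. */
enum mcopy_atomic_mode {
	MCOPY_ATOMIC_NORMAL,	/* assumed name, not taken from the report */
	MCOPY_ATOMIC_CONTINUE,
};

/*
 * Hypothetical helper standing in for hugetlb_mcopy_atomic_pte(). It sits
 * under the same guard as the enum, so it never sees an incomplete type.
 */
int demo_mcopy(enum mcopy_atomic_mode mode)
{
	return mode == MCOPY_ATOMIC_CONTINUE ? 1 : 0;
}
#endif /* CONFIG_USERFAULTFD */

/* Callers need a fallback path when the feature is compiled out. */
int demo_entry(void)
{
#ifdef CONFIG_USERFAULTFD
	return demo_mcopy(MCOPY_ATOMIC_CONTINUE);
#else
	return -ENOSYS;
#endif
}
```

With the #define removed, demo_mcopy() and the enum both disappear and
demo_entry() still compiles, which is the property the guard is meant to
provide.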

On Mon, Jan 25, 2021 at 5:37 AM kernel test robot <lkp@intel.com> wrote:
>
> Hi Axel,
>
> Thank you for the patch! Yet something to improve:
>
> [auto build test ERROR on arm64/for-next/core]
> [also build test ERROR on powerpc/next s390/features tip/perf/core linus/master v5.11-rc5 next-20210122]
> [cannot apply to hp-parisc/for-next hnaz-linux-mm/master ia64/next sparc-next/master sparc/master]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch]
>
> url:    https://github.com/0day-ci/linux/commits/Axel-Rasmussen/userfaultfd-add-minor-fault-handling/20210125-104035
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-next/core
> config: x86_64-randconfig-a013-20210125 (attached as .config)
> compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 12d0753aca22896fda2cf76781b0ee0524d55065)
> reproduce (this is a W=1 build):
>         wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
>         chmod +x ~/bin/make.cross
>         # install x86_64 cross compiling tool for clang build
>         # apt-get install binutils-x86-64-linux-gnu
>         # https://github.com/0day-ci/linux/commit/b8fb53c3a341b9b853aa3286286c807088311dbd
>         git remote add linux-review https://github.com/0day-ci/linux
>         git fetch --no-tags linux-review Axel-Rasmussen/userfaultfd-add-minor-fault-handling/20210125-104035
>         git checkout b8fb53c3a341b9b853aa3286286c807088311dbd
>         # save the attached .config to linux build tree
>         COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64
>
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot <lkp@intel.com>
>
> All errors (new ones prefixed by >>):
>
>    In file included from mm/hugetlb.c:39:
>    include/linux/hugetlb.h:142:10: warning: declaration of 'enum mcopy_atomic_mode' will not be visible outside of this function [-Wvisibility]
>                                    enum mcopy_atomic_mode mode,
>                                         ^
>    mm/hugetlb.c:4659:13: warning: declaration of 'enum mcopy_atomic_mode' will not be visible outside of this function [-Wvisibility]
>                                enum mcopy_atomic_mode mode,
>                                     ^
> >> mm/hugetlb.c:4654:5: error: conflicting types for 'hugetlb_mcopy_atomic_pte'
>    int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
>        ^
>    include/linux/hugetlb.h:138:5: note: previous declaration is here
>    int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, pte_t *dst_pte,
>        ^
> >> mm/hugetlb.c:4659:31: error: variable has incomplete type 'enum mcopy_atomic_mode'
>                                enum mcopy_atomic_mode mode,
>                                                       ^
>    mm/hugetlb.c:4659:13: note: forward declaration of 'enum mcopy_atomic_mode'
>                                enum mcopy_atomic_mode mode,
>                                     ^
> >> mm/hugetlb.c:4675:25: error: use of undeclared identifier 'MCOPY_ATOMIC_CONTINUE'
>            if (!*pagep && mode != MCOPY_ATOMIC_CONTINUE) {
>                                   ^
>    mm/hugetlb.c:4692:21: error: use of undeclared identifier 'MCOPY_ATOMIC_CONTINUE'
>            } else if (mode == MCOPY_ATOMIC_CONTINUE) {
>                               ^
>    mm/hugetlb.c:4711:27: error: use of undeclared identifier 'MCOPY_ATOMIC_CONTINUE'
>            if (vm_shared && mode != MCOPY_ATOMIC_CONTINUE) {
>                                     ^
>    mm/hugetlb.c:4771:14: error: use of undeclared identifier 'MCOPY_ATOMIC_CONTINUE'
>            if (mode != MCOPY_ATOMIC_CONTINUE)
>                        ^
>    2 warnings and 6 errors generated.
>
>
> vim +/hugetlb_mcopy_atomic_pte +4654 mm/hugetlb.c
>
> 86e5216f8d8aa25 Adam Litke        2006-01-06  4649
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4650  /*
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4651   * Used by userfaultfd UFFDIO_COPY.  Based on mcopy_atomic_pte with
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4652   * modifications for huge pages.
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4653   */
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22 @4654  int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4655                          pte_t *dst_pte,
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4656                          struct vm_area_struct *dst_vma,
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4657                          unsigned long dst_addr,
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4658                          unsigned long src_addr,
> b8fb53c3a341b9b Axel Rasmussen    2021-01-22 @4659                          enum mcopy_atomic_mode mode,
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4660                          struct page **pagep)
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4661  {
> 1e3921471354244 Andrea Arcangeli  2017-11-02  4662      struct address_space *mapping;
> 1e3921471354244 Andrea Arcangeli  2017-11-02  4663      pgoff_t idx;
> 1e3921471354244 Andrea Arcangeli  2017-11-02  4664      unsigned long size;
> 1c9e8def43a3452 Mike Kravetz      2017-02-22  4665      int vm_shared = dst_vma->vm_flags & VM_SHARED;
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4666      struct hstate *h = hstate_vma(dst_vma);
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4667      pte_t _dst_pte;
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4668      spinlock_t *ptl;
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4669      int ret;
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4670      struct page *page;
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4671
> b8fb53c3a341b9b Axel Rasmussen    2021-01-22  4672      mapping = dst_vma->vm_file->f_mapping;
> b8fb53c3a341b9b Axel Rasmussen    2021-01-22  4673      idx = vma_hugecache_offset(h, dst_vma, dst_addr);
> b8fb53c3a341b9b Axel Rasmussen    2021-01-22  4674
> b8fb53c3a341b9b Axel Rasmussen    2021-01-22 @4675      if (!*pagep && mode != MCOPY_ATOMIC_CONTINUE) {
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4676              ret = -ENOMEM;
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4677              page = alloc_huge_page(dst_vma, dst_addr, 0);
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4678              if (IS_ERR(page))
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4679                      goto out;
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4680
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4681              ret = copy_huge_page_from_user(page,
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4682                                              (const void __user *) src_addr,
> 810a56b943e265b Mike Kravetz      2017-02-22  4683                                              pages_per_huge_page(h), false);
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4684
> c1e8d7c6a7a682e Michel Lespinasse 2020-06-08  4685              /* fallback to copy_from_user outside mmap_lock */
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4686              if (unlikely(ret)) {
> 9e368259ad98835 Andrea Arcangeli  2018-11-30  4687                      ret = -ENOENT;
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4688                      *pagep = page;
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4689                      /* don't free the page */
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4690                      goto out;
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4691              }
> b8fb53c3a341b9b Axel Rasmussen    2021-01-22  4692      } else if (mode == MCOPY_ATOMIC_CONTINUE) {
> b8fb53c3a341b9b Axel Rasmussen    2021-01-22  4693              ret = -EFAULT;
> b8fb53c3a341b9b Axel Rasmussen    2021-01-22  4694              page = find_lock_page(mapping, idx);
> b8fb53c3a341b9b Axel Rasmussen    2021-01-22  4695              *pagep = NULL;
> b8fb53c3a341b9b Axel Rasmussen    2021-01-22  4696              if (!page)
> b8fb53c3a341b9b Axel Rasmussen    2021-01-22  4697                      goto out;
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4698      } else {
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4699              page = *pagep;
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4700              *pagep = NULL;
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4701      }
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4702
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4703      /*
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4704       * The memory barrier inside __SetPageUptodate makes sure that
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4705       * preceding stores to the page contents become visible before
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4706       * the set_pte_at() write.
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4707       */
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4708      __SetPageUptodate(page);
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4709
> b8fb53c3a341b9b Axel Rasmussen    2021-01-22  4710      /* Add shared, newly allocated pages to the page cache. */
> b8fb53c3a341b9b Axel Rasmussen    2021-01-22  4711      if (vm_shared && mode != MCOPY_ATOMIC_CONTINUE) {
> 1e3921471354244 Andrea Arcangeli  2017-11-02  4712              size = i_size_read(mapping->host) >> huge_page_shift(h);
> 1e3921471354244 Andrea Arcangeli  2017-11-02  4713              ret = -EFAULT;
> 1e3921471354244 Andrea Arcangeli  2017-11-02  4714              if (idx >= size)
> 1e3921471354244 Andrea Arcangeli  2017-11-02  4715                      goto out_release_nounlock;
> 1c9e8def43a3452 Mike Kravetz      2017-02-22  4716
> 1e3921471354244 Andrea Arcangeli  2017-11-02  4717              /*
> 1e3921471354244 Andrea Arcangeli  2017-11-02  4718               * Serialization between remove_inode_hugepages() and
> 1e3921471354244 Andrea Arcangeli  2017-11-02  4719               * huge_add_to_page_cache() below happens through the
> 1e3921471354244 Andrea Arcangeli  2017-11-02  4720               * hugetlb_fault_mutex_table that here must be hold by
> 1e3921471354244 Andrea Arcangeli  2017-11-02  4721               * the caller.
> 1e3921471354244 Andrea Arcangeli  2017-11-02  4722               */
> 1c9e8def43a3452 Mike Kravetz      2017-02-22  4723              ret = huge_add_to_page_cache(page, mapping, idx);
> 1c9e8def43a3452 Mike Kravetz      2017-02-22  4724              if (ret)
> 1c9e8def43a3452 Mike Kravetz      2017-02-22  4725                      goto out_release_nounlock;
> 1c9e8def43a3452 Mike Kravetz      2017-02-22  4726      }
> 1c9e8def43a3452 Mike Kravetz      2017-02-22  4727
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4728      ptl = huge_pte_lockptr(h, dst_mm, dst_pte);
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4729      spin_lock(ptl);
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4730
> 1e3921471354244 Andrea Arcangeli  2017-11-02  4731      /*
> 1e3921471354244 Andrea Arcangeli  2017-11-02  4732       * Recheck the i_size after holding PT lock to make sure not
> 1e3921471354244 Andrea Arcangeli  2017-11-02  4733       * to leave any page mapped (as page_mapped()) beyond the end
> 1e3921471354244 Andrea Arcangeli  2017-11-02  4734       * of the i_size (remove_inode_hugepages() is strict about
> 1e3921471354244 Andrea Arcangeli  2017-11-02  4735       * enforcing that). If we bail out here, we'll also leave a
> 1e3921471354244 Andrea Arcangeli  2017-11-02  4736       * page in the radix tree in the vm_shared case beyond the end
> 1e3921471354244 Andrea Arcangeli  2017-11-02  4737       * of the i_size, but remove_inode_hugepages() will take care
> 1e3921471354244 Andrea Arcangeli  2017-11-02  4738       * of it as soon as we drop the hugetlb_fault_mutex_table.
> 1e3921471354244 Andrea Arcangeli  2017-11-02  4739       */
> 1e3921471354244 Andrea Arcangeli  2017-11-02  4740      size = i_size_read(mapping->host) >> huge_page_shift(h);
> 1e3921471354244 Andrea Arcangeli  2017-11-02  4741      ret = -EFAULT;
> 1e3921471354244 Andrea Arcangeli  2017-11-02  4742      if (idx >= size)
> 1e3921471354244 Andrea Arcangeli  2017-11-02  4743              goto out_release_unlock;
> 1e3921471354244 Andrea Arcangeli  2017-11-02  4744
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4745      ret = -EEXIST;
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4746      if (!huge_pte_none(huge_ptep_get(dst_pte)))
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4747              goto out_release_unlock;
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4748
> 1c9e8def43a3452 Mike Kravetz      2017-02-22  4749      if (vm_shared) {
> 1c9e8def43a3452 Mike Kravetz      2017-02-22  4750              page_dup_rmap(page, true);
> 1c9e8def43a3452 Mike Kravetz      2017-02-22  4751      } else {
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4752              ClearPagePrivate(page);
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4753              hugepage_add_new_anon_rmap(page, dst_vma, dst_addr);
> 1c9e8def43a3452 Mike Kravetz      2017-02-22  4754      }
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4755
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4756      _dst_pte = make_huge_pte(dst_vma, page, dst_vma->vm_flags & VM_WRITE);
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4757      if (dst_vma->vm_flags & VM_WRITE)
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4758              _dst_pte = huge_pte_mkdirty(_dst_pte);
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4759      _dst_pte = pte_mkyoung(_dst_pte);
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4760
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4761      set_huge_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte);
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4762
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4763      (void)huge_ptep_set_access_flags(dst_vma, dst_addr, dst_pte, _dst_pte,
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4764                                      dst_vma->vm_flags & VM_WRITE);
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4765      hugetlb_count_add(pages_per_huge_page(h), dst_mm);
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4766
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4767      /* No need to invalidate - it was non-present before */
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4768      update_mmu_cache(dst_vma, dst_addr, dst_pte);
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4769
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4770      spin_unlock(ptl);
> b8fb53c3a341b9b Axel Rasmussen    2021-01-22  4771      if (mode != MCOPY_ATOMIC_CONTINUE)
> cb6acd01e2e43fd Mike Kravetz      2019-02-28  4772              set_page_huge_active(page);
> 1c9e8def43a3452 Mike Kravetz      2017-02-22  4773      if (vm_shared)
> 1c9e8def43a3452 Mike Kravetz      2017-02-22  4774              unlock_page(page);
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4775      ret = 0;
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4776  out:
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4777      return ret;
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4778  out_release_unlock:
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4779      spin_unlock(ptl);
> 1c9e8def43a3452 Mike Kravetz      2017-02-22  4780      if (vm_shared)
> 1c9e8def43a3452 Mike Kravetz      2017-02-22  4781              unlock_page(page);
> 5af10dfd0afc559 Andrea Arcangeli  2017-08-10  4782  out_release_nounlock:
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4783      put_page(page);
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4784      goto out;
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4785  }
> 8fb5debc5fcd450 Mike Kravetz      2017-02-22  4786
>
> ---
> 0-DAY CI Kernel Test Service, Intel Corporation
> https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org



end of thread, other threads:[~2021-01-25 17:51 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
2021-01-22 21:29 [PATCH 0/9] userfaultfd: add minor fault handling Axel Rasmussen
2021-01-22 21:29 ` [PATCH v2 1/9] hugetlb: Pass vma into huge_pte_alloc() Axel Rasmussen
2021-01-22 21:29 ` [PATCH v2 2/9] hugetlb/userfaultfd: Forbid huge pmd sharing when uffd enabled Axel Rasmussen
2021-01-22 21:29 ` [PATCH v2 3/9] mm/hugetlb: Move flush_hugetlb_tlb_range() into hugetlb.h Axel Rasmussen
2021-01-22 21:29 ` [PATCH v2 4/9] hugetlb/userfaultfd: Unshare all pmds for hugetlbfs when register wp Axel Rasmussen
2021-01-22 21:29 ` [PATCH v2 5/9] userfaultfd: add minor fault registration mode Axel Rasmussen
2021-01-22 21:29 ` [PATCH v2 6/9] userfaultfd: disable huge PMD sharing for MINOR registered VMAs Axel Rasmussen
2021-01-22 21:29 ` [PATCH v2 7/9] userfaultfd: add UFFDIO_CONTINUE ioctl Axel Rasmussen
2021-01-25  7:53   ` kernel test robot
2021-01-25  8:31   ` kernel test robot
2021-01-25 13:37   ` kernel test robot
2021-01-25 17:50     ` Axel Rasmussen
2021-01-22 21:29 ` [PATCH v2 8/9] userfaultfd: update documentation to describe minor fault handling Axel Rasmussen
2021-01-22 21:29 ` [PATCH v2 9/9] userfaultfd/selftests: add test exercising " Axel Rasmussen
