* [PATCH v5 00/10] userfaultfd: add minor fault handling for shmem
@ 2021-04-27 22:52 ` Axel Rasmussen
  0 siblings, 0 replies; 41+ messages in thread
From: Axel Rasmussen @ 2021-04-27 22:52 UTC (permalink / raw)
  To: Alexander Viro, Andrea Arcangeli, Andrew Morton, Hugh Dickins,
	Jerome Glisse, Joe Perches, Lokesh Gidra, Mike Kravetz,
	Mike Rapoport, Peter Xu, Shaohua Li, Shuah Khan,
	Stephen Rothwell, Wang Qing
  Cc: linux-api, linux-fsdevel, linux-kernel, linux-kselftest,
	linux-mm, Axel Rasmussen, Brian Geffon, Dr . David Alan Gilbert,
	Mina Almasry, Oliver Upton

Base
====

This series is based on (and therefore should apply cleanly to) the tag
"v5.12-rc7-mmots-2021-04-11-20-49", additionally with Peter's selftest cleanup
series applied first:

https://lore.kernel.org/patchwork/cover/1412450/

Changelog
=========

v4->v5:
- Picked up {Reviewed,Acked}-by's.
- Fix cleanup in error path in shmem_mcopy_atomic_pte(). [Hugh, Peter]
- Mention switching to lru_cache_add() in the commit message of 9/10. [Hugh]
- Split + reorder commits, so now we 1) implement the faulting path, 2)
  implement the CONTINUE ioctl, and 3) advertise the feature. Squash the
  documentation update into step (3). [Hugh, Peter]
- Reorder install_pte() cleanup to come before selftest changes. [Hugh]

v3->v4:
- Fix handling of the shmem private mcopy case. Previously, I had (incorrectly)
  assumed that !vma_is_anonymous() was equivalent to "the page will be in the
  page cache". But, in this case we have an optimization where we allocate a new
  *anonymous* page. So, use a new "bool page_in_cache" instead, which checks if
  page->mapping is set. Correct several places with this new check. [Hugh]
- Fix calling mm_counter() before page_add_..._rmap(). [Hugh]
- When modifying shmem_mcopy_atomic_pte() to use the new install_pte() helper,
  just use lru_cache_add_inactive_or_unevictable(), no need to branch and maybe
  use lru_cache_add(). [Hugh]
- De-pluralize mcopy_atomic_install_pte(s). [Hugh]
- Make "writable" a bool, and initialize consistently. [Hugh]

v2->v3:
- Picked up {Reviewed,Acked}-by's.
- Reorder commits: introduce CONTINUE before MINOR registration. [Hugh, Peter]
- Don't try to {unlock,put}_page an xarray value in shmem_getpage_gfp. [Hugh]
- Move enum mcopy_atomic_mode forward declare out of CONFIG_HUGETLB_PAGE. [Hugh]
- Keep mistakenly removed UFFD_USER_MODE_ONLY in selftest. [Peter]
- Clean up context management in the selftest (make the clear step implicit,
  remove unneeded return values now that we have err()). [Peter]
- Correct dst_pte argument to dst_pmd in shmem_mcopy_atomic_pte macro. [Hugh]
- Mention the new shmem support feature in documentation. [Hugh]

v1->v2:
- Pick up Reviewed-by's.
- Don't swapin page when a minor fault occurs. Notice that it needs to be
  swapped in, and just immediately fire the minor fault. Let a future CONTINUE
  deal with swapping in the page. [Peter]
- Clarify comment about i_size checks in mm/userfaultfd.c. [Peter]
- Only forward declare once (out of #ifdef) in hugetlb.h. [Peter]

Changes since [2]:
- Squash the fixes ([2]) in with the original series ([1]). This makes reviewing
  easier, as we no longer have to sift through deltas undoing what we had done
  before. [Hugh, Peter]
- Modify shmem_mcopy_atomic_pte() to use the new mcopy_atomic_install_ptes()
  helper, reducing code duplication. [Hugh]
- Properly trigger handle_userfault() in the shmem_swapin_page() case. [Hugh]
- Use shmem_getpage() instead of find_lock_page() to look up the existing page
  for CONTINUE. This properly deals with swapped-out pages. [Hugh]
- Unconditionally pte_mkdirty() for anon memory (as before). [Peter]
- Don't include userfaultfd_k.h in either hugetlb.h or shmem_fs.h. [Hugh]
- Add comment for UFFD_FEATURE_MINOR_SHMEM (to match _HUGETLBFS). [Hugh]
- Fix some small cleanup issues (parens, reworded conditionals, reduced plumbing
  of some parameters, simplify labels/gotos, ...). [Hugh, Peter]

Overview
========

See the series which added minor faults for hugetlbfs [3] for a detailed
overview of minor fault handling in general. This series adds the same support
for shmem-backed areas.

This series is structured as follows:

- Commits 1 and 2 are cleanups.
- Commits 3 and 4 implement the new feature (minor fault handling for shmem).
- Commit 5 advertises that the feature is now available, since at this point
  it's fully implemented.
- Commit 6 is a final cleanup, modifying an existing code path to re-use a new
  helper we've introduced.
- Commits 7, 8, 9, 10 update the userfaultfd selftest to exercise the feature.

Use Case
========

In some cases it is useful to have VM memory backed by tmpfs instead of
hugetlbfs. So, this feature will be used to support the same VM live migration
use case described in my original series.

Additionally, Android folks (Lokesh Gidra <lokeshgidra@google.com>) hope to
optimize the Android Runtime garbage collector using this feature:

"The plan is to use userfaultfd for concurrently compacting the heap. With
this feature, the heap can be shared-mapped at another location where the
GC-thread(s) could continue the compaction operation without the need to
invoke userfault ioctl(UFFDIO_COPY) each time. OTOH, if and when Java threads
get faults on the heap, UFFDIO_CONTINUE can be used to resume execution.
Furthermore, this feature enables updating references in the 'non-moving'
portion of the heap efficiently. Without this feature, unnecessary page
copying (ioctl(UFFDIO_COPY)) would be required."
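
As a rough userspace illustration of the "shared-mapped at another location"
idea in the quote above, the sketch below maps one memfd-backed (shmem) region
twice and checks that a write through the second mapping is visible through the
first. It deliberately does not register anything with userfaultfd or issue
UFFDIO_CONTINUE; it only shows the aliasing that the minor fault flow relies
on. The function name alias_mapping_demo() and the "heap-sketch" memfd name are
made up for this example.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: map the same shmem-backed memfd twice with MAP_SHARED
 * and verify that data written via one mapping can be read via the other.
 * Returns 0 on success, -1 on any failure.
 */
int alias_mapping_demo(size_t len)
{
	static const char msg[] = "compacted";
	char *primary, *alias;
	int fd, ret = -1;

	if (len < sizeof(msg))
		return -1;

	fd = memfd_create("heap-sketch", 0);	/* name is arbitrary */
	if (fd < 0)
		return -1;
	if (ftruncate(fd, (off_t)len) < 0)
		goto out_close;

	/* First mapping: where the application would normally touch memory. */
	primary = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (primary == MAP_FAILED)
		goto out_close;

	/* Second mapping of the same pages, at a different address. */
	alias = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (alias == MAP_FAILED)
		goto out_unmap_primary;

	/* Populate through the alias, observe through the primary mapping. */
	memcpy(alias, msg, sizeof(msg));
	if (memcmp(primary, msg, sizeof(msg)) == 0)
		ret = 0;

	munmap(alias, len);
out_unmap_primary:
	munmap(primary, len);
out_close:
	close(fd);
	return ret;
}
```

Calling alias_mapping_demo(4096) should return 0 on a system where
memfd_create() is available. In the real use case, the primary mapping would
additionally be registered for minor faults, and the thread writing through the
alias would resolve each fault with UFFDIO_CONTINUE.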

[1] https://lore.kernel.org/patchwork/cover/1388144/
[2] https://lore.kernel.org/patchwork/patch/1408161/
[3] https://lore.kernel.org/linux-fsdevel/20210301222728.176417-1-axelrasmussen@google.com/T/#t

Axel Rasmussen (10):
  userfaultfd/hugetlbfs: avoid including userfaultfd_k.h in hugetlb.h
  userfaultfd/shmem: combine shmem_{mcopy_atomic,mfill_zeropage}_pte
  userfaultfd/shmem: support minor fault registration for shmem
  userfaultfd/shmem: support UFFDIO_CONTINUE for shmem
  userfaultfd/shmem: advertise shmem minor fault support
  userfaultfd/shmem: modify shmem_mcopy_atomic_pte to use install_pte()
  userfaultfd/selftests: use memfd_create for shmem test type
  userfaultfd/selftests: create alias mappings in the shmem test
  userfaultfd/selftests: reinitialize test context in each test
  userfaultfd/selftests: exercise minor fault handling shmem support

 Documentation/admin-guide/mm/userfaultfd.rst |   3 +-
 fs/userfaultfd.c                             |   6 +-
 include/linux/hugetlb.h                      |   4 +-
 include/linux/shmem_fs.h                     |  17 +-
 include/linux/userfaultfd_k.h                |   5 +
 include/uapi/linux/userfaultfd.h             |   7 +-
 mm/hugetlb.c                                 |   1 +
 mm/memory.c                                  |   8 +-
 mm/shmem.c                                   | 110 +++-----
 mm/userfaultfd.c                             | 175 ++++++++----
 tools/testing/selftests/vm/userfaultfd.c     | 274 ++++++++++++-------
 11 files changed, 360 insertions(+), 250 deletions(-)

--
2.31.1.498.g6c1eba8ee3d-goog



* [PATCH v5 01/10] userfaultfd/hugetlbfs: avoid including userfaultfd_k.h in hugetlb.h
  2021-04-27 22:52 ` Axel Rasmussen
@ 2021-04-27 22:52   ` Axel Rasmussen
  -1 siblings, 0 replies; 41+ messages in thread
From: Axel Rasmussen @ 2021-04-27 22:52 UTC (permalink / raw)
  To: Alexander Viro, Andrea Arcangeli, Andrew Morton, Hugh Dickins,
	Jerome Glisse, Joe Perches, Lokesh Gidra, Mike Kravetz,
	Mike Rapoport, Peter Xu, Shaohua Li, Shuah Khan,
	Stephen Rothwell, Wang Qing
  Cc: linux-api, linux-fsdevel, linux-kernel, linux-kselftest,
	linux-mm, Axel Rasmussen, Brian Geffon, Dr . David Alan Gilbert,
	Mina Almasry, Oliver Upton

Minimizing header file inclusion is desirable. In this case, we can do
so just by forward declaring the enumeration our signature relies upon.
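
The pattern can be sketched in a single file as follows. This is only an
illustration with made-up names (copy_mode, describe_mode, MODE_*), not the
kernel's declarations, and it relies on the GNU C extension that accepts a
forward-declared enum; strict ISO C (-pedantic) warns about it.

```c
/* What a lean header would contain: just enough for the prototype. */
enum copy_mode;					/* forward declaration only */
int describe_mode(enum copy_mode mode);		/* needs the name, not the values */

/* What the defining header would contain (included only where needed). */
enum copy_mode {
	MODE_NORMAL,
	MODE_ZEROPAGE,
};

/* A .c file that includes both can define and call the function. */
int describe_mode(enum copy_mode mode)
{
	return mode == MODE_ZEROPAGE ? 1 : 0;
}
```

Only code that defines or calls the function needs the full enum definition,
which is why the include can move from the header into the .c file.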

Reviewed-by: Peter Xu <peterx@redhat.com>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
---
 include/linux/hugetlb.h | 4 +++-
 mm/hugetlb.c            | 1 +
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 09f1fd12a6fa..ca8868cdac16 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -11,11 +11,11 @@
 #include <linux/kref.h>
 #include <linux/pgtable.h>
 #include <linux/gfp.h>
-#include <linux/userfaultfd_k.h>
 
 struct ctl_table;
 struct user_struct;
 struct mmu_gather;
+enum mcopy_atomic_mode;
 
 #ifndef is_hugepd
 typedef struct { unsigned long pd; } hugepd_t;
@@ -135,6 +135,7 @@ void hugetlb_show_meminfo(void);
 unsigned long hugetlb_total_pages(void);
 vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 			unsigned long address, unsigned int flags);
+
 #ifdef CONFIG_USERFAULTFD
 int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, pte_t *dst_pte,
 				struct vm_area_struct *dst_vma,
@@ -143,6 +144,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, pte_t *dst_pte,
 				enum mcopy_atomic_mode mode,
 				struct page **pagep);
 #endif /* CONFIG_USERFAULTFD */
+
 bool hugetlb_reserve_pages(struct inode *inode, long from, long to,
 						struct vm_area_struct *vma,
 						vm_flags_t vm_flags);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 54d81d5947ed..b1652e747318 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -40,6 +40,7 @@
 #include <linux/hugetlb_cgroup.h>
 #include <linux/node.h>
 #include <linux/page_owner.h>
+#include <linux/userfaultfd_k.h>
 #include "internal.h"
 
 int hugetlb_max_hstate __read_mostly;
-- 
2.31.1.498.g6c1eba8ee3d-goog



* [PATCH v5 02/10] userfaultfd/shmem: combine shmem_{mcopy_atomic,mfill_zeropage}_pte
  2021-04-27 22:52 ` Axel Rasmussen
@ 2021-04-27 22:52   ` Axel Rasmussen
  -1 siblings, 0 replies; 41+ messages in thread
From: Axel Rasmussen @ 2021-04-27 22:52 UTC (permalink / raw)
  To: Alexander Viro, Andrea Arcangeli, Andrew Morton, Hugh Dickins,
	Jerome Glisse, Joe Perches, Lokesh Gidra, Mike Kravetz,
	Mike Rapoport, Peter Xu, Shaohua Li, Shuah Khan,
	Stephen Rothwell, Wang Qing
  Cc: linux-api, linux-fsdevel, linux-kernel, linux-kselftest,
	linux-mm, Axel Rasmussen, Brian Geffon, Dr . David Alan Gilbert,
	Mina Almasry, Oliver Upton

Previously, we did a dance where we had one calling path in
userfaultfd.c (mfill_atomic_pte), but then we split it into two in
shmem_fs.h (shmem_{mcopy_atomic,mfill_zeropage}_pte), and then rejoined
into a single shared function in shmem.c (shmem_mfill_atomic_pte).

This is all a bit overly complex. Just call the single combined shmem
function directly, allowing us to clean up various branches,
boilerplate, etc.

While we're touching this function, two other small cleanup changes:
- offset is equivalent to pgoff, so we can get rid of offset entirely.
- Split two VM_BUG_ON cases into two statements. This means the line
  number reported when the BUG is hit specifies exactly which condition
  was true.
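
To illustrate the second point, here is a small userspace sketch. It assumes a
hypothetical CHECK_BUG_ON() macro that records the reporting line instead of
crashing, and a made-up struct fake_page; neither is kernel code.

```c
#include <stdio.h>

/* Hypothetical stand-in for the relevant page flags; not a kernel type. */
struct fake_page {
	int locked;
	int swap_backed;
};

/* Line of the first failed check since the last reset (0 if none). */
static int first_bug_line;

/* Hypothetical userspace analogue of VM_BUG_ON(): record and report the line. */
#define CHECK_BUG_ON(cond)						\
	do {								\
		if ((cond) && !first_bug_line) {			\
			first_bug_line = __LINE__;			\
			fprintf(stderr, "BUG at line %d\n", __LINE__);	\
		}							\
	} while (0)

/* One combined check: either failure cause reports the same line. */
int check_combined(const struct fake_page *page)
{
	first_bug_line = 0;
	CHECK_BUG_ON(page->locked || page->swap_backed);
	return first_bug_line;
}

/* Two separate checks: the reported line identifies the failing condition. */
int check_split(const struct fake_page *page)
{
	first_bug_line = 0;
	CHECK_BUG_ON(page->locked);
	CHECK_BUG_ON(page->swap_backed);
	return first_bug_line;
}
```

With a page that is only locked and another that is only swap-backed,
check_combined() returns the same line for both, while check_split() returns
two different lines.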

Reviewed-by: Peter Xu <peterx@redhat.com>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
---
 include/linux/shmem_fs.h | 17 ++++++-------
 mm/shmem.c               | 52 +++++++++++++---------------------------
 mm/userfaultfd.c         | 10 +++-----
 3 files changed, 26 insertions(+), 53 deletions(-)

diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index d82b6f396588..47c3409d02ac 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -122,21 +122,18 @@ static inline bool shmem_file(struct file *file)
 extern bool shmem_charge(struct inode *inode, long pages);
 extern void shmem_uncharge(struct inode *inode, long pages);
 
+#ifdef CONFIG_USERFAULTFD
 #ifdef CONFIG_SHMEM
 extern int shmem_mcopy_atomic_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
 				  struct vm_area_struct *dst_vma,
 				  unsigned long dst_addr,
 				  unsigned long src_addr,
+				  bool zeropage,
 				  struct page **pagep);
-extern int shmem_mfill_zeropage_pte(struct mm_struct *dst_mm,
-				    pmd_t *dst_pmd,
-				    struct vm_area_struct *dst_vma,
-				    unsigned long dst_addr);
-#else
-#define shmem_mcopy_atomic_pte(dst_mm, dst_pte, dst_vma, dst_addr, \
-			       src_addr, pagep)        ({ BUG(); 0; })
-#define shmem_mfill_zeropage_pte(dst_mm, dst_pmd, dst_vma, \
-				 dst_addr)      ({ BUG(); 0; })
-#endif
+#else /* !CONFIG_SHMEM */
+#define shmem_mcopy_atomic_pte(dst_mm, dst_pmd, dst_vma, dst_addr, \
+			       src_addr, zeropage, pagep)       ({ BUG(); 0; })
+#endif /* CONFIG_SHMEM */
+#endif /* CONFIG_USERFAULTFD */
 
 #endif
diff --git a/mm/shmem.c b/mm/shmem.c
index 26c76b13ad23..b72c55aa07fc 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2354,13 +2354,14 @@ static struct inode *shmem_get_inode(struct super_block *sb, const struct inode
 	return inode;
 }
 
-static int shmem_mfill_atomic_pte(struct mm_struct *dst_mm,
-				  pmd_t *dst_pmd,
-				  struct vm_area_struct *dst_vma,
-				  unsigned long dst_addr,
-				  unsigned long src_addr,
-				  bool zeropage,
-				  struct page **pagep)
+#ifdef CONFIG_USERFAULTFD
+int shmem_mcopy_atomic_pte(struct mm_struct *dst_mm,
+			   pmd_t *dst_pmd,
+			   struct vm_area_struct *dst_vma,
+			   unsigned long dst_addr,
+			   unsigned long src_addr,
+			   bool zeropage,
+			   struct page **pagep)
 {
 	struct inode *inode = file_inode(dst_vma->vm_file);
 	struct shmem_inode_info *info = SHMEM_I(inode);
@@ -2372,7 +2373,7 @@ static int shmem_mfill_atomic_pte(struct mm_struct *dst_mm,
 	struct page *page;
 	pte_t _dst_pte, *dst_pte;
 	int ret;
-	pgoff_t offset, max_off;
+	pgoff_t max_off;
 
 	ret = -ENOMEM;
 	if (!shmem_inode_acct_block(inode, 1))
@@ -2383,7 +2384,7 @@ static int shmem_mfill_atomic_pte(struct mm_struct *dst_mm,
 		if (!page)
 			goto out_unacct_blocks;
 
-		if (!zeropage) {	/* mcopy_atomic */
+		if (!zeropage) {	/* COPY */
 			page_kaddr = kmap_atomic(page);
 			ret = copy_from_user(page_kaddr,
 					     (const void __user *)src_addr,
@@ -2397,7 +2398,7 @@ static int shmem_mfill_atomic_pte(struct mm_struct *dst_mm,
 				/* don't free the page */
 				return -ENOENT;
 			}
-		} else {		/* mfill_zeropage_atomic */
+		} else {		/* ZEROPAGE */
 			clear_highpage(page);
 		}
 	} else {
@@ -2405,15 +2406,15 @@ static int shmem_mfill_atomic_pte(struct mm_struct *dst_mm,
 		*pagep = NULL;
 	}
 
-	VM_BUG_ON(PageLocked(page) || PageSwapBacked(page));
+	VM_BUG_ON(PageLocked(page));
+	VM_BUG_ON(PageSwapBacked(page));
 	__SetPageLocked(page);
 	__SetPageSwapBacked(page);
 	__SetPageUptodate(page);
 
 	ret = -EFAULT;
-	offset = linear_page_index(dst_vma, dst_addr);
 	max_off = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE);
-	if (unlikely(offset >= max_off))
+	if (unlikely(pgoff >= max_off))
 		goto out_release;
 
 	ret = shmem_add_to_page_cache(page, mapping, pgoff, NULL,
@@ -2439,7 +2440,7 @@ static int shmem_mfill_atomic_pte(struct mm_struct *dst_mm,
 
 	ret = -EFAULT;
 	max_off = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE);
-	if (unlikely(offset >= max_off))
+	if (unlikely(pgoff >= max_off))
 		goto out_release_unlock;
 
 	ret = -EEXIST;
@@ -2476,28 +2477,7 @@ static int shmem_mfill_atomic_pte(struct mm_struct *dst_mm,
 	shmem_inode_unacct_blocks(inode, 1);
 	goto out;
 }
-
-int shmem_mcopy_atomic_pte(struct mm_struct *dst_mm,
-			   pmd_t *dst_pmd,
-			   struct vm_area_struct *dst_vma,
-			   unsigned long dst_addr,
-			   unsigned long src_addr,
-			   struct page **pagep)
-{
-	return shmem_mfill_atomic_pte(dst_mm, dst_pmd, dst_vma,
-				      dst_addr, src_addr, false, pagep);
-}
-
-int shmem_mfill_zeropage_pte(struct mm_struct *dst_mm,
-			     pmd_t *dst_pmd,
-			     struct vm_area_struct *dst_vma,
-			     unsigned long dst_addr)
-{
-	struct page *page = NULL;
-
-	return shmem_mfill_atomic_pte(dst_mm, dst_pmd, dst_vma,
-				      dst_addr, 0, true, &page);
-}
+#endif /* CONFIG_USERFAULTFD */
 
 #ifdef CONFIG_TMPFS
 static const struct inode_operations shmem_symlink_inode_operations;
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index e14b3820c6a8..23fa2583bbd1 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -440,13 +440,9 @@ static __always_inline ssize_t mfill_atomic_pte(struct mm_struct *dst_mm,
 						 dst_vma, dst_addr);
 	} else {
 		VM_WARN_ON_ONCE(wp_copy);
-		if (!zeropage)
-			err = shmem_mcopy_atomic_pte(dst_mm, dst_pmd,
-						     dst_vma, dst_addr,
-						     src_addr, page);
-		else
-			err = shmem_mfill_zeropage_pte(dst_mm, dst_pmd,
-						       dst_vma, dst_addr);
+		err = shmem_mcopy_atomic_pte(dst_mm, dst_pmd, dst_vma,
+					     dst_addr, src_addr, zeropage,
+					     page);
 	}
 
 	return err;
-- 
2.31.1.498.g6c1eba8ee3d-goog



* [PATCH v5 03/10] userfaultfd/shmem: support minor fault registration for shmem
  2021-04-27 22:52 ` Axel Rasmussen
@ 2021-04-27 22:52   ` Axel Rasmussen
  -1 siblings, 0 replies; 41+ messages in thread
From: Axel Rasmussen @ 2021-04-27 22:52 UTC (permalink / raw)
  To: Alexander Viro, Andrea Arcangeli, Andrew Morton, Hugh Dickins,
	Jerome Glisse, Joe Perches, Lokesh Gidra, Mike Kravetz,
	Mike Rapoport, Peter Xu, Shaohua Li, Shuah Khan,
	Stephen Rothwell, Wang Qing
  Cc: linux-api, linux-fsdevel, linux-kernel, linux-kselftest,
	linux-mm, Axel Rasmussen, Brian Geffon, Dr . David Alan Gilbert,
	Mina Almasry, Oliver Upton

This patch allows shmem-backed VMAs to be registered for minor faults.
Minor faults are appropriately relayed to userspace in the fault path,
for VMAs with the relevant flag.

This commit doesn't hook up the UFFDIO_CONTINUE ioctl for shmem-backed
minor faults, though, so userspace doesn't yet have a way to resolve
such faults.

Because of this, we also don't yet advertise this as a supported
feature. That will be done in a separate commit when the feature is
fully implemented.

Acked-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
---
 fs/userfaultfd.c |  3 +--
 mm/memory.c      |  8 +++++---
 mm/shmem.c       | 12 +++++++++++-
 3 files changed, 17 insertions(+), 6 deletions(-)
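
Note (illustration only, not part of the commit): the two decisions this
patch adds can be modeled in plain userspace C. This is a minimal sketch
under stated assumptions: `struct vma_model`, `can_register_minor()` and
`should_relay_minor_fault()` are hypothetical names that stand in for the
kernel's `is_vm_hugetlb_page()`, `vma_is_shmem()` and `userfaultfd_minor()`
checks. They are not kernel APIs.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two VMA properties the check looks at. */
struct vma_model {
	bool is_hugetlb;	/* models is_vm_hugetlb_page(vma) */
	bool is_shmem;		/* models vma_is_shmem(vma) */
};

/*
 * Models only the VM_UFFD_MINOR branch of vma_can_userfault() after this
 * patch: minor-mode registration is accepted for hugetlbfs or shmem VMAs.
 */
static bool can_register_minor(const struct vma_model *vma)
{
	return vma->is_hugetlb || vma->is_shmem;
}

/*
 * Models the new condition in shmem_getpage_gfp(): the fault is handed to
 * userspace only when the lookup found something (a page or a swap entry),
 * a vma was supplied (i.e. we came from shmem_fault), and that vma is
 * registered for minor faults.
 */
static bool should_relay_minor_fault(bool lookup_found_entry, bool have_vma,
				     bool vma_registered_minor)
{
	return lookup_found_entry && have_vma && vma_registered_minor;
}
```

With this model, a lookup that finds nothing is never relayed as a minor
fault, which matches the `page && vma && userfaultfd_minor(vma)` test in
the hunk below.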

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 14f92285d04f..468556fb04a9 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1267,8 +1267,7 @@ static inline bool vma_can_userfault(struct vm_area_struct *vma,
 	}
 
 	if (vm_flags & VM_UFFD_MINOR) {
-		/* FIXME: Add minor fault interception for shmem. */
-		if (!is_vm_hugetlb_page(vma))
+		if (!(is_vm_hugetlb_page(vma) || vma_is_shmem(vma)))
 			return false;
 	}
 
diff --git a/mm/memory.c b/mm/memory.c
index 4e358601c5d6..cc71a445c76c 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3972,9 +3972,11 @@ static vm_fault_t do_read_fault(struct vm_fault *vmf)
 	 * something).
 	 */
 	if (vma->vm_ops->map_pages && fault_around_bytes >> PAGE_SHIFT > 1) {
-		ret = do_fault_around(vmf);
-		if (ret)
-			return ret;
+		if (likely(!userfaultfd_minor(vmf->vma))) {
+			ret = do_fault_around(vmf);
+			if (ret)
+				return ret;
+		}
 	}
 
 	ret = __do_fault(vmf);
diff --git a/mm/shmem.c b/mm/shmem.c
index b72c55aa07fc..30c0bb501dc9 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1785,7 +1785,7 @@ static int shmem_swapin_page(struct inode *inode, pgoff_t index,
  * vm. If we swap it in we mark it dirty since we also free the swap
  * entry since a page cannot live in both the swap and page cache.
  *
- * vmf and fault_type are only supplied by shmem_fault:
+ * vma, vmf, and fault_type are only supplied by shmem_fault:
  * otherwise they are NULL.
  */
 static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
@@ -1820,6 +1820,16 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
 
 	page = pagecache_get_page(mapping, index,
 					FGP_ENTRY | FGP_HEAD | FGP_LOCK, 0);
+
+	if (page && vma && userfaultfd_minor(vma)) {
+		if (!xa_is_value(page)) {
+			unlock_page(page);
+			put_page(page);
+		}
+		*fault_type = handle_userfault(vmf, VM_UFFD_MINOR);
+		return 0;
+	}
+
 	if (xa_is_value(page)) {
 		error = shmem_swapin_page(inode, index, &page,
 					  sgp, gfp, vma, fault_type);
-- 
2.31.1.498.g6c1eba8ee3d-goog


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v5 04/10] userfaultfd/shmem: support UFFDIO_CONTINUE for shmem
  2021-04-27 22:52 ` Axel Rasmussen
@ 2021-04-27 22:52   ` Axel Rasmussen
  -1 siblings, 0 replies; 41+ messages in thread
From: Axel Rasmussen @ 2021-04-27 22:52 UTC (permalink / raw)
  To: Alexander Viro, Andrea Arcangeli, Andrew Morton, Hugh Dickins,
	Jerome Glisse, Joe Perches, Lokesh Gidra, Mike Kravetz,
	Mike Rapoport, Peter Xu, Shaohua Li, Shuah Khan,
	Stephen Rothwell, Wang Qing
  Cc: linux-api, linux-fsdevel, linux-kernel, linux-kselftest,
	linux-mm, Axel Rasmussen, Brian Geffon, Dr . David Alan Gilbert,
	Mina Almasry, Oliver Upton

With this change, userspace can resolve a minor fault within a
shmem-backed area with a UFFDIO_CONTINUE ioctl. The semantics for this
match those for hugetlbfs - we look up the existing page in the page
cache, and install a PTE for it.

This commit introduces a new helper: mcopy_atomic_install_pte.

Why handle UFFDIO_CONTINUE for shmem in mm/userfaultfd.c, instead of in
shmem.c? The existing userfault implementation only relies on shmem.c
for VM_SHARED VMAs. However, minor fault handling / CONTINUE work just
fine for !VM_SHARED VMAs as well. We'd prefer to handle CONTINUE for
shmem in one place, regardless of shared/private (to reduce code
duplication).

Why add a new mcopy_atomic_install_pte helper? The problem with CONTINUE
is that shmem_mcopy_atomic_pte() and mcopy_atomic_pte() are *close* to
what we want, but not exactly. We do want to set up the PTEs in a
CONTINUE operation, but we don't want to e.g. allocate a new page,
charge it (e.g. to the shmem inode), manipulate various flags, etc.
There is also the problem stated above: shmem_mcopy_atomic_pte() and
mcopy_atomic_pte() each handle only one half (shared / private) of the
cases CONTINUE cares about. So, introduce mcontinue_atomic_pte() to
handle all of the shmem CONTINUE cases, and introduce the helper so that
it doesn't duplicate code with mcopy_atomic_pte().

In a future commit, shmem_mcopy_atomic_pte() will also be modified to
use this new helper. However, since that is a bigger refactor, it seems
clearest to do it as a separate change.

Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
---
 mm/userfaultfd.c | 172 ++++++++++++++++++++++++++++++++++-------------
 1 file changed, 127 insertions(+), 45 deletions(-)
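
Note (illustration only, not part of the commit): the PTE bit selection
and the i_size bound check in mcopy_atomic_install_pte() can be followed
more easily as a small userspace model. This is a sketch under stated
assumptions: the `PTE_*` bits, `install_pte_flags()` and
`offset_beyond_eof()` are hypothetical names. The kernel uses the
arch-specific pte_mkdirty(), pte_mkwrite() and pte_mkuffd_wp() helpers on
a pte_t rather than a flag word.

```c
#include <stdbool.h>

/* Hypothetical flag bits standing in for the pte_mk*() helpers. */
enum {
	PTE_DIRTY   = 1 << 0,	/* models pte_mkdirty() */
	PTE_WRITE   = 1 << 1,	/* models pte_mkwrite() */
	PTE_UFFD_WP = 1 << 2,	/* models pte_mkuffd_wp() */
};

/* Mirrors the order of the checks in mcopy_atomic_install_pte(). */
static unsigned int install_pte_flags(bool vm_write, bool vm_shared,
				      bool page_in_cache, bool wp_copy)
{
	unsigned int flags = 0;
	bool writable = vm_write;

	/* A page cache page in a private mapping is never mapped writable. */
	if (page_in_cache && !vm_shared)
		writable = false;
	/* Newly allocated anonymous pages are always marked dirty. */
	if (writable || !page_in_cache)
		flags |= PTE_DIRTY;
	/* wp_copy installs the uffd-wp bit instead of the write bit. */
	if (writable)
		flags |= wp_copy ? PTE_UFFD_WP : PTE_WRITE;
	return flags;
}

/*
 * Models the -EFAULT check: max_off is DIV_ROUND_UP(i_size, PAGE_SIZE),
 * and any page offset at or past it lies beyond the end of the file.
 */
static bool offset_beyond_eof(unsigned long long pgoff,
			      unsigned long long i_size,
			      unsigned long long page_size)
{
	unsigned long long max_off = (i_size + page_size - 1) / page_size;

	return pgoff >= max_off;
}
```

For CONTINUE on a MAP_PRIVATE shmem VMA, page_in_cache is true and
vm_shared is false, so the model returns no flags at all: the existing
page is mapped clean and read-only.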

diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 23fa2583bbd1..51d8c0127161 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -48,6 +48,83 @@ struct vm_area_struct *find_dst_vma(struct mm_struct *dst_mm,
 	return dst_vma;
 }
 
+/*
+ * Install PTEs, to map dst_addr (within dst_vma) to page.
+ *
+ * This function handles MCOPY_ATOMIC_CONTINUE (which is always file-backed),
+ * whether or not dst_vma is VM_SHARED. It also handles the more general
+ * MCOPY_ATOMIC_NORMAL case, when dst_vma is *not* VM_SHARED (it may be file
+ * backed, or not).
+ *
+ * Note that MCOPY_ATOMIC_NORMAL for a VM_SHARED dst_vma is handled by
+ * shmem_mcopy_atomic_pte instead.
+ */
+static int mcopy_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
+				    struct vm_area_struct *dst_vma,
+				    unsigned long dst_addr, struct page *page,
+				    bool newly_allocated, bool wp_copy)
+{
+	int ret;
+	pte_t _dst_pte, *dst_pte;
+	bool writable = dst_vma->vm_flags & VM_WRITE;
+	bool vm_shared = dst_vma->vm_flags & VM_SHARED;
+	bool page_in_cache = page->mapping;
+	spinlock_t *ptl;
+	struct inode *inode;
+	pgoff_t offset, max_off;
+
+	_dst_pte = mk_pte(page, dst_vma->vm_page_prot);
+	if (page_in_cache && !vm_shared)
+		writable = false;
+	if (writable || !page_in_cache)
+		_dst_pte = pte_mkdirty(_dst_pte);
+	if (writable) {
+		if (wp_copy)
+			_dst_pte = pte_mkuffd_wp(_dst_pte);
+		else
+			_dst_pte = pte_mkwrite(_dst_pte);
+	}
+
+	dst_pte = pte_offset_map_lock(dst_mm, dst_pmd, dst_addr, &ptl);
+
+	if (vma_is_shmem(dst_vma)) {
+		/* serialize against truncate with the page table lock */
+		inode = dst_vma->vm_file->f_inode;
+		offset = linear_page_index(dst_vma, dst_addr);
+		max_off = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE);
+		ret = -EFAULT;
+		if (unlikely(offset >= max_off))
+			goto out_unlock;
+	}
+
+	ret = -EEXIST;
+	if (!pte_none(*dst_pte))
+		goto out_unlock;
+
+	if (page_in_cache)
+		page_add_file_rmap(page, false);
+	else
+		page_add_new_anon_rmap(page, dst_vma, dst_addr, false);
+
+	/*
+	 * Must happen after rmap, as mm_counter() checks mapping (via
+	 * PageAnon()), which is set by __page_set_anon_rmap().
+	 */
+	inc_mm_counter(dst_mm, mm_counter(page));
+
+	if (newly_allocated)
+		lru_cache_add_inactive_or_unevictable(page, dst_vma);
+
+	set_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte);
+
+	/* No need to invalidate - it was non-present before */
+	update_mmu_cache(dst_vma, dst_addr, dst_pte);
+	ret = 0;
+out_unlock:
+	pte_unmap_unlock(dst_pte, ptl);
+	return ret;
+}
+
 static int mcopy_atomic_pte(struct mm_struct *dst_mm,
 			    pmd_t *dst_pmd,
 			    struct vm_area_struct *dst_vma,
@@ -56,13 +133,9 @@ static int mcopy_atomic_pte(struct mm_struct *dst_mm,
 			    struct page **pagep,
 			    bool wp_copy)
 {
-	pte_t _dst_pte, *dst_pte;
-	spinlock_t *ptl;
 	void *page_kaddr;
 	int ret;
 	struct page *page;
-	pgoff_t offset, max_off;
-	struct inode *inode;
 
 	if (!*pagep) {
 		ret = -ENOMEM;
@@ -99,43 +172,12 @@ static int mcopy_atomic_pte(struct mm_struct *dst_mm,
 	if (mem_cgroup_charge(page, dst_mm, GFP_KERNEL))
 		goto out_release;
 
-	_dst_pte = pte_mkdirty(mk_pte(page, dst_vma->vm_page_prot));
-	if (dst_vma->vm_flags & VM_WRITE) {
-		if (wp_copy)
-			_dst_pte = pte_mkuffd_wp(_dst_pte);
-		else
-			_dst_pte = pte_mkwrite(_dst_pte);
-	}
-
-	dst_pte = pte_offset_map_lock(dst_mm, dst_pmd, dst_addr, &ptl);
-	if (dst_vma->vm_file) {
-		/* the shmem MAP_PRIVATE case requires checking the i_size */
-		inode = dst_vma->vm_file->f_inode;
-		offset = linear_page_index(dst_vma, dst_addr);
-		max_off = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE);
-		ret = -EFAULT;
-		if (unlikely(offset >= max_off))
-			goto out_release_uncharge_unlock;
-	}
-	ret = -EEXIST;
-	if (!pte_none(*dst_pte))
-		goto out_release_uncharge_unlock;
-
-	inc_mm_counter(dst_mm, MM_ANONPAGES);
-	page_add_new_anon_rmap(page, dst_vma, dst_addr, false);
-	lru_cache_add_inactive_or_unevictable(page, dst_vma);
-
-	set_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte);
-
-	/* No need to invalidate - it was non-present before */
-	update_mmu_cache(dst_vma, dst_addr, dst_pte);
-
-	pte_unmap_unlock(dst_pte, ptl);
-	ret = 0;
+	ret = mcopy_atomic_install_pte(dst_mm, dst_pmd, dst_vma, dst_addr,
+				       page, true, wp_copy);
+	if (ret)
+		goto out_release;
 out:
 	return ret;
-out_release_uncharge_unlock:
-	pte_unmap_unlock(dst_pte, ptl);
 out_release:
 	put_page(page);
 	goto out;
@@ -176,6 +218,41 @@ static int mfill_zeropage_pte(struct mm_struct *dst_mm,
 	return ret;
 }
 
+/* Handles UFFDIO_CONTINUE for all shmem VMAs (shared or private). */
+static int mcontinue_atomic_pte(struct mm_struct *dst_mm,
+				pmd_t *dst_pmd,
+				struct vm_area_struct *dst_vma,
+				unsigned long dst_addr,
+				bool wp_copy)
+{
+	struct inode *inode = file_inode(dst_vma->vm_file);
+	pgoff_t pgoff = linear_page_index(dst_vma, dst_addr);
+	struct page *page;
+	int ret;
+
+	ret = shmem_getpage(inode, pgoff, &page, SGP_READ);
+	if (ret)
+		goto out;
+	if (!page) {
+		ret = -EFAULT;
+		goto out;
+	}
+
+	ret = mcopy_atomic_install_pte(dst_mm, dst_pmd, dst_vma, dst_addr,
+				       page, false, wp_copy);
+	if (ret)
+		goto out_release;
+
+	unlock_page(page);
+	ret = 0;
+out:
+	return ret;
+out_release:
+	unlock_page(page);
+	put_page(page);
+	goto out;
+}
+
 static pmd_t *mm_alloc_pmd(struct mm_struct *mm, unsigned long address)
 {
 	pgd_t *pgd;
@@ -415,11 +492,16 @@ static __always_inline ssize_t mfill_atomic_pte(struct mm_struct *dst_mm,
 						unsigned long dst_addr,
 						unsigned long src_addr,
 						struct page **page,
-						bool zeropage,
+						enum mcopy_atomic_mode mode,
 						bool wp_copy)
 {
 	ssize_t err;
 
+	if (mode == MCOPY_ATOMIC_CONTINUE) {
+		return mcontinue_atomic_pte(dst_mm, dst_pmd, dst_vma, dst_addr,
+					    wp_copy);
+	}
+
 	/*
 	 * The normal page fault path for a shmem will invoke the
 	 * fault, fill the hole in the file and COW it right away. The
@@ -431,7 +513,7 @@ static __always_inline ssize_t mfill_atomic_pte(struct mm_struct *dst_mm,
 	 * and not in the radix tree.
 	 */
 	if (!(dst_vma->vm_flags & VM_SHARED)) {
-		if (!zeropage)
+		if (mode == MCOPY_ATOMIC_NORMAL)
 			err = mcopy_atomic_pte(dst_mm, dst_pmd, dst_vma,
 					       dst_addr, src_addr, page,
 					       wp_copy);
@@ -441,7 +523,8 @@ static __always_inline ssize_t mfill_atomic_pte(struct mm_struct *dst_mm,
 	} else {
 		VM_WARN_ON_ONCE(wp_copy);
 		err = shmem_mcopy_atomic_pte(dst_mm, dst_pmd, dst_vma,
-					     dst_addr, src_addr, zeropage,
+					     dst_addr, src_addr,
+					     mode != MCOPY_ATOMIC_NORMAL,
 					     page);
 	}
 
@@ -463,7 +546,6 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm,
 	long copied;
 	struct page *page;
 	bool wp_copy;
-	bool zeropage = (mcopy_mode == MCOPY_ATOMIC_ZEROPAGE);
 
 	/*
 	 * Sanitize the command parameters:
@@ -526,7 +608,7 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm,
 
 	if (!vma_is_anonymous(dst_vma) && !vma_is_shmem(dst_vma))
 		goto out_unlock;
-	if (mcopy_mode == MCOPY_ATOMIC_CONTINUE)
+	if (!vma_is_shmem(dst_vma) && mcopy_mode == MCOPY_ATOMIC_CONTINUE)
 		goto out_unlock;
 
 	/*
@@ -574,7 +656,7 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm,
 		BUG_ON(pmd_trans_huge(*dst_pmd));
 
 		err = mfill_atomic_pte(dst_mm, dst_pmd, dst_vma, dst_addr,
-				       src_addr, &page, zeropage, wp_copy);
+				       src_addr, &page, mcopy_mode, wp_copy);
 		cond_resched();
 
 		if (unlikely(err == -ENOENT)) {
-- 
2.31.1.498.g6c1eba8ee3d-goog


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v5 05/10] userfaultfd/shmem: advertise shmem minor fault support
  2021-04-27 22:52 ` Axel Rasmussen
@ 2021-04-27 22:52   ` Axel Rasmussen
  -1 siblings, 0 replies; 41+ messages in thread
From: Axel Rasmussen @ 2021-04-27 22:52 UTC (permalink / raw)
  To: Alexander Viro, Andrea Arcangeli, Andrew Morton, Hugh Dickins,
	Jerome Glisse, Joe Perches, Lokesh Gidra, Mike Kravetz,
	Mike Rapoport, Peter Xu, Shaohua Li, Shuah Khan,
	Stephen Rothwell, Wang Qing
  Cc: linux-api, linux-fsdevel, linux-kernel, linux-kselftest,
	linux-mm, Axel Rasmussen, Brian Geffon, Dr . David Alan Gilbert,
	Mina Almasry, Oliver Upton

Now that the feature is fully implemented (the faulting path hooks exist
so userspace is notified, and the ioctl to resolve such faults is
available), advertise this as a supported feature.

Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
---
 Documentation/admin-guide/mm/userfaultfd.rst | 3 ++-
 fs/userfaultfd.c                             | 3 ++-
 include/uapi/linux/userfaultfd.h             | 7 ++++++-
 3 files changed, 10 insertions(+), 3 deletions(-)
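
Note (illustration only, not part of the commit): the feature masking in
userfaultfd_api() and the check a userspace caller would make on the
returned `features` word can be sketched as below. The bit positions are
copied from the include/uapi/linux/userfaultfd.h hunk in this patch. The
`MODEL_` prefixed macros, `advertised_features()` and
`shmem_minor_supported()` are hypothetical names chosen so the sketch does
not clash with the real uapi header.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same bit positions as the uapi defines touched by this patch. */
#define MODEL_FEATURE_MINOR_HUGETLBFS	(1ULL << 9)
#define MODEL_FEATURE_MINOR_SHMEM	(1ULL << 10)

/*
 * Models userfaultfd_api(): when the architecture lacks minor fault
 * support (CONFIG_HAVE_ARCH_USERFAULTFD_MINOR unset), both minor fault
 * feature bits are cleared before the features are reported.
 */
static uint64_t advertised_features(uint64_t all_features, bool arch_has_minor)
{
	if (!arch_has_minor)
		all_features &= ~(MODEL_FEATURE_MINOR_HUGETLBFS |
				  MODEL_FEATURE_MINOR_SHMEM);
	return all_features;
}

/* The test userspace would apply to uffdio_api.features after UFFDIO_API. */
static bool shmem_minor_supported(uint64_t features)
{
	return (features & MODEL_FEATURE_MINOR_SHMEM) != 0;
}
```

A caller that wants shmem minor faults would check the equivalent of
shmem_minor_supported() before registering a range with
UFFDIO_REGISTER_MODE_MINOR.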

diff --git a/Documentation/admin-guide/mm/userfaultfd.rst b/Documentation/admin-guide/mm/userfaultfd.rst
index 3aa38e8b8361..6528036093e1 100644
--- a/Documentation/admin-guide/mm/userfaultfd.rst
+++ b/Documentation/admin-guide/mm/userfaultfd.rst
@@ -77,7 +77,8 @@ events, except page fault notifications, may be generated:
 
 - ``UFFD_FEATURE_MINOR_HUGETLBFS`` indicates that the kernel supports
   ``UFFDIO_REGISTER_MODE_MINOR`` registration for hugetlbfs virtual memory
-  areas.
+  areas. ``UFFD_FEATURE_MINOR_SHMEM`` is the analogous feature indicating
+  support for shmem virtual memory areas.
 
 The userland application should set the feature flags it intends to use
 when invoking the ``UFFDIO_API`` ioctl, to request that those features be
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 468556fb04a9..9f3b8684cf3c 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1940,7 +1940,8 @@ static int userfaultfd_api(struct userfaultfd_ctx *ctx,
 	/* report all available features and ioctls to userland */
 	uffdio_api.features = UFFD_API_FEATURES;
 #ifndef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR
-	uffdio_api.features &= ~UFFD_FEATURE_MINOR_HUGETLBFS;
+	uffdio_api.features &=
+		~(UFFD_FEATURE_MINOR_HUGETLBFS | UFFD_FEATURE_MINOR_SHMEM);
 #endif
 	uffdio_api.ioctls = UFFD_API_IOCTLS;
 	ret = -EFAULT;
diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
index bafbeb1a2624..159a74e9564f 100644
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -31,7 +31,8 @@
 			   UFFD_FEATURE_MISSING_SHMEM |		\
 			   UFFD_FEATURE_SIGBUS |		\
 			   UFFD_FEATURE_THREAD_ID |		\
-			   UFFD_FEATURE_MINOR_HUGETLBFS)
+			   UFFD_FEATURE_MINOR_HUGETLBFS |	\
+			   UFFD_FEATURE_MINOR_SHMEM)
 #define UFFD_API_IOCTLS				\
 	((__u64)1 << _UFFDIO_REGISTER |		\
 	 (__u64)1 << _UFFDIO_UNREGISTER |	\
@@ -185,6 +186,9 @@ struct uffdio_api {
 	 * UFFD_FEATURE_MINOR_HUGETLBFS indicates that minor faults
 	 * can be intercepted (via REGISTER_MODE_MINOR) for
 	 * hugetlbfs-backed pages.
+	 *
+	 * UFFD_FEATURE_MINOR_SHMEM indicates the same support as
+	 * UFFD_FEATURE_MINOR_HUGETLBFS, but for shmem-backed pages instead.
 	 */
 #define UFFD_FEATURE_PAGEFAULT_FLAG_WP		(1<<0)
 #define UFFD_FEATURE_EVENT_FORK			(1<<1)
@@ -196,6 +200,7 @@ struct uffdio_api {
 #define UFFD_FEATURE_SIGBUS			(1<<7)
 #define UFFD_FEATURE_THREAD_ID			(1<<8)
 #define UFFD_FEATURE_MINOR_HUGETLBFS		(1<<9)
+#define UFFD_FEATURE_MINOR_SHMEM		(1<<10)
 	__u64 features;
 
 	__u64 ioctls;
-- 
2.31.1.498.g6c1eba8ee3d-goog


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v5 06/10] userfaultfd/shmem: modify shmem_mcopy_atomic_pte to use install_pte()
  2021-04-27 22:52 ` Axel Rasmussen
@ 2021-04-27 22:52   ` Axel Rasmussen
  -1 siblings, 0 replies; 41+ messages in thread
From: Axel Rasmussen @ 2021-04-27 22:52 UTC (permalink / raw)
  To: Alexander Viro, Andrea Arcangeli, Andrew Morton, Hugh Dickins,
	Jerome Glisse, Joe Perches, Lokesh Gidra, Mike Kravetz,
	Mike Rapoport, Peter Xu, Shaohua Li, Shuah Khan,
	Stephen Rothwell, Wang Qing
  Cc: linux-api, linux-fsdevel, linux-kernel, linux-kselftest,
	linux-mm, Axel Rasmussen, Brian Geffon, Dr . David Alan Gilbert,
	Mina Almasry, Oliver Upton

In a previous commit, we added the mcopy_atomic_install_pte() helper.
This helper does the job of setting up PTEs for an existing page, to map
it into a given VMA. It deals with both the anon and shmem cases, as
well as the shared and private cases.

In other words, shmem_mcopy_atomic_pte() duplicates a case the helper
already handles. So, expose the helper, and let shmem_mcopy_atomic_pte()
use it directly, to reduce code duplication.

This requires that we refactor shmem_mcopy_atomic_pte() a bit:

Instead of doing accounting (shmem_recalc_inode() et al) part-way
through the PTE setup, do it afterward. This frees up
mcopy_atomic_install_pte() from having to care about this accounting,
and means we don't need to e.g. shmem_uncharge() in the error path.

A side effect is this switches shmem_mcopy_atomic_pte() to use
lru_cache_add_inactive_or_unevictable() instead of just lru_cache_add().
This wrapper does some extra accounting in an exceptional case, if
appropriate, so it's actually the more correct thing to use.

Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
---
 include/linux/userfaultfd_k.h |  5 ++++
 mm/shmem.c                    | 48 +++++------------------------------
 mm/userfaultfd.c              | 17 +++++--------
 3 files changed, 18 insertions(+), 52 deletions(-)

diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index 794d1538b8ba..39c094cc6641 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -53,6 +53,11 @@ enum mcopy_atomic_mode {
 	MCOPY_ATOMIC_CONTINUE,
 };
 
+extern int mcopy_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
+				    struct vm_area_struct *dst_vma,
+				    unsigned long dst_addr, struct page *page,
+				    bool newly_allocated, bool wp_copy);
+
 extern ssize_t mcopy_atomic(struct mm_struct *dst_mm, unsigned long dst_start,
 			    unsigned long src_start, unsigned long len,
 			    bool *mmap_changing, __u64 mode);
diff --git a/mm/shmem.c b/mm/shmem.c
index 30c0bb501dc9..37db52f45cb5 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2378,10 +2378,8 @@ int shmem_mcopy_atomic_pte(struct mm_struct *dst_mm,
 	struct address_space *mapping = inode->i_mapping;
 	gfp_t gfp = mapping_gfp_mask(mapping);
 	pgoff_t pgoff = linear_page_index(dst_vma, dst_addr);
-	spinlock_t *ptl;
 	void *page_kaddr;
 	struct page *page;
-	pte_t _dst_pte, *dst_pte;
 	int ret;
 	pgoff_t max_off;
 
@@ -2404,9 +2402,9 @@ int shmem_mcopy_atomic_pte(struct mm_struct *dst_mm,
 			/* fallback to copy_from_user outside mmap_lock */
 			if (unlikely(ret)) {
 				*pagep = page;
-				shmem_inode_unacct_blocks(inode, 1);
+				ret = -ENOENT;
 				/* don't free the page */
-				return -ENOENT;
+				goto out_unacct_blocks;
 			}
 		} else {		/* ZEROPAGE */
 			clear_highpage(page);
@@ -2432,32 +2430,10 @@ int shmem_mcopy_atomic_pte(struct mm_struct *dst_mm,
 	if (ret)
 		goto out_release;
 
-	_dst_pte = mk_pte(page, dst_vma->vm_page_prot);
-	if (dst_vma->vm_flags & VM_WRITE)
-		_dst_pte = pte_mkwrite(pte_mkdirty(_dst_pte));
-	else {
-		/*
-		 * We don't set the pte dirty if the vma has no
-		 * VM_WRITE permission, so mark the page dirty or it
-		 * could be freed from under us. We could do it
-		 * unconditionally before unlock_page(), but doing it
-		 * only if VM_WRITE is not set is faster.
-		 */
-		set_page_dirty(page);
-	}
-
-	dst_pte = pte_offset_map_lock(dst_mm, dst_pmd, dst_addr, &ptl);
-
-	ret = -EFAULT;
-	max_off = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE);
-	if (unlikely(pgoff >= max_off))
-		goto out_release_unlock;
-
-	ret = -EEXIST;
-	if (!pte_none(*dst_pte))
-		goto out_release_unlock;
-
-	lru_cache_add(page);
+	ret = mcopy_atomic_install_pte(dst_mm, dst_pmd, dst_vma, dst_addr,
+				       page, true, false);
+	if (ret)
+		goto out_release;
 
 	spin_lock_irq(&info->lock);
 	info->alloced++;
@@ -2465,21 +2441,11 @@ int shmem_mcopy_atomic_pte(struct mm_struct *dst_mm,
 	shmem_recalc_inode(inode);
 	spin_unlock_irq(&info->lock);
 
-	inc_mm_counter(dst_mm, mm_counter_file(page));
-	page_add_file_rmap(page, false);
-	set_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte);
-
-	/* No need to invalidate - it was non-present before */
-	update_mmu_cache(dst_vma, dst_addr, dst_pte);
-	pte_unmap_unlock(dst_pte, ptl);
+	SetPageDirty(page);
 	unlock_page(page);
 	ret = 0;
 out:
 	return ret;
-out_release_unlock:
-	pte_unmap_unlock(dst_pte, ptl);
-	ClearPageDirty(page);
-	delete_from_page_cache(page);
 out_release:
 	unlock_page(page);
 	put_page(page);
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 51d8c0127161..3a9ddbb2dbbd 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -51,18 +51,13 @@ struct vm_area_struct *find_dst_vma(struct mm_struct *dst_mm,
 /*
  * Install PTEs, to map dst_addr (within dst_vma) to page.
  *
- * This function handles MCOPY_ATOMIC_CONTINUE (which is always file-backed),
- * whether or not dst_vma is VM_SHARED. It also handles the more general
- * MCOPY_ATOMIC_NORMAL case, when dst_vma is *not* VM_SHARED (it may be file
- * backed, or not).
- *
- * Note that MCOPY_ATOMIC_NORMAL for a VM_SHARED dst_vma is handled by
- * shmem_mcopy_atomic_pte instead.
+ * This function handles both MCOPY_ATOMIC_NORMAL and _CONTINUE for both shmem
+ * and anon, and for both shared and private VMAs.
  */
-static int mcopy_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
-				    struct vm_area_struct *dst_vma,
-				    unsigned long dst_addr, struct page *page,
-				    bool newly_allocated, bool wp_copy)
+int mcopy_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
+			     struct vm_area_struct *dst_vma,
+			     unsigned long dst_addr, struct page *page,
+			     bool newly_allocated, bool wp_copy)
 {
 	int ret;
 	pte_t _dst_pte, *dst_pte;
-- 
2.31.1.498.g6c1eba8ee3d-goog


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v5 07/10] userfaultfd/selftests: use memfd_create for shmem test type
  2021-04-27 22:52 ` Axel Rasmussen
@ 2021-04-27 22:52   ` Axel Rasmussen
  -1 siblings, 0 replies; 41+ messages in thread
From: Axel Rasmussen @ 2021-04-27 22:52 UTC (permalink / raw)
  To: Alexander Viro, Andrea Arcangeli, Andrew Morton, Hugh Dickins,
	Jerome Glisse, Joe Perches, Lokesh Gidra, Mike Kravetz,
	Mike Rapoport, Peter Xu, Shaohua Li, Shuah Khan,
	Stephen Rothwell, Wang Qing
  Cc: linux-api, linux-fsdevel, linux-kernel, linux-kselftest,
	linux-mm, Axel Rasmussen, Brian Geffon, Dr . David Alan Gilbert,
	Mina Almasry, Oliver Upton

This is a preparatory commit. In the future, we want to be able to set up
alias mappings for area_src and area_dst in the shmem test, like we do
in the hugetlb_shared test. With a VMA obtained via
mmap(MAP_ANONYMOUS | MAP_SHARED), it isn't clear how to do this.

So, mmap() with an fd, so we can create alias mappings. Use memfd_create
instead of actually passing in a tmpfs path like hugetlb does, since
it's more convenient / simpler to run, and works just as well.
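
As a rough userspace illustration of the approach described above (not the
selftest code itself), the sketch below backs two areas with a single memfd,
one at offset 0 and one directly after it. The helper name map_two_areas()
and the memfd name "uffd-sketch" are made up for this example, and area_size
is assumed to be a multiple of the page size.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of the patch: back two areas of area_size
 * bytes each with a single memfd. area_size is assumed to be a non-zero
 * multiple of the page size. Returns 0 on success, -1 on failure.
 */
static int map_two_areas(size_t area_size, void **src, void **dst)
{
	int fd;

	assert(area_size > 0);

	fd = memfd_create("uffd-sketch", 0);
	if (fd < 0)
		return -1;

	/* Size the file so it can hold both areas back to back. */
	if (ftruncate(fd, (off_t)(area_size * 2))) {
		close(fd);
		return -1;
	}

	*src = mmap(NULL, area_size, PROT_READ | PROT_WRITE, MAP_SHARED,
		    fd, 0);
	*dst = mmap(NULL, area_size, PROT_READ | PROT_WRITE, MAP_SHARED,
		    fd, (off_t)area_size);

	/* The mappings keep the file alive after the fd is closed. */
	close(fd);

	if (*src == MAP_FAILED || *dst == MAP_FAILED) {
		if (*src != MAP_FAILED)
			munmap(*src, area_size);
		if (*dst != MAP_FAILED)
			munmap(*dst, area_size);
		return -1;
	}
	return 0;
}
```

Because the two areas sit at different file offsets, a write through one
does not show up in the other; they only share the backing file.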

Future commits will:

1. Set up the alias mappings.
2. Extend our tests to actually take advantage of this, to test new
   userfaultfd behavior being introduced in this series.

Also, a small fix in the area we're changing: when the hugetlb setup
fails in main(), pass in the right argv[] so we actually print out the
hugetlb file path.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
---
 tools/testing/selftests/vm/userfaultfd.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c
index 6339aeaeeff8..fc40831f818f 100644
--- a/tools/testing/selftests/vm/userfaultfd.c
+++ b/tools/testing/selftests/vm/userfaultfd.c
@@ -85,6 +85,7 @@ static bool test_uffdio_wp = false;
 static bool test_uffdio_minor = false;
 
 static bool map_shared;
+static int shm_fd;
 static int huge_fd;
 static char *huge_fd_off0;
 static unsigned long long *count_verify;
@@ -277,8 +278,11 @@ static void shmem_release_pages(char *rel_area)
 
 static void shmem_allocate_area(void **alloc_area)
 {
+	unsigned long offset =
+		alloc_area == (void **)&area_src ? 0 : nr_pages * page_size;
+
 	*alloc_area = mmap(NULL, nr_pages * page_size, PROT_READ | PROT_WRITE,
-			   MAP_ANONYMOUS | MAP_SHARED, -1, 0);
+			   MAP_SHARED, shm_fd, offset);
 	if (*alloc_area == MAP_FAILED)
 		err("mmap of memfd failed");
 }
@@ -1448,6 +1452,16 @@ int main(int argc, char **argv)
 			err("Open of %s failed", argv[4]);
 		if (ftruncate(huge_fd, 0))
 			err("ftruncate %s to size 0 failed", argv[4]);
+	} else if (test_type == TEST_SHMEM) {
+		shm_fd = memfd_create(argv[0], 0);
+		if (shm_fd < 0)
+			err("memfd_create");
+		if (ftruncate(shm_fd, nr_pages * page_size * 2))
+			err("ftruncate");
+		if (fallocate(shm_fd,
+			      FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, 0,
+			      nr_pages * page_size * 2))
+			err("fallocate");
 	}
 	printf("nr_pages: %lu, nr_pages_per_cpu: %lu\n",
 	       nr_pages, nr_pages_per_cpu);
-- 
2.31.1.498.g6c1eba8ee3d-goog


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v5 08/10] userfaultfd/selftests: create alias mappings in the shmem test
  2021-04-27 22:52 ` Axel Rasmussen
@ 2021-04-27 22:52   ` Axel Rasmussen
  -1 siblings, 0 replies; 41+ messages in thread
From: Axel Rasmussen @ 2021-04-27 22:52 UTC (permalink / raw)
  To: Alexander Viro, Andrea Arcangeli, Andrew Morton, Hugh Dickins,
	Jerome Glisse, Joe Perches, Lokesh Gidra, Mike Kravetz,
	Mike Rapoport, Peter Xu, Shaohua Li, Shuah Khan,
	Stephen Rothwell, Wang Qing
  Cc: linux-api, linux-fsdevel, linux-kernel, linux-kselftest,
	linux-mm, Axel Rasmussen, Brian Geffon, Dr . David Alan Gilbert,
	Mina Almasry, Oliver Upton

Previously, we just allocated two shm areas: area_src and area_dst. With
this commit, change this so we also allocate area_src_alias, and
area_dst_alias.

area_*_alias and area_* (respectively) point to the same underlying
physical pages, but are different VMAs. In a future commit in this
series, we'll leverage this setup to exercise minor fault handling
support for shmem, just like we do in the hugetlb_shared test.
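
To make the aliasing concrete, here is a minimal standalone sketch (again,
not the selftest code): it maps the same memfd range twice with MAP_SHARED,
so the two mappings are distinct VMAs over the same pages. The helper name
map_area_and_alias() and the memfd name are assumptions for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: map the first len bytes of a fresh memfd twice.
 * On success, *area and *alias are different virtual addresses backed by
 * the same pages. Returns 0 on success, -1 on failure.
 */
static int map_area_and_alias(size_t len, char **area, char **alias)
{
	int fd;

	assert(len > 0);

	fd = memfd_create("alias-sketch", 0);
	if (fd < 0)
		return -1;
	if (ftruncate(fd, (off_t)len)) {
		close(fd);
		return -1;
	}

	/* Same fd, same offset, two separate mmap() calls. */
	*area = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	*alias = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd);

	if (*area == MAP_FAILED || *alias == MAP_FAILED) {
		if (*area != MAP_FAILED)
			munmap(*area, len);
		if (*alias != MAP_FAILED)
			munmap(*alias, len);
		return -1;
	}
	return 0;
}
```

A store through one mapping is visible through the other, which is the
property a minor fault test relies on: the page can be populated via one
VMA while the other VMA still has no PTE for it.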

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
---
 tools/testing/selftests/vm/userfaultfd.c | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c
index fc40831f818f..1f65c4ab7994 100644
--- a/tools/testing/selftests/vm/userfaultfd.c
+++ b/tools/testing/selftests/vm/userfaultfd.c
@@ -278,13 +278,29 @@ static void shmem_release_pages(char *rel_area)
 
 static void shmem_allocate_area(void **alloc_area)
 {
-	unsigned long offset =
-		alloc_area == (void **)&area_src ? 0 : nr_pages * page_size;
+	void *area_alias = NULL;
+	bool is_src = alloc_area == (void **)&area_src;
+	unsigned long offset = is_src ? 0 : nr_pages * page_size;
 
 	*alloc_area = mmap(NULL, nr_pages * page_size, PROT_READ | PROT_WRITE,
 			   MAP_SHARED, shm_fd, offset);
 	if (*alloc_area == MAP_FAILED)
 		err("mmap of memfd failed");
+
+	area_alias = mmap(NULL, nr_pages * page_size, PROT_READ | PROT_WRITE,
+			  MAP_SHARED, shm_fd, offset);
+	if (area_alias == MAP_FAILED)
+		err("mmap of memfd alias failed");
+
+	if (is_src)
+		area_src_alias = area_alias;
+	else
+		area_dst_alias = area_alias;
+}
+
+static void shmem_alias_mapping(__u64 *start, size_t len, unsigned long offset)
+{
+	*start = (unsigned long)area_dst_alias + offset;
 }
 
 struct uffd_test_ops {
@@ -314,7 +330,7 @@ static struct uffd_test_ops shmem_uffd_test_ops = {
 	.expected_ioctls = SHMEM_EXPECTED_IOCTLS,
 	.allocate_area	= shmem_allocate_area,
 	.release_pages	= shmem_release_pages,
-	.alias_mapping = noop_alias_mapping,
+	.alias_mapping = shmem_alias_mapping,
 };
 
 static struct uffd_test_ops hugetlb_uffd_test_ops = {
-- 
2.31.1.498.g6c1eba8ee3d-goog


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v5 09/10] userfaultfd/selftests: reinitialize test context in each test
  2021-04-27 22:52 ` Axel Rasmussen
@ 2021-04-27 22:52   ` Axel Rasmussen
  -1 siblings, 0 replies; 41+ messages in thread
From: Axel Rasmussen @ 2021-04-27 22:52 UTC (permalink / raw)
  To: Alexander Viro, Andrea Arcangeli, Andrew Morton, Hugh Dickins,
	Jerome Glisse, Joe Perches, Lokesh Gidra, Mike Kravetz,
	Mike Rapoport, Peter Xu, Shaohua Li, Shuah Khan,
	Stephen Rothwell, Wang Qing
  Cc: linux-api, linux-fsdevel, linux-kernel, linux-kselftest,
	linux-mm, Axel Rasmussen, Brian Geffon, Dr . David Alan Gilbert,
	Mina Almasry, Oliver Upton

Currently, the context (fds, mmap-ed areas, etc.) is global. Each test
mutates this state in some way, in some cases really "clobbering it"
(e.g., the events test mremap-ing area_dst over the top of area_src, or
the minor faults tests overwriting the count_verify values in the test
areas). We run the tests in a particular order, and each test is careful
to make the right assumptions about its starting state, etc.

But, this is fragile. It's better for a test's success or failure to not
depend on what some other prior test case did to the global state.

To that end, clear and reinitialize the test context at the start of
each test case, so whatever prior test cases did doesn't affect future
tests.

This is particularly relevant to this series because the events test's
mremap of area_dst screws up assumptions the minor fault test was
relying on. This wasn't a problem for hugetlb, as we don't mremap in
that case.
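
The clear-then-reinitialize pattern can be sketched in isolation as follows.
This is a simplified stand-in, not the code in the diff below: struct
test_ctx, TEST_CTX_EMPTY, test_ctx_clear() and test_ctx_init() are
hypothetical names, and the state is reduced to one pipe and one counter
array.

```c
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical, reduced stand-in for the selftest's global state. */
struct test_ctx {
	int pipefd[2];			/* -1 when not open */
	unsigned long long *counts;	/* NULL when not allocated */
	size_t nr;
};

#define TEST_CTX_EMPTY { { -1, -1 }, NULL, 0 }

/* Release whatever a previous test left behind; safe on an empty ctx. */
static void test_ctx_clear(struct test_ctx *ctx)
{
	for (int i = 0; i < 2; i++) {
		if (ctx->pipefd[i] != -1) {
			close(ctx->pipefd[i]);
			ctx->pipefd[i] = -1;
		}
	}
	free(ctx->counts);
	ctx->counts = NULL;
	ctx->nr = 0;
}

/* Always start from a known state, regardless of what ran before. */
static int test_ctx_init(struct test_ctx *ctx, size_t nr)
{
	assert(nr > 0);

	test_ctx_clear(ctx);

	ctx->counts = malloc(nr * sizeof(*ctx->counts));
	if (!ctx->counts)
		return -1;
	for (size_t i = 0; i < nr; i++)
		ctx->counts[i] = 1;
	ctx->nr = nr;

	if (pipe(ctx->pipefd)) {
		ctx->pipefd[0] = ctx->pipefd[1] = -1;
		test_ctx_clear(ctx);
		return -1;
	}
	return 0;
}
```

Calling the init function at the top of every test case means a test that
scribbles over the counters or closes a descriptor cannot influence the one
that runs after it.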

Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
---
 tools/testing/selftests/vm/userfaultfd.c | 215 ++++++++++++-----------
 1 file changed, 116 insertions(+), 99 deletions(-)

diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c
index 1f65c4ab7994..3fbc69f513dc 100644
--- a/tools/testing/selftests/vm/userfaultfd.c
+++ b/tools/testing/selftests/vm/userfaultfd.c
@@ -89,7 +89,8 @@ static int shm_fd;
 static int huge_fd;
 static char *huge_fd_off0;
 static unsigned long long *count_verify;
-static int uffd, uffd_flags, finished, *pipefd;
+static int uffd = -1;
+static int uffd_flags, finished, *pipefd;
 static char *area_src, *area_src_alias, *area_dst, *area_dst_alias;
 static char *zeropage;
 pthread_attr_t attr;
@@ -342,6 +343,111 @@ static struct uffd_test_ops hugetlb_uffd_test_ops = {
 
 static struct uffd_test_ops *uffd_test_ops;
 
+static void userfaultfd_open(uint64_t *features)
+{
+	struct uffdio_api uffdio_api;
+
+	uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK | UFFD_USER_MODE_ONLY);
+	if (uffd < 0)
+		err("userfaultfd syscall not available in this kernel");
+	uffd_flags = fcntl(uffd, F_GETFD, NULL);
+
+	uffdio_api.api = UFFD_API;
+	uffdio_api.features = *features;
+	if (ioctl(uffd, UFFDIO_API, &uffdio_api))
+		err("UFFDIO_API failed.\nPlease make sure to "
+		    "run with either root or ptrace capability.");
+	if (uffdio_api.api != UFFD_API)
+		err("UFFDIO_API error: %" PRIu64, (uint64_t)uffdio_api.api);
+
+	*features = uffdio_api.features;
+}
+
+static inline void munmap_area(void **area)
+{
+	if (*area)
+		if (munmap(*area, nr_pages * page_size))
+			err("munmap");
+
+	*area = NULL;
+}
+
+static void uffd_test_ctx_clear(void)
+{
+	size_t i;
+
+	if (pipefd) {
+		for (i = 0; i < nr_cpus * 2; ++i) {
+			if (close(pipefd[i]))
+				err("close pipefd");
+		}
+		free(pipefd);
+		pipefd = NULL;
+	}
+
+	if (count_verify) {
+		free(count_verify);
+		count_verify = NULL;
+	}
+
+	if (uffd != -1) {
+		if (close(uffd))
+			err("close uffd");
+		uffd = -1;
+	}
+
+	huge_fd_off0 = NULL;
+	munmap_area((void **)&area_src);
+	munmap_area((void **)&area_src_alias);
+	munmap_area((void **)&area_dst);
+	munmap_area((void **)&area_dst_alias);
+}
+
+static void uffd_test_ctx_init_ext(uint64_t *features)
+{
+	unsigned long nr, cpu;
+
+	uffd_test_ctx_clear();
+
+	uffd_test_ops->allocate_area((void **)&area_src);
+	uffd_test_ops->allocate_area((void **)&area_dst);
+
+	uffd_test_ops->release_pages(area_src);
+	uffd_test_ops->release_pages(area_dst);
+
+	userfaultfd_open(features);
+
+	count_verify = malloc(nr_pages * sizeof(unsigned long long));
+	if (!count_verify)
+		err("count_verify");
+
+	for (nr = 0; nr < nr_pages; nr++) {
+		*area_mutex(area_src, nr) =
+			(pthread_mutex_t)PTHREAD_MUTEX_INITIALIZER;
+		count_verify[nr] = *area_count(area_src, nr) = 1;
+		/*
+		 * In the transition between 255 to 256, powerpc will
+		 * read out of order in my_bcmp and see both bytes as
+		 * zero, so leave a placeholder below always non-zero
+		 * after the count, to avoid my_bcmp to trigger false
+		 * positives.
+		 */
+		*(area_count(area_src, nr) + 1) = 1;
+	}
+
+	pipefd = malloc(sizeof(int) * nr_cpus * 2);
+	if (!pipefd)
+		err("pipefd");
+	for (cpu = 0; cpu < nr_cpus; cpu++)
+		if (pipe2(&pipefd[cpu * 2], O_CLOEXEC | O_NONBLOCK))
+			err("pipe");
+}
+
+static inline void uffd_test_ctx_init(uint64_t features)
+{
+	uffd_test_ctx_init_ext(&features);
+}
+
 static int my_bcmp(char *str1, char *str2, size_t n)
 {
 	unsigned long i;
@@ -726,40 +832,6 @@ static int stress(struct uffd_stats *uffd_stats)
 	return 0;
 }
 
-static int userfaultfd_open_ext(uint64_t *features)
-{
-	struct uffdio_api uffdio_api;
-
-	uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK | UFFD_USER_MODE_ONLY);
-	if (uffd < 0) {
-		fprintf(stderr,
-			"userfaultfd syscall not available in this kernel\n");
-		return 1;
-	}
-	uffd_flags = fcntl(uffd, F_GETFD, NULL);
-
-	uffdio_api.api = UFFD_API;
-	uffdio_api.features = *features;
-	if (ioctl(uffd, UFFDIO_API, &uffdio_api)) {
-		fprintf(stderr, "UFFDIO_API failed.\nPlease make sure to "
-			"run with either root or ptrace capability.\n");
-		return 1;
-	}
-	if (uffdio_api.api != UFFD_API) {
-		fprintf(stderr, "UFFDIO_API error: %" PRIu64 "\n",
-			(uint64_t)uffdio_api.api);
-		return 1;
-	}
-
-	*features = uffdio_api.features;
-	return 0;
-}
-
-static int userfaultfd_open(uint64_t features)
-{
-	return userfaultfd_open_ext(&features);
-}
-
 sigjmp_buf jbuf, *sigbuf;
 
 static void sighndl(int sig, siginfo_t *siginfo, void *ptr)
@@ -868,6 +940,8 @@ static int faulting_process(int signal_test)
 			  MREMAP_MAYMOVE | MREMAP_FIXED, area_src);
 	if (area_dst == MAP_FAILED)
 		err("mremap");
+	/* Reset area_src since we just clobbered it */
+	area_src = NULL;
 
 	for (; nr < nr_pages; nr++) {
 		count = *area_count(area_dst, nr);
@@ -961,10 +1035,8 @@ static int userfaultfd_zeropage_test(void)
 	printf("testing UFFDIO_ZEROPAGE: ");
 	fflush(stdout);
 
-	uffd_test_ops->release_pages(area_dst);
+	uffd_test_ctx_init(0);
 
-	if (userfaultfd_open(0))
-		return 1;
 	uffdio_register.range.start = (unsigned long) area_dst;
 	uffdio_register.range.len = nr_pages * page_size;
 	uffdio_register.mode = UFFDIO_REGISTER_MODE_MISSING;
@@ -981,7 +1053,6 @@ static int userfaultfd_zeropage_test(void)
 		if (my_bcmp(area_dst, zeropage, page_size))
 			err("zeropage is not zero");
 
-	close(uffd);
 	printf("done.\n");
 	return 0;
 }
@@ -999,12 +1070,10 @@ static int userfaultfd_events_test(void)
 	printf("testing events (fork, remap, remove): ");
 	fflush(stdout);
 
-	uffd_test_ops->release_pages(area_dst);
-
 	features = UFFD_FEATURE_EVENT_FORK | UFFD_FEATURE_EVENT_REMAP |
 		UFFD_FEATURE_EVENT_REMOVE;
-	if (userfaultfd_open(features))
-		return 1;
+	uffd_test_ctx_init(features);
+
 	fcntl(uffd, F_SETFL, uffd_flags | O_NONBLOCK);
 
 	uffdio_register.range.start = (unsigned long) area_dst;
@@ -1037,8 +1106,6 @@ static int userfaultfd_events_test(void)
 	if (pthread_join(uffd_mon, NULL))
 		return 1;
 
-	close(uffd);
-
 	uffd_stats_report(&stats, 1);
 
 	return stats.missing_faults != nr_pages;
@@ -1058,11 +1125,9 @@ static int userfaultfd_sig_test(void)
 	printf("testing signal delivery: ");
 	fflush(stdout);
 
-	uffd_test_ops->release_pages(area_dst);
-
 	features = UFFD_FEATURE_EVENT_FORK|UFFD_FEATURE_SIGBUS;
-	if (userfaultfd_open(features))
-		return 1;
+	uffd_test_ctx_init(features);
+
 	fcntl(uffd, F_SETFL, uffd_flags | O_NONBLOCK);
 
 	uffdio_register.range.start = (unsigned long) area_dst;
@@ -1103,7 +1168,6 @@ static int userfaultfd_sig_test(void)
 	printf("done.\n");
 	if (userfaults)
 		err("Signal test failed, userfaults: %ld", userfaults);
-	close(uffd);
 
 	return userfaults != 0;
 }
@@ -1126,10 +1190,7 @@ static int userfaultfd_minor_test(void)
 	printf("testing minor faults: ");
 	fflush(stdout);
 
-	uffd_test_ops->release_pages(area_dst);
-
-	if (userfaultfd_open_ext(&features))
-		return 1;
+	uffd_test_ctx_init_ext(&features);
 	/* If kernel reports the feature isn't supported, skip the test. */
 	if (!(features & UFFD_FEATURE_MINOR_HUGETLBFS)) {
 		printf("skipping test due to lack of feature support\n");
@@ -1183,8 +1244,6 @@ static int userfaultfd_minor_test(void)
 	if (pthread_join(uffd_mon, NULL))
 		return 1;
 
-	close(uffd);
-
 	uffd_stats_report(&stats, 1);
 
 	return stats.missing_faults != 0 || stats.minor_faults != nr_pages;
@@ -1196,50 +1255,9 @@ static int userfaultfd_stress(void)
 	char *tmp_area;
 	unsigned long nr;
 	struct uffdio_register uffdio_register;
-	unsigned long cpu;
 	struct uffd_stats uffd_stats[nr_cpus];
 
-	uffd_test_ops->allocate_area((void **)&area_src);
-	if (!area_src)
-		return 1;
-	uffd_test_ops->allocate_area((void **)&area_dst);
-	if (!area_dst)
-		return 1;
-
-	if (userfaultfd_open(0))
-		return 1;
-
-	count_verify = malloc(nr_pages * sizeof(unsigned long long));
-	if (!count_verify) {
-		perror("count_verify");
-		return 1;
-	}
-
-	for (nr = 0; nr < nr_pages; nr++) {
-		*area_mutex(area_src, nr) = (pthread_mutex_t)
-			PTHREAD_MUTEX_INITIALIZER;
-		count_verify[nr] = *area_count(area_src, nr) = 1;
-		/*
-		 * In the transition between 255 to 256, powerpc will
-		 * read out of order in my_bcmp and see both bytes as
-		 * zero, so leave a placeholder below always non-zero
-		 * after the count, to avoid my_bcmp to trigger false
-		 * positives.
-		 */
-		*(area_count(area_src, nr) + 1) = 1;
-	}
-
-	pipefd = malloc(sizeof(int) * nr_cpus * 2);
-	if (!pipefd) {
-		perror("pipefd");
-		return 1;
-	}
-	for (cpu = 0; cpu < nr_cpus; cpu++) {
-		if (pipe2(&pipefd[cpu*2], O_CLOEXEC | O_NONBLOCK)) {
-			perror("pipe");
-			return 1;
-		}
-	}
+	uffd_test_ctx_init(0);
 
 	if (posix_memalign(&area, page_size, page_size))
 		err("out of memory");
@@ -1360,7 +1378,6 @@ static int userfaultfd_stress(void)
 		uffd_stats_report(uffd_stats, nr_cpus);
 	}
 
-	close(uffd);
 	return userfaultfd_zeropage_test() || userfaultfd_sig_test()
 		|| userfaultfd_events_test() || userfaultfd_minor_test();
 }
-- 
2.31.1.498.g6c1eba8ee3d-goog


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v5 10/10] userfaultfd/selftests: exercise minor fault handling shmem support
  2021-04-27 22:52 ` Axel Rasmussen
@ 2021-04-27 22:52   ` Axel Rasmussen
  -1 siblings, 0 replies; 41+ messages in thread
From: Axel Rasmussen @ 2021-04-27 22:52 UTC (permalink / raw)
  To: Alexander Viro, Andrea Arcangeli, Andrew Morton, Hugh Dickins,
	Jerome Glisse, Joe Perches, Lokesh Gidra, Mike Kravetz,
	Mike Rapoport, Peter Xu, Shaohua Li, Shuah Khan,
	Stephen Rothwell, Wang Qing
  Cc: linux-api, linux-fsdevel, linux-kernel, linux-kselftest,
	linux-mm, Axel Rasmussen, Brian Geffon, Dr . David Alan Gilbert,
	Mina Almasry, Oliver Upton

Enable test_uffdio_minor for test_type == TEST_SHMEM, and modify the
test slightly to pass in / check for the right feature flags.
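
As a rough illustration of the feature check (not the selftest code itself),
the mask comparison can be sketched on its own. The FEAT_* values and both
helper names below are hypothetical; the real bits come from
<linux/userfaultfd.h>. With a single requested bit, "any" and "all" give the
same answer; the "all" form stays correct if more than one bit is requested.

```c
#include <stdint.h>

/* Hypothetical bit values, for illustration only. */
#define FEAT_A (UINT64_C(1) << 0)
#define FEAT_B (UINT64_C(1) << 1)
#define FEAT_C (UINT64_C(1) << 2)

/* Nonzero only if every requested bit is present in what was reported. */
int have_all_features(uint64_t reported, uint64_t requested)
{
	return (reported & requested) == requested;
}

/* Nonzero if at least one requested bit is present: too weak for multi-bit requests. */
int have_any_feature(uint64_t reported, uint64_t requested)
{
	return (reported & requested) != 0;
}
```

For example, with reported == FEAT_A and requested == (FEAT_A | FEAT_B),
have_any_feature() returns nonzero while have_all_features() returns zero,
so only the latter would lead to the test being skipped.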

Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
---
 tools/testing/selftests/vm/userfaultfd.c | 29 ++++++++++++++++++++----
 1 file changed, 25 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c
index 3fbc69f513dc..a7ecc9993439 100644
--- a/tools/testing/selftests/vm/userfaultfd.c
+++ b/tools/testing/selftests/vm/userfaultfd.c
@@ -474,6 +474,7 @@ static void wp_range(int ufd, __u64 start, __u64 len, bool wp)
 static void continue_range(int ufd, __u64 start, __u64 len)
 {
 	struct uffdio_continue req;
+	int ret;
 
 	req.range.start = start;
 	req.range.len = len;
@@ -482,6 +483,17 @@ static void continue_range(int ufd, __u64 start, __u64 len)
 	if (ioctl(ufd, UFFDIO_CONTINUE, &req))
 		err("UFFDIO_CONTINUE failed for address 0x%" PRIx64,
 		    (uint64_t)start);
+
+	/*
+	 * Error handling within the kernel for continue is subtly different
+	 * from copy or zeropage, so it may be a source of bugs. Trigger an
+	 * error (-EEXIST) on purpose, to verify doing so doesn't cause a BUG.
+	 */
+	req.mapped = 0;
+	ret = ioctl(ufd, UFFDIO_CONTINUE, &req);
+	if (ret >= 0 || req.mapped != -EEXIST)
+		err("failed to exercise UFFDIO_CONTINUE error handling, ret=%d, mapped=%" PRId64,
+		    ret, (int64_t) req.mapped);
 }
 
 static void *locking_thread(void *arg)
@@ -1182,7 +1194,7 @@ static int userfaultfd_minor_test(void)
 	void *expected_page;
 	char c;
 	struct uffd_stats stats = { 0 };
-	uint64_t features = UFFD_FEATURE_MINOR_HUGETLBFS;
+	uint64_t req_features, features_out;
 
 	if (!test_uffdio_minor)
 		return 0;
@@ -1190,9 +1202,17 @@ static int userfaultfd_minor_test(void)
 	printf("testing minor faults: ");
 	fflush(stdout);
 
-	uffd_test_ctx_init_ext(&features);
-	/* If kernel reports the feature isn't supported, skip the test. */
-	if (!(features & UFFD_FEATURE_MINOR_HUGETLBFS)) {
+	if (test_type == TEST_HUGETLB)
+		req_features = UFFD_FEATURE_MINOR_HUGETLBFS;
+	else if (test_type == TEST_SHMEM)
+		req_features = UFFD_FEATURE_MINOR_SHMEM;
+	else
+		return 1;
+
+	features_out = req_features;
+	uffd_test_ctx_init_ext(&features_out);
+	/* If kernel reports required features aren't supported, skip test. */
+	if ((features_out & req_features) != req_features) {
 		printf("skipping test due to lack of feature support\n");
 		fflush(stdout);
 		return 0;
@@ -1426,6 +1446,7 @@ static void set_test_type(const char *type)
 		map_shared = true;
 		test_type = TEST_SHMEM;
 		uffd_test_ops = &shmem_uffd_test_ops;
+		test_uffdio_minor = true;
 	} else {
 		err("Unknown test type: %s", type);
 	}
-- 
2.31.1.498.g6c1eba8ee3d-goog


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v5 03/10] userfaultfd/shmem: support minor fault registration for shmem
  2021-04-27 22:52   ` Axel Rasmussen
@ 2021-04-28  0:02     ` Hugh Dickins
  -1 siblings, 0 replies; 41+ messages in thread
From: Hugh Dickins @ 2021-04-28  0:02 UTC (permalink / raw)
  To: Axel Rasmussen
  Cc: Alexander Viro, Andrea Arcangeli, Andrew Morton, Hugh Dickins,
	Jerome Glisse, Joe Perches, Lokesh Gidra, Mike Kravetz,
	Mike Rapoport, Peter Xu, Shaohua Li, Shuah Khan,
	Stephen Rothwell, Wang Qing, linux-api, linux-fsdevel,
	linux-kernel, linux-kselftest, linux-mm, Brian Geffon,
	Dr . David Alan Gilbert, Mina Almasry, Oliver Upton

On Tue, 27 Apr 2021, Axel Rasmussen wrote:

> This patch allows shmem-backed VMAs to be registered for minor faults.
> Minor faults are appropriately relayed to userspace in the fault path,
> for VMAs with the relevant flag.
> 
> This commit doesn't hook up the UFFDIO_CONTINUE ioctl for shmem-backed
> minor faults, though, so userspace doesn't yet have a way to resolve
> such faults.
> 
> Because of this, we also don't yet advertise this as a supported
> feature. That will be done in a separate commit when the feature is
> fully implemented.
> 
> Acked-by: Peter Xu <peterx@redhat.com>
> Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>

Acked-by: Hugh Dickins <hughd@google.com>

> ---
>  fs/userfaultfd.c |  3 +--
>  mm/memory.c      |  8 +++++---
>  mm/shmem.c       | 12 +++++++++++-
>  3 files changed, 17 insertions(+), 6 deletions(-)

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v5 04/10] userfaultfd/shmem: support UFFDIO_CONTINUE for shmem
  2021-04-27 22:52   ` Axel Rasmussen
@ 2021-04-28  0:03     ` Hugh Dickins
  -1 siblings, 0 replies; 41+ messages in thread
From: Hugh Dickins @ 2021-04-28  0:03 UTC (permalink / raw)
  To: Axel Rasmussen
  Cc: Alexander Viro, Andrea Arcangeli, Andrew Morton, Hugh Dickins,
	Jerome Glisse, Joe Perches, Lokesh Gidra, Mike Kravetz,
	Mike Rapoport, Peter Xu, Shaohua Li, Shuah Khan,
	Stephen Rothwell, Wang Qing, linux-api, linux-fsdevel,
	linux-kernel, linux-kselftest, linux-mm, Brian Geffon,
	Dr . David Alan Gilbert, Mina Almasry, Oliver Upton

On Tue, 27 Apr 2021, Axel Rasmussen wrote:

> With this change, userspace can resolve a minor fault within a
> shmem-backed area with a UFFDIO_CONTINUE ioctl. The semantics for this
> match those for hugetlbfs - we look up the existing page in the page
> cache, and install a PTE for it.
> 
> This commit introduces a new helper: mcopy_atomic_install_pte.
> 
> Why handle UFFDIO_CONTINUE for shmem in mm/userfaultfd.c, instead of in
> shmem.c? The existing userfault implementation only relies on shmem.c
> for VM_SHARED VMAs. However, minor fault handling / CONTINUE work just
> fine for !VM_SHARED VMAs as well. We'd prefer to handle CONTINUE for
> shmem in one place, regardless of shared/private (to reduce code
> duplication).
> 
> Why add a new mcopy_atomic_install_pte helper? A problem we have with
> continue is that shmem_mcopy_atomic_pte() and mcopy_atomic_pte() are
> *close* to what we want, but not exactly. We do want to setup the PTEs
> in a CONTINUE operation, but we don't want to e.g. allocate a new page,
> charge it (e.g. to the shmem inode), manipulate various flags, etc. Also
> we have the problem stated above: shmem_mcopy_atomic_pte() and
> mcopy_atomic_pte() both handle one-half of the problem (shared /
> private) continue cares about. So, introduce mcontinue_atomic_pte(), to
> handle all of the shmem continue cases. Introduce the helper so it
> doesn't duplicate code with mcopy_atomic_pte().
> 
> In a future commit, shmem_mcopy_atomic_pte() will also be modified to
> use this new helper. However, since this is a bigger refactor, it seems
> most clear to do it as a separate change.
> 
> Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>

Acked-by: Hugh Dickins <hughd@google.com>

> ---
>  mm/userfaultfd.c | 172 ++++++++++++++++++++++++++++++++++-------------
>  1 file changed, 127 insertions(+), 45 deletions(-)

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v5 05/10] userfaultfd/shmem: advertise shmem minor fault support
  2021-04-27 22:52   ` Axel Rasmussen
@ 2021-04-28  0:04     ` Hugh Dickins
  -1 siblings, 0 replies; 41+ messages in thread
From: Hugh Dickins @ 2021-04-28  0:04 UTC (permalink / raw)
  To: Axel Rasmussen
  Cc: Alexander Viro, Andrea Arcangeli, Andrew Morton, Hugh Dickins,
	Jerome Glisse, Joe Perches, Lokesh Gidra, Mike Kravetz,
	Mike Rapoport, Peter Xu, Shaohua Li, Shuah Khan,
	Stephen Rothwell, Wang Qing, linux-api, linux-fsdevel,
	linux-kernel, linux-kselftest, linux-mm, Brian Geffon,
	Dr . David Alan Gilbert, Mina Almasry, Oliver Upton

On Tue, 27 Apr 2021, Axel Rasmussen wrote:

> Now that the feature is fully implemented (the faulting path hooks exist
> so userspace is notified, and the ioctl to resolve such faults is
> available), advertise this as a supported feature.
> 
> Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>

Acked-by: Hugh Dickins <hughd@google.com>

> ---
>  Documentation/admin-guide/mm/userfaultfd.rst | 3 ++-
>  fs/userfaultfd.c                             | 3 ++-
>  include/uapi/linux/userfaultfd.h             | 7 ++++++-
>  3 files changed, 10 insertions(+), 3 deletions(-)

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v5 06/10] userfaultfd/shmem: modify shmem_mcopy_atomic_pte to use install_pte()
  2021-04-27 22:52   ` Axel Rasmussen
@ 2021-04-28  0:58     ` Hugh Dickins
  -1 siblings, 0 replies; 41+ messages in thread
From: Hugh Dickins @ 2021-04-28  0:58 UTC (permalink / raw)
  To: Axel Rasmussen
  Cc: Alexander Viro, Andrea Arcangeli, Andrew Morton, Hugh Dickins,
	Jerome Glisse, Joe Perches, Lokesh Gidra, Mike Kravetz,
	Mike Rapoport, Peter Xu, Shaohua Li, Shuah Khan,
	Stephen Rothwell, Wang Qing, linux-api, linux-fsdevel,
	linux-kernel, linux-kselftest, linux-mm, Brian Geffon,
	Dr . David Alan Gilbert, Mina Almasry, Oliver Upton

On Tue, 27 Apr 2021, Axel Rasmussen wrote:

> In a previous commit, we added the mcopy_atomic_install_pte() helper.
> This helper does the job of setting up PTEs for an existing page, to map
> it into a given VMA. It deals with both the anon and shmem cases, as
> well as the shared and private cases.
> 
> In other words, shmem_mcopy_atomic_pte() duplicates a case it already
> handles. So, expose it, and let shmem_mcopy_atomic_pte() use it
> directly, to reduce code duplication.
> 
> This requires that we refactor shmem_mcopy_atomic_pte() a bit:
> 
> Instead of doing accounting (shmem_recalc_inode() et al) part-way
> through the PTE setup, do it afterward. This frees up
> mcopy_atomic_install_pte() from having to care about this accounting,
> and means we don't need to e.g. shmem_uncharge() in the error path.
> 
> A side effect is this switches shmem_mcopy_atomic_pte() to use
> lru_cache_add_inactive_or_unevictable() instead of just lru_cache_add().
> This wrapper does some extra accounting in an exceptional case, if
> appropriate, so it's actually the more correct thing to use.
> 
> Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>

Not quite. Two things.

One, in this version, delete_from_page_cache(page) has vanished
from the particular error path which needs it.

Two, and I think this predates your changes (so it needs a separate
fix patch first, for backport to stable? A user with bad intentions
might be able to trigger the BUG). In pondering the new error paths,
and that /* don't free the page */ one in particular: isn't it the
case that the shmem_inode_acct_block() on entry might succeed the
first time, but the atomic copy fails, so -ENOENT; then something else
fills up the tmpfs before the retry comes in, so that the retry then
fails with -ENOMEM, and hits the BUG_ON(page) in __mcopy_atomic()?

(As I understand it, the shmem_inode_unacct_blocks() has to be
done before returning, because the caller may be unable to retry.)

What the right fix is rather depends on other uses of __mcopy_atomic():
if they obviously cannot hit that BUG_ON(page), you may prefer to leave
it in, and fix it here where shmem_inode_acct_block() fails. Or you may
prefer instead to delete that "else BUG_ON(page);" - looks as if that
would end up doing the right thing.  Peter may have a preference.

(Or, we could consider doing the shmem_inode_acct_block() only after
the page has been copied in: its current placing reflects how shmem.c
does it elsewhere, and there's reason for that, but it doesn't always
work out right. Don't be surprised if I change the ordering in future,
but it's probably best not to mess with that ordering now.)

Sorry, if this is a pre-existing issue, then we are taking advantage
of you, in asking you to fix it: but I hope that while you're in there,
it will make sense to do so.

Thanks,
Hugh

> ---
>  include/linux/userfaultfd_k.h |  5 ++++
>  mm/shmem.c                    | 48 +++++------------------------------
>  mm/userfaultfd.c              | 17 +++++--------
>  3 files changed, 18 insertions(+), 52 deletions(-)
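
The retry sequence described in point two can be sketched as a small
userspace model. This is only an illustration under stated assumptions:
every toy_* name is hypothetical, the block accounting is reduced to a
counter, and the caller loop is simplified to a single retry. It is not
the kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are the kernel's real types. */
struct toy_page { int refcount; };

struct toy_tmpfs {
	long free_blocks;   /* blocks still available to account */
	bool copy_faults;   /* make the next atomic copy fail */
};

/* Models shmem_inode_acct_block(): true when a block could be reserved. */
static bool toy_acct_block(struct toy_tmpfs *fs)
{
	if (fs->free_blocks <= 0)
		return false;
	fs->free_blocks--;
	return true;
}

/*
 * Models the shape described in the thread: account first, then copy.
 * On a failed atomic copy the page is handed back through *pagep and the
 * block is unaccounted before returning -ENOENT.
 */
static int toy_mcopy_pte(struct toy_tmpfs *fs, struct toy_page *fresh,
			 struct toy_page **pagep)
{
	assert(pagep != NULL);

	if (!toy_acct_block(fs))
		return -ENOMEM;            /* note: *pagep is left untouched */

	if (*pagep == NULL) {
		if (fs->copy_faults) {
			fs->copy_faults = false;
			fresh->refcount = 1;
			*pagep = fresh;    /* don't free the page */
			fs->free_blocks++; /* unaccount before returning */
			return -ENOENT;
		}
	} else {
		*pagep = NULL;             /* the retry consumes the saved page */
	}
	return 0;
}

/*
 * Models the caller's retry. Returns true if the "else BUG_ON(page)"
 * condition would have fired, i.e. an error other than -ENOENT came back
 * while a page was still being held.
 */
static bool toy_would_hit_bug(struct toy_tmpfs *fs, bool fill_between_tries)
{
	struct toy_page fresh = { 0 };
	struct toy_page *page = NULL;
	int err;

	err = toy_mcopy_pte(fs, &fresh, &page);
	if (err == -ENOENT) {
		if (fill_between_tries)
			fs->free_blocks = 0; /* something else fills the tmpfs */
		err = toy_mcopy_pte(fs, &fresh, &page);
	}
	return err != 0 && page != NULL;
}
```

With one free block and a faulting first copy, the model reports the
BUG_ON condition only when the filesystem fills up between the two
attempts; without that interleaving the retry consumes the page as usual.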

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v5 04/10] userfaultfd/shmem: support UFFDIO_CONTINUE for shmem
  2021-04-27 22:52   ` Axel Rasmussen
  (?)
  (?)
@ 2021-04-28 15:10   ` Peter Xu
  -1 siblings, 0 replies; 41+ messages in thread
From: Peter Xu @ 2021-04-28 15:10 UTC (permalink / raw)
  To: Axel Rasmussen
  Cc: Alexander Viro, Andrea Arcangeli, Andrew Morton, Hugh Dickins,
	Jerome Glisse, Joe Perches, Lokesh Gidra, Mike Kravetz,
	Mike Rapoport, Shaohua Li, Shuah Khan, Stephen Rothwell,
	Wang Qing, linux-api, linux-fsdevel, linux-kernel,
	linux-kselftest, linux-mm, Brian Geffon, Dr . David Alan Gilbert,
	Mina Almasry, Oliver Upton

On Tue, Apr 27, 2021 at 03:52:38PM -0700, Axel Rasmussen wrote:
> With this change, userspace can resolve a minor fault within a
> shmem-backed area with a UFFDIO_CONTINUE ioctl. The semantics for this
> match those for hugetlbfs - we look up the existing page in the page
> cache, and install a PTE for it.
> 
> This commit introduces a new helper: mcopy_atomic_install_pte.
> 
> Why handle UFFDIO_CONTINUE for shmem in mm/userfaultfd.c, instead of in
> shmem.c? The existing userfault implementation only relies on shmem.c
> for VM_SHARED VMAs. However, minor fault handling / CONTINUE work just
> fine for !VM_SHARED VMAs as well. We'd prefer to handle CONTINUE for
> shmem in one place, regardless of shared/private (to reduce code
> duplication).
> 
> Why add a new mcopy_atomic_install_pte helper? A problem we have with
> continue is that shmem_mcopy_atomic_pte() and mcopy_atomic_pte() are
> *close* to what we want, but not exactly. We do want to setup the PTEs
> in a CONTINUE operation, but we don't want to e.g. allocate a new page,
> charge it (e.g. to the shmem inode), manipulate various flags, etc. Also
> we have the problem stated above: shmem_mcopy_atomic_pte() and
> mcopy_atomic_pte() both handle one-half of the problem (shared /
> private) continue cares about. So, introduce mcontinue_atomic_pte(), to
> handle all of the shmem continue cases. Introduce the helper so it
> doesn't duplicate code with mcopy_atomic_pte().
> 
> In a future commit, shmem_mcopy_atomic_pte() will also be modified to
> use this new helper. However, since this is a bigger refactor, it seems
> most clear to do it as a separate change.
> 
> Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>

Acked-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu
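
The CONTINUE semantics quoted above ("look up the existing page in the
page cache, and install a PTE for it") can be sketched as a toy model.
Assumptions: the page cache and page table are plain arrays, all toy_*
names are hypothetical, and the error codes are choices made for this
sketch rather than a statement about what the kernel returns.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_NPAGES 4

/* Hypothetical model: a page cache and a page table, indexed by offset. */
struct toy_mapping {
	const char *page_cache[TOY_NPAGES]; /* NULL = nothing cached */
	const char *pte[TOY_NPAGES];        /* NULL = no PTE installed */
};

/*
 * CONTINUE-style resolution: never allocates, charges or copies. It only
 * looks up the page that is already cached and maps it.
 */
static int toy_continue(struct toy_mapping *m, size_t pgoff)
{
	assert(m != NULL);

	if (pgoff >= TOY_NPAGES)
		return -EINVAL;
	if (m->page_cache[pgoff] == NULL)
		return -EFAULT;   /* nothing to continue with (assumed code) */
	if (m->pte[pgoff] != NULL)
		return -EEXIST;   /* already mapped (assumed code) */
	m->pte[pgoff] = m->page_cache[pgoff];
	return 0;
}
```

The point of the model is what is absent: there is no allocation and no
accounting step, which is why the copy paths were only "close" to what
CONTINUE needs.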


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v5 05/10] userfaultfd/shmem: advertise shmem minor fault support
  2021-04-27 22:52   ` Axel Rasmussen
  (?)
  (?)
@ 2021-04-28 15:11   ` Peter Xu
  -1 siblings, 0 replies; 41+ messages in thread
From: Peter Xu @ 2021-04-28 15:11 UTC (permalink / raw)
  To: Axel Rasmussen
  Cc: Alexander Viro, Andrea Arcangeli, Andrew Morton, Hugh Dickins,
	Jerome Glisse, Joe Perches, Lokesh Gidra, Mike Kravetz,
	Mike Rapoport, Shaohua Li, Shuah Khan, Stephen Rothwell,
	Wang Qing, linux-api, linux-fsdevel, linux-kernel,
	linux-kselftest, linux-mm, Brian Geffon, Dr . David Alan Gilbert,
	Mina Almasry, Oliver Upton

On Tue, Apr 27, 2021 at 03:52:39PM -0700, Axel Rasmussen wrote:
> Now that the feature is fully implemented (the faulting path hooks exist
> so userspace is notified, and the ioctl to resolve such faults is
> available), advertise this as a supported feature.
> 
> Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>

Acked-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu
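
Why advertising comes last can be sketched from the userspace side: a
program asks for a set of feature bits and must fall back when one of
them is not reported. The bit values and toy_* names below are
illustrative assumptions only; real code should take its flags from
<linux/userfaultfd.h>.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values only; not the uapi definitions. */
#define TOY_FEATURE_MINOR_HUGETLBFS (UINT64_C(1) << 0)
#define TOY_FEATURE_MINOR_SHMEM     (UINT64_C(1) << 1)

/* True when every requested feature bit is among the advertised ones. */
static bool toy_features_ok(uint64_t advertised, uint64_t requested)
{
	return (requested & ~advertised) == 0;
}

/* Returns the feature set to use; callers check toy_features_ok() first. */
static uint64_t toy_negotiate(uint64_t advertised, uint64_t requested)
{
	assert(toy_features_ok(advertised, requested));
	return requested;
}
```

A kernel that reported the bit before the fault path and the ioctl both
existed would pass this check and then fail at run time, hence the
ordering in this series.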


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v5 06/10] userfaultfd/shmem: modify shmem_mcopy_atomic_pte to use install_pte()
  2021-04-28  0:58     ` Hugh Dickins
  (?)
@ 2021-04-28 15:56     ` Peter Xu
  2021-04-28 15:59         ` Axel Rasmussen
  -1 siblings, 1 reply; 41+ messages in thread
From: Peter Xu @ 2021-04-28 15:56 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Axel Rasmussen, Alexander Viro, Andrea Arcangeli, Andrew Morton,
	Jerome Glisse, Joe Perches, Lokesh Gidra, Mike Kravetz,
	Mike Rapoport, Shaohua Li, Shuah Khan, Stephen Rothwell,
	Wang Qing, linux-api, linux-fsdevel, linux-kernel,
	linux-kselftest, linux-mm, Brian Geffon, Dr . David Alan Gilbert,
	Mina Almasry, Oliver Upton

On Tue, Apr 27, 2021 at 05:58:16PM -0700, Hugh Dickins wrote:
> On Tue, 27 Apr 2021, Axel Rasmussen wrote:
> 
> > In a previous commit, we added the mcopy_atomic_install_pte() helper.
> > This helper does the job of setting up PTEs for an existing page, to map
> > it into a given VMA. It deals with both the anon and shmem cases, as
> > well as the shared and private cases.
> > 
> > In other words, shmem_mcopy_atomic_pte() duplicates a case it already
> > handles. So, expose it, and let shmem_mcopy_atomic_pte() use it
> > directly, to reduce code duplication.
> > 
> > This requires that we refactor shmem_mcopy_atomic_pte() a bit:
> > 
> > Instead of doing accounting (shmem_recalc_inode() et al) part-way
> > through the PTE setup, do it afterward. This frees up
> > mcopy_atomic_install_pte() from having to care about this accounting,
> > and means we don't need to e.g. shmem_uncharge() in the error path.
> > 
> > A side effect is this switches shmem_mcopy_atomic_pte() to use
> > lru_cache_add_inactive_or_unevictable() instead of just lru_cache_add().
> > This wrapper does some extra accounting in an exceptional case, if
> > appropriate, so it's actually the more correct thing to use.
> > 
> > Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
> 
> Not quite. Two things.
> 
> One, in this version, delete_from_page_cache(page) has vanished
> from the particular error path which needs it.

Agreed.  I also spotted that the set_page_dirty() seems to have been overlooked
when reusing mcopy_atomic_install_pte(); afaiu it should be moved into the
helper.

> 
> Two, and I think this predates your changes (so needs a separate
> fix patch first, for backport to stable? a user with bad intentions
> might be able to trigger the BUG), in pondering the new error paths
> and that /* don't free the page */ one in particular, isn't it the
> case that the shmem_inode_acct_block() on entry might succeed the
> first time, but atomic copy fail so -ENOENT, then something else
> fill up the tmpfs before the retry comes in, so that retry then
> fail with -ENOMEM, and hit the BUG_ON(page) in __mcopy_atomic()?
> 
> (As I understand it, the shmem_inode_unacct_blocks() has to be
> done before returning, because the caller may be unable to retry.)
> 
> What the right fix is rather depends on other uses of __mcopy_atomic():
> if they obviously cannot hit that BUG_ON(page), you may prefer to leave
> it in, and fix it here where shmem_inode_acct_block() fails. Or you may
> prefer instead to delete that "else BUG_ON(page);" - looks as if that
> would end up doing the right thing.  Peter may have a preference.

To me, the BUG_ON(page) is there to guarantee that mfill_atomic_pte() has
consumed the page properly whenever possible.  Removing the BUG_ON() would
already be fine; it would just stop covering cases such as ret==0.

So maybe it is slightly better to release the page when
shmem_inode_acct_block() fails (so as to still keep some guard on the page)?

Thanks,

-- 
Peter Xu
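
The suggestion above (release the page when the accounting fails) can be
sketched as follows. This is a minimal userspace model, assuming a
hypothetical toy_page with only a reference count; the function names
are made up for illustration and the real entry path does far more.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page; only the refcount is modelled. */
struct toy_page { int refcount; };

/* Models put_page(): drop one reference. */
static void toy_put_page(struct toy_page *page)
{
	assert(page->refcount > 0);
	page->refcount--;
}

/*
 * Entry of a shmem-style mcopy routine with the suggested guard: when the
 * block accounting fails, release any page left over from an earlier
 * -ENOENT round trip and clear *pagep, so the caller's BUG_ON(page) keeps
 * guarding the remaining paths without firing on this one.
 */
static int toy_mcopy_entry(bool acct_ok, struct toy_page **pagep)
{
	if (!acct_ok) {
		if (*pagep != NULL) {
			toy_put_page(*pagep);
			*pagep = NULL;
		}
		return -ENOMEM;
	}
	/* ... allocation, copy and PTE install would follow here ... */
	return 0;
}
```

Compared with deleting the BUG_ON() outright, this keeps the check in
the caller and makes the one path that can legitimately return an error
with a page in hand clean up after itself.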


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v5 06/10] userfaultfd/shmem: modify shmem_mcopy_atomic_pte to use install_pte()
  2021-04-28 15:56     ` Peter Xu
@ 2021-04-28 15:59         ` Axel Rasmussen
  0 siblings, 0 replies; 41+ messages in thread
From: Axel Rasmussen @ 2021-04-28 15:59 UTC (permalink / raw)
  To: Peter Xu
  Cc: Hugh Dickins, Alexander Viro, Andrea Arcangeli, Andrew Morton,
	Jerome Glisse, Joe Perches, Lokesh Gidra, Mike Kravetz,
	Mike Rapoport, Shaohua Li, Shuah Khan, Stephen Rothwell,
	Wang Qing, linux-api, linux-fsdevel, LKML, linux-kselftest,
	Linux MM, Brian Geffon, Dr . David Alan Gilbert, Mina Almasry,
	Oliver Upton

On Wed, Apr 28, 2021 at 8:56 AM Peter Xu <peterx@redhat.com> wrote:
>
> On Tue, Apr 27, 2021 at 05:58:16PM -0700, Hugh Dickins wrote:
> > On Tue, 27 Apr 2021, Axel Rasmussen wrote:
> >
> > > In a previous commit, we added the mcopy_atomic_install_pte() helper.
> > > This helper does the job of setting up PTEs for an existing page, to map
> > > it into a given VMA. It deals with both the anon and shmem cases, as
> > > well as the shared and private cases.
> > >
> > > In other words, shmem_mcopy_atomic_pte() duplicates a case it already
> > > handles. So, expose it, and let shmem_mcopy_atomic_pte() use it
> > > directly, to reduce code duplication.
> > >
> > > This requires that we refactor shmem_mcopy_atomic_pte() a bit:
> > >
> > > Instead of doing accounting (shmem_recalc_inode() et al) part-way
> > > through the PTE setup, do it afterward. This frees up
> > > mcopy_atomic_install_pte() from having to care about this accounting,
> > > and means we don't need to e.g. shmem_uncharge() in the error path.
> > >
> > > A side effect is this switches shmem_mcopy_atomic_pte() to use
> > > lru_cache_add_inactive_or_unevictable() instead of just lru_cache_add().
> > > This wrapper does some extra accounting in an exceptional case, if
> > > appropriate, so it's actually the more correct thing to use.
> > >
> > > Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
> >
> > Not quite. Two things.
> >
> > One, in this version, delete_from_page_cache(page) has vanished
> > from the particular error path which needs it.
>
> Agreed.  I also spotted that the set_page_dirty() seems to have been overlooked
> when reusing mcopy_atomic_install_pte(), which afaiu should be move into the
> helper.

I think this is covered: we explicitly call SetPageDirty() just before
returning in shmem_mcopy_atomic_pte(). If I remember correctly from a
couple of revisions ago, we consciously put it here instead of in the
helper because it resulted in simpler code (error handling in
particular, I think?), and not all callers of the new helper need it.

>
> >
> > Two, and I think this predates your changes (so needs a separate
> > fix patch first, for backport to stable? a user with bad intentions
> > might be able to trigger the BUG), in pondering the new error paths
> > and that /* don't free the page */ one in particular, isn't it the
> > case that the shmem_inode_acct_block() on entry might succeed the
> > first time, but atomic copy fail so -ENOENT, then something else
> > fill up the tmpfs before the retry comes in, so that retry then
> > fail with -ENOMEM, and hit the BUG_ON(page) in __mcopy_atomic()?
> >
> > (As I understand it, the shmem_inode_unacct_blocks() has to be
> > done before returning, because the caller may be unable to retry.)
> >
> > What the right fix is rather depends on other uses of __mcopy_atomic():
> > if they obviously cannot hit that BUG_ON(page), you may prefer to leave
> > it in, and fix it here where shmem_inode_acct_block() fails. Or you may
> > prefer instead to delete that "else BUG_ON(page);" - looks as if that
> > would end up doing the right thing.  Peter may have a preference.
>
> To me, the BUG_ON(page) wanted to guarantee mfill_atomic_pte() should have
> consumed the page properly when possible.  Removing the BUG_ON() looks good
> already, it will just stop covering the case when e.g. ret==0.
>
> So maybe slightly better to release the page when shmem_inode_acct_block()
> fails (so as to still keep some guard on the page)?

This second issue, I will take some more time to investigate. :)

>
> Thanks,
>
> --
> Peter Xu
>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v5 06/10] userfaultfd/shmem: modify shmem_mcopy_atomic_pte to use install_pte()
  2021-04-28 15:59         ` Axel Rasmussen
  (?)
@ 2021-04-28 16:23         ` Peter Xu
  -1 siblings, 0 replies; 41+ messages in thread
From: Peter Xu @ 2021-04-28 16:23 UTC (permalink / raw)
  To: Axel Rasmussen
  Cc: Hugh Dickins, Alexander Viro, Andrea Arcangeli, Andrew Morton,
	Jerome Glisse, Joe Perches, Lokesh Gidra, Mike Kravetz,
	Mike Rapoport, Shaohua Li, Shuah Khan, Stephen Rothwell,
	Wang Qing, linux-api, linux-fsdevel, LKML, linux-kselftest,
	Linux MM, Brian Geffon, Dr . David Alan Gilbert, Mina Almasry,
	Oliver Upton

On Wed, Apr 28, 2021 at 08:59:53AM -0700, Axel Rasmussen wrote:
> On Wed, Apr 28, 2021 at 8:56 AM Peter Xu <peterx@redhat.com> wrote:
> >
> > On Tue, Apr 27, 2021 at 05:58:16PM -0700, Hugh Dickins wrote:
> > > On Tue, 27 Apr 2021, Axel Rasmussen wrote:
> > >
> > > > In a previous commit, we added the mcopy_atomic_install_pte() helper.
> > > > This helper does the job of setting up PTEs for an existing page, to map
> > > > it into a given VMA. It deals with both the anon and shmem cases, as
> > > > well as the shared and private cases.
> > > >
> > > > In other words, shmem_mcopy_atomic_pte() duplicates a case it already
> > > > handles. So, expose it, and let shmem_mcopy_atomic_pte() use it
> > > > directly, to reduce code duplication.
> > > >
> > > > This requires that we refactor shmem_mcopy_atomic_pte() a bit:
> > > >
> > > > Instead of doing accounting (shmem_recalc_inode() et al) part-way
> > > > through the PTE setup, do it afterward. This frees up
> > > > mcopy_atomic_install_pte() from having to care about this accounting,
> > > > and means we don't need to e.g. shmem_uncharge() in the error path.
> > > >
> > > > A side effect is this switches shmem_mcopy_atomic_pte() to use
> > > > lru_cache_add_inactive_or_unevictable() instead of just lru_cache_add().
> > > > This wrapper does some extra accounting in an exceptional case, if
> > > > appropriate, so it's actually the more correct thing to use.
> > > >
> > > > Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
> > >
> > > Not quite. Two things.
> > >
> > > One, in this version, delete_from_page_cache(page) has vanished
> > > from the particular error path which needs it.
> >
> > Agreed.  I also spotted that the set_page_dirty() seems to have been overlooked
> > when reusing mcopy_atomic_install_pte(), which afaiu should be move into the
> > helper.
> 
> I think this is covered: we explicitly call SetPageDirty() just before
> returning in shmem_mcopy_atomic_pte(). If I remember correctly from a
> couple of revisions ago, we consciously put it here instead of in the
> helper because it resulted in simpler code (error handling in
> particular, I think?), and not all callers of the new helper need it.

Indeed, yes that looks okay.

> 
> >
> > >
> > > Two, and I think this predates your changes (so needs a separate
> > > fix patch first, for backport to stable? a user with bad intentions
> > > might be able to trigger the BUG), in pondering the new error paths
> > > and that /* don't free the page */ one in particular, isn't it the
> > > case that the shmem_inode_acct_block() on entry might succeed the
> > > first time, but atomic copy fail so -ENOENT, then something else
> > > fill up the tmpfs before the retry comes in, so that retry then
> > > fail with -ENOMEM, and hit the BUG_ON(page) in __mcopy_atomic()?
> > >
> > > (As I understand it, the shmem_inode_unacct_blocks() has to be
> > > done before returning, because the caller may be unable to retry.)
> > >
> > > What the right fix is rather depends on other uses of __mcopy_atomic():
> > > if they obviously cannot hit that BUG_ON(page), you may prefer to leave
> > > it in, and fix it here where shmem_inode_acct_block() fails. Or you may
> > > prefer instead to delete that "else BUG_ON(page);" - looks as if that
> > > would end up doing the right thing.  Peter may have a preference.
> >
> > To me, the BUG_ON(page) wanted to guarantee mfill_atomic_pte() should have
> > consumed the page properly when possible.  Removing the BUG_ON() looks good
> > already, it will just stop covering the case when e.g. ret==0.
> >
> > So maybe slightly better to release the page when shmem_inode_acct_block()
> > fails (so as to still keep some guard on the page)?
> 
> This second issue, I will take some more time to investigate. :)

No worry - take your time. :)

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v5 09/10] userfaultfd/selftests: reinitialize test context in each test
  2021-04-27 22:52   ` Axel Rasmussen
  (?)
@ 2021-04-28 17:23   ` Peter Xu
  -1 siblings, 0 replies; 41+ messages in thread
From: Peter Xu @ 2021-04-28 17:23 UTC (permalink / raw)
  To: Axel Rasmussen
  Cc: Alexander Viro, Andrea Arcangeli, Andrew Morton, Hugh Dickins,
	Jerome Glisse, Joe Perches, Lokesh Gidra, Mike Kravetz,
	Mike Rapoport, Shaohua Li, Shuah Khan, Stephen Rothwell,
	Wang Qing, linux-api, linux-fsdevel, linux-kernel,
	linux-kselftest, linux-mm, Brian Geffon, Dr . David Alan Gilbert,
	Mina Almasry, Oliver Upton

On Tue, Apr 27, 2021 at 03:52:43PM -0700, Axel Rasmussen wrote:
> Currently, the context (fds, mmap-ed areas, etc.) is global. Each test
> mutates this state in some way, in some cases really "clobbering" it
> (e.g., the events test mremap-ing area_dst over the top of area_src, or
> the minor faults tests overwriting the count_verify values in the test
> areas). We run the tests in a particular order, and each test is careful
> to make the right assumptions about its starting state.
> 
> But, this is fragile. It's better for a test's success or failure to not
> depend on what some other prior test case did to the global state.
> 
> To that end, clear and reinitialize the test context at the start of
> each test case, so whatever prior test cases did doesn't affect future
> tests.
> 
> This is particularly relevant to this series because the events test's
> mremap of area_dst screws up assumptions the minor fault test was
> relying on. This wasn't a problem for hugetlb, as we don't mremap in
> that case.
> 
> Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu
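
The pattern the commit message describes can be sketched in a few lines of
C. This is a minimal illustration under stated assumptions, not the
selftest's code: toy_area_src, toy_area_dst, toy_ctx_clear(),
toy_ctx_init() and the two toy tests are hypothetical names, and heap
buffers stand in for the mmap-ed areas and fds of the real context.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define TOY_AREA_LEN 256
#define TOY_PATTERN  0xAB

/* Global test context, shared by every test case (hypothetical names). */
static unsigned char *toy_area_src;
static unsigned char *toy_area_dst;

/* Tear down whatever a previous test left behind. */
static void toy_ctx_clear(void)
{
	free(toy_area_src);
	free(toy_area_dst);
	toy_area_src = NULL;
	toy_area_dst = NULL;
}

/* Rebuild the context from scratch so each test sees a known state. */
static void toy_ctx_init(void)
{
	toy_ctx_clear();
	toy_area_src = malloc(TOY_AREA_LEN);
	toy_area_dst = calloc(1, TOY_AREA_LEN);
	assert(toy_area_src && toy_area_dst);
	memset(toy_area_src, TOY_PATTERN, TOY_AREA_LEN);
}

/* A test that clobbers the shared state, as the events test does. */
static int toy_test_clobber(void)
{
	toy_ctx_init();
	memset(toy_area_src, 0, TOY_AREA_LEN);
	memcpy(toy_area_dst, toy_area_src, TOY_AREA_LEN);
	return 0;
}

/* A test that relies on the source area holding its initial pattern. */
static int toy_test_verify(void)
{
	size_t i;

	toy_ctx_init();
	for (i = 0; i < TOY_AREA_LEN; i++)
		if (toy_area_src[i] != TOY_PATTERN)
			return 1;
	return 0;
}
```

Because each toy test calls toy_ctx_init() first, toy_test_verify() passes
whether or not toy_test_clobber() ran before it, so the order of the tests
no longer matters.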



* Re: [PATCH v5 10/10] userfaultfd/selftests: exercise minor fault handling shmem support
  2021-04-27 22:52   ` Axel Rasmussen
  (?)
@ 2021-04-28 17:26   ` Peter Xu
  -1 siblings, 0 replies; 41+ messages in thread
From: Peter Xu @ 2021-04-28 17:26 UTC (permalink / raw)
  To: Axel Rasmussen
  Cc: Alexander Viro, Andrea Arcangeli, Andrew Morton, Hugh Dickins,
	Jerome Glisse, Joe Perches, Lokesh Gidra, Mike Kravetz,
	Mike Rapoport, Shaohua Li, Shuah Khan, Stephen Rothwell,
	Wang Qing, linux-api, linux-fsdevel, linux-kernel,
	linux-kselftest, linux-mm, Brian Geffon, Dr . David Alan Gilbert,
	Mina Almasry, Oliver Upton

On Tue, Apr 27, 2021 at 03:52:44PM -0700, Axel Rasmussen wrote:
> Enable test_uffdio_minor for test_type == TEST_SHMEM, and modify the
> test slightly to pass in / check for the right feature flags.
> 
> Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu



* Re: [PATCH v5 09/10] userfaultfd/selftests: reinitialize test context in each test
  2021-04-27 22:52   ` Axel Rasmussen
  (?)
  (?)
@ 2021-05-18 20:57   ` Peter Xu
  2021-05-18 22:28       ` Axel Rasmussen
  -1 siblings, 1 reply; 41+ messages in thread
From: Peter Xu @ 2021-05-18 20:57 UTC (permalink / raw)
  To: Axel Rasmussen, Andrew Morton
  Cc: Alexander Viro, Andrea Arcangeli, Andrew Morton, Hugh Dickins,
	Jerome Glisse, Joe Perches, Lokesh Gidra, Mike Kravetz,
	Mike Rapoport, Shaohua Li, Shuah Khan, Stephen Rothwell,
	Wang Qing, linux-api, linux-fsdevel, linux-kernel,
	linux-kselftest, linux-mm, Brian Geffon, Dr . David Alan Gilbert,
	Mina Almasry, Oliver Upton


On Tue, Apr 27, 2021 at 03:52:43PM -0700, Axel Rasmussen wrote:
> Currently, the context (fds, mmap-ed areas, etc.) is global. Each test
> mutates this state in some way, in some cases really "clobbering" it
> (e.g., the events test mremap-ing area_dst over the top of area_src, or
> the minor faults tests overwriting the count_verify values in the test
> areas). We run the tests in a particular order, and each test is careful
> to make the right assumptions about its starting state.
> 
> But, this is fragile. It's better for a test's success or failure to not
> depend on what some other prior test case did to the global state.
> 
> To that end, clear and reinitialize the test context at the start of
> each test case, so whatever prior test cases did doesn't affect future
> tests.
> 
> This is particularly relevant to this series because the events test's
> mremap of area_dst screws up assumptions the minor fault test was
> relying on. This wasn't a problem for hugetlb, as we don't mremap in
> that case.
> 
> Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>

Hi, Andrew,

There's a conflict on the uffd test case with v5.13-rc1-mmots-2021-05-13-17-23
between this patch and the uffd pagemap series, so I think we may need to queue
another fixup patch (to be squashed into this patch of Axel's) which is
attached.

Thanks,

-- 
Peter Xu

[-- Attachment #2: 0001-fixup-userfaultfd-selftests-reinitialize-test-contex.patch --]
[-- Type: text/plain, Size: 1456 bytes --]

From 745402175cc5670475df8e6c6bd03b6268f4175d Mon Sep 17 00:00:00 2001
From: Peter Xu <peterx@redhat.com>
Date: Tue, 18 May 2021 16:50:36 -0400
Subject: [PATCH] fixup! userfaultfd/selftests: reinitialize test context in
 each test

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 tools/testing/selftests/vm/userfaultfd.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c
index c4150b4fbd17..f78816130c7f 100644
--- a/tools/testing/selftests/vm/userfaultfd.c
+++ b/tools/testing/selftests/vm/userfaultfd.c
@@ -1326,7 +1326,7 @@ static void userfaultfd_pagemap_test(unsigned int test_pgsize)
 	/* Flush so it doesn't flush twice in parent/child later */
 	fflush(stdout);
 
-	uffd_test_ops->release_pages(area_dst);
+	uffd_test_ctx_init(0);
 
 	if (test_pgsize > page_size) {
 		/* This is a thp test */
@@ -1338,9 +1338,6 @@ static void userfaultfd_pagemap_test(unsigned int test_pgsize)
 			err("madvise(MADV_NOHUGEPAGE) failed");
 	}
 
-	if (userfaultfd_open(0))
-		err("userfaultfd_open");
-
 	uffdio_register.range.start = (unsigned long) area_dst;
 	uffdio_register.range.len = nr_pages * page_size;
 	uffdio_register.mode = UFFDIO_REGISTER_MODE_WP;
@@ -1383,7 +1380,6 @@ static void userfaultfd_pagemap_test(unsigned int test_pgsize)
 	pagemap_check_wp(value, false);
 
 	close(pagemap_fd);
-	close(uffd);
 	printf("done\n");
 }
 
-- 
2.31.1



* Re: [PATCH v5 09/10] userfaultfd/selftests: reinitialize test context in each test
  2021-05-18 20:57   ` Peter Xu
@ 2021-05-18 22:28       ` Axel Rasmussen
  0 siblings, 0 replies; 41+ messages in thread
From: Axel Rasmussen @ 2021-05-18 22:28 UTC (permalink / raw)
  To: Peter Xu
  Cc: Andrew Morton, Alexander Viro, Andrea Arcangeli, Hugh Dickins,
	Jerome Glisse, Joe Perches, Lokesh Gidra, Mike Kravetz,
	Mike Rapoport, Shaohua Li, Shuah Khan, Stephen Rothwell,
	Wang Qing, linux-api, linux-fsdevel, LKML, linux-kselftest,
	Linux MM, Brian Geffon, Dr . David Alan Gilbert, Mina Almasry,
	Oliver Upton

I suppose it will be squashed anyway, but in case it's useful feel free to add:

Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>

Thanks for catching this, Peter!

On Tue, May 18, 2021 at 1:57 PM Peter Xu <peterx@redhat.com> wrote:
>
> On Tue, Apr 27, 2021 at 03:52:43PM -0700, Axel Rasmussen wrote:
> > Currently, the context (fds, mmap-ed areas, etc.) is global. Each test
> > mutates this state in some way, in some cases really "clobbering" it
> > (e.g., the events test mremap-ing area_dst over the top of area_src, or
> > the minor faults tests overwriting the count_verify values in the test
> > areas). We run the tests in a particular order, and each test is careful
> > to make the right assumptions about its starting state.
> >
> > But, this is fragile. It's better for a test's success or failure to not
> > depend on what some other prior test case did to the global state.
> >
> > To that end, clear and reinitialize the test context at the start of
> > each test case, so whatever prior test cases did doesn't affect future
> > tests.
> >
> > This is particularly relevant to this series because the events test's
> > mremap of area_dst screws up assumptions the minor fault test was
> > relying on. This wasn't a problem for hugetlb, as we don't mremap in
> > that case.
> >
> > Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
>
> Hi, Andrew,
>
> There's a conflict on the uffd test case with v5.13-rc1-mmots-2021-05-13-17-23
> between this patch and the uffd pagemap series, so I think we may need to queue
> another fixup patch (to be squashed into this patch of Axel's) which is
> attached.
>
> Thanks,
>
> --
> Peter Xu


end of thread, other threads:[~2021-05-18 22:29 UTC | newest]

Thread overview: 41+ messages
2021-04-27 22:52 [PATCH v5 00/10] userfaultfd: add minor fault handling for shmem Axel Rasmussen
2021-04-27 22:52 ` Axel Rasmussen
2021-04-27 22:52 ` [PATCH v5 01/10] userfaultfd/hugetlbfs: avoid including userfaultfd_k.h in hugetlb.h Axel Rasmussen
2021-04-27 22:52   ` Axel Rasmussen
2021-04-27 22:52 ` [PATCH v5 02/10] userfaultfd/shmem: combine shmem_{mcopy_atomic,mfill_zeropage}_pte Axel Rasmussen
2021-04-27 22:52   ` Axel Rasmussen
2021-04-27 22:52 ` [PATCH v5 03/10] userfaultfd/shmem: support minor fault registration for shmem Axel Rasmussen
2021-04-27 22:52   ` Axel Rasmussen
2021-04-28  0:02   ` Hugh Dickins
2021-04-28  0:02     ` Hugh Dickins
2021-04-27 22:52 ` [PATCH v5 04/10] userfaultfd/shmem: support UFFDIO_CONTINUE " Axel Rasmussen
2021-04-27 22:52   ` Axel Rasmussen
2021-04-28  0:03   ` Hugh Dickins
2021-04-28  0:03     ` Hugh Dickins
2021-04-28 15:10   ` Peter Xu
2021-04-27 22:52 ` [PATCH v5 05/10] userfaultfd/shmem: advertise shmem minor fault support Axel Rasmussen
2021-04-27 22:52   ` Axel Rasmussen
2021-04-28  0:04   ` Hugh Dickins
2021-04-28  0:04     ` Hugh Dickins
2021-04-28 15:11   ` Peter Xu
2021-04-27 22:52 ` [PATCH v5 06/10] userfaultfd/shmem: modify shmem_mcopy_atomic_pte to use install_pte() Axel Rasmussen
2021-04-27 22:52   ` Axel Rasmussen
2021-04-28  0:58   ` Hugh Dickins
2021-04-28  0:58     ` Hugh Dickins
2021-04-28 15:56     ` Peter Xu
2021-04-28 15:59       ` Axel Rasmussen
2021-04-28 15:59         ` Axel Rasmussen
2021-04-28 16:23         ` Peter Xu
2021-04-27 22:52 ` [PATCH v5 07/10] userfaultfd/selftests: use memfd_create for shmem test type Axel Rasmussen
2021-04-27 22:52   ` Axel Rasmussen
2021-04-27 22:52 ` [PATCH v5 08/10] userfaultfd/selftests: create alias mappings in the shmem test Axel Rasmussen
2021-04-27 22:52   ` Axel Rasmussen
2021-04-27 22:52 ` [PATCH v5 09/10] userfaultfd/selftests: reinitialize test context in each test Axel Rasmussen
2021-04-27 22:52   ` Axel Rasmussen
2021-04-28 17:23   ` Peter Xu
2021-05-18 20:57   ` Peter Xu
2021-05-18 22:28     ` Axel Rasmussen
2021-05-18 22:28       ` Axel Rasmussen
2021-04-27 22:52 ` [PATCH v5 10/10] userfaultfd/selftests: exercise minor fault handling shmem support Axel Rasmussen
2021-04-27 22:52   ` Axel Rasmussen
2021-04-28 17:26   ` Peter Xu
