* [PATCH v3 0/5] mm: A few cleanup patches around zap, shmem and uffd
@ 2021-09-08 16:35 Peter Xu
  2021-09-08 16:35 ` [PATCH v3 1/5] mm/shmem: Unconditionally set pte dirty in mfill_atomic_install_pte Peter Xu
                   ` (5 more replies)
  0 siblings, 6 replies; 14+ messages in thread
From: Peter Xu @ 2021-09-08 16:35 UTC (permalink / raw)
  To: linux-kernel, linux-mm, Hugh Dickins, Andrew Morton
  Cc: Yang Shi, Miaohe Lin, Alistair Popple, Matthew Wilcox,
	David Hildenbrand, Jerome Glisse, Kirill A . Shutemov,
	Andrea Arcangeli, Liam Howlett, Mike Rapoport, peterx

[Based on linus/master, commit ac08b1c68d1b, which should contain the recent -mm
 pull; if it doesn't apply, I can repost anytime]

v3:
- Patch "mm: Add zap_skip_check_mapping() helper"
  - In zap_skip_check_mapping() check zap_mapping first [Alistair]
- Patch "mm: Add ZAP_FLAG_SKIP_SWAP and zap_flags":
  - Fix English errors in commit message [David]
  - Drop paragraph mentioning commit 22061a1ffabd in commit msg
  - Set ZAP_FLAG_SKIP_SWAP for unmap_mapping_page() too
- Pick up r-bs

v2:
- Patch "mm: Clear vmf->pte after pte_unmap_same() returns"
  - Remove one comment [David]
- Collect r-b for patch 2/3
- Rewrite the last two patches to drop ZAP_FLAG_CHECK_MAPPING, dropping
  Alistair's r-b on patch 5 because it changed [David, Matthew]

===== v1 cover letter =====

I picked up these patches from uffd-wp v5 series here:

https://lore.kernel.org/lkml/20210715201422.211004-1-peterx@redhat.com/

IMHO all of them are nice cleanups to the existing code: they're small and
self-contained, and they'll be needed by the upcoming uffd-wp series.  I would
appreciate it if they could be accepted earlier, so that I don't have to keep
carrying them along with the uffd-wp series.

I removed some CCs from the uffd-wp v5 series to reduce the noise, and added a
few new ones.

Reviews are greatly welcomed, thanks.

Peter Xu (5):
  mm/shmem: Unconditionally set pte dirty in mfill_atomic_install_pte
  mm: Clear vmf->pte after pte_unmap_same() returns
  mm: Drop first_index/last_index in zap_details
  mm: Add zap_skip_check_mapping() helper
  mm: Add ZAP_FLAG_SKIP_SWAP and zap_flags

 include/linux/mm.h | 34 +++++++++++++++++++--
 mm/memory.c        | 76 +++++++++++++++++++---------------------------
 mm/shmem.c         |  1 -
 mm/userfaultfd.c   |  3 +-
 4 files changed, 63 insertions(+), 51 deletions(-)

-- 
2.31.1



* [PATCH v3 1/5] mm/shmem: Unconditionally set pte dirty in mfill_atomic_install_pte
  2021-09-08 16:35 [PATCH v3 0/5] mm: A few cleanup patches around zap, shmem and uffd Peter Xu
@ 2021-09-08 16:35 ` Peter Xu
  2021-09-08 16:36 ` [PATCH v3 2/5] mm: Clear vmf->pte after pte_unmap_same() returns Peter Xu
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 14+ messages in thread
From: Peter Xu @ 2021-09-08 16:35 UTC (permalink / raw)
  To: linux-kernel, linux-mm, Hugh Dickins, Andrew Morton
  Cc: Yang Shi, Miaohe Lin, Alistair Popple, Matthew Wilcox,
	David Hildenbrand, Jerome Glisse, Kirill A . Shutemov,
	Andrea Arcangeli, Liam Howlett, Mike Rapoport, peterx,
	Axel Rasmussen

Previously this was done conditionally, because there is one shmem special case
where we use SetPageDirty() instead.  However that's not necessary, and it is
easier and cleaner to do it unconditionally in mfill_atomic_install_pte().

The most recent discussion about this is here, where Hugh explained the history
of SetPageDirty() and why it's possible that it's not required at all:

https://lore.kernel.org/lkml/alpine.LSU.2.11.2104121657050.1097@eggly.anvils/

Currently mfill_atomic_install_pte() has three callers:

        1. shmem_mfill_atomic_pte
        2. mcopy_atomic_pte
        3. mcontinue_atomic_pte

After the change: case (1) has its SetPageDirty() replaced by the dirty bit on
the pte (so we finally unify them), case (2) has no functional change at all
since it has page_in_cache==false, and case (3) may add a dirty bit to the pte.
However, since case (3) is UFFDIO_CONTINUE for shmem, the page is almost
certainly dirty already, so it should not make a real difference either.

This should make it much easier to follow which cases set the dirty bit for
uffd, as we now simply set it for all uffd-related ioctls.  Meanwhile, there is
no more special handling of SetPageDirty() where it isn't needed.

Cc: Hugh Dickins <hughd@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/shmem.c       | 1 -
 mm/userfaultfd.c | 3 +--
 2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 88742953532c..96ccf6e941aa 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2424,7 +2424,6 @@ int shmem_mfill_atomic_pte(struct mm_struct *dst_mm,
 	shmem_recalc_inode(inode);
 	spin_unlock_irq(&info->lock);
 
-	SetPageDirty(page);
 	unlock_page(page);
 	return 0;
 out_delete_from_cache:
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 7a9008415534..caf6dfff2a60 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -69,10 +69,9 @@ int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
 	pgoff_t offset, max_off;
 
 	_dst_pte = mk_pte(page, dst_vma->vm_page_prot);
+	_dst_pte = pte_mkdirty(_dst_pte);
 	if (page_in_cache && !vm_shared)
 		writable = false;
-	if (writable || !page_in_cache)
-		_dst_pte = pte_mkdirty(_dst_pte);
 	if (writable) {
 		if (wp_copy)
 			_dst_pte = pte_mkuffd_wp(_dst_pte);
-- 
2.31.1



* [PATCH v3 2/5] mm: Clear vmf->pte after pte_unmap_same() returns
  2021-09-08 16:35 [PATCH v3 0/5] mm: A few cleanup patches around zap, shmem and uffd Peter Xu
  2021-09-08 16:35 ` [PATCH v3 1/5] mm/shmem: Unconditionally set pte dirty in mfill_atomic_install_pte Peter Xu
@ 2021-09-08 16:36 ` Peter Xu
  2021-09-08 16:36 ` [PATCH v3 3/5] mm: Drop first_index/last_index in zap_details Peter Xu
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 14+ messages in thread
From: Peter Xu @ 2021-09-08 16:36 UTC (permalink / raw)
  To: linux-kernel, Andrew Morton, Hugh Dickins, linux-mm
  Cc: Miaohe Lin, David Hildenbrand, Andrea Arcangeli, Yang Shi,
	Alistair Popple, Matthew Wilcox, Kirill A . Shutemov,
	Jerome Glisse, peterx, Liam Howlett, Mike Rapoport

pte_unmap_same() always unmaps the pte pointer.  After the unmap, vmf->pte is
no longer valid, so we should clear it.

This was safe only because no one accesses vmf->pte after pte_unmap_same()
returns: the only caller of pte_unmap_same() (so far) is do_swap_page(), where
vmf->pte is in most cases overwritten very soon afterwards.

Pass vmf directly into pte_unmap_same(); this also avoids the long parameter
list, which is a nice cleanup.

Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Liam Howlett <liam.howlett@oracle.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/memory.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 25fc46e87214..7b095f07c4ef 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2724,19 +2724,19 @@ EXPORT_SYMBOL_GPL(apply_to_existing_page_range);
  * proceeding (but do_wp_page is only called after already making such a check;
  * and do_anonymous_page can safely check later on).
  */
-static inline int pte_unmap_same(struct mm_struct *mm, pmd_t *pmd,
-				pte_t *page_table, pte_t orig_pte)
+static inline int pte_unmap_same(struct vm_fault *vmf)
 {
 	int same = 1;
 #if defined(CONFIG_SMP) || defined(CONFIG_PREEMPTION)
 	if (sizeof(pte_t) > sizeof(unsigned long)) {
-		spinlock_t *ptl = pte_lockptr(mm, pmd);
+		spinlock_t *ptl = pte_lockptr(vmf->vma->vm_mm, vmf->pmd);
 		spin_lock(ptl);
-		same = pte_same(*page_table, orig_pte);
+		same = pte_same(*vmf->pte, vmf->orig_pte);
 		spin_unlock(ptl);
 	}
 #endif
-	pte_unmap(page_table);
+	pte_unmap(vmf->pte);
+	vmf->pte = NULL;
 	return same;
 }
 
@@ -3487,7 +3487,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	vm_fault_t ret = 0;
 	void *shadow = NULL;
 
-	if (!pte_unmap_same(vma->vm_mm, vmf->pmd, vmf->pte, vmf->orig_pte))
+	if (!pte_unmap_same(vmf))
 		goto out;
 
 	entry = pte_to_swp_entry(vmf->orig_pte);
-- 
2.31.1



* [PATCH v3 3/5] mm: Drop first_index/last_index in zap_details
  2021-09-08 16:35 [PATCH v3 0/5] mm: A few cleanup patches around zap, shmem and uffd Peter Xu
  2021-09-08 16:35 ` [PATCH v3 1/5] mm/shmem: Unconditionally set pte dirty in mfill_atomic_install_pte Peter Xu
  2021-09-08 16:36 ` [PATCH v3 2/5] mm: Clear vmf->pte after pte_unmap_same() returns Peter Xu
@ 2021-09-08 16:36 ` Peter Xu
  2021-09-09  2:54   ` Liam Howlett
  2021-09-08 16:36 ` [PATCH v3 4/5] mm: Add zap_skip_check_mapping() helper Peter Xu
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 14+ messages in thread
From: Peter Xu @ 2021-09-08 16:36 UTC (permalink / raw)
  To: linux-kernel, Andrew Morton, Hugh Dickins, linux-mm
  Cc: Miaohe Lin, David Hildenbrand, Andrea Arcangeli, Yang Shi,
	Alistair Popple, Matthew Wilcox, Kirill A . Shutemov,
	Jerome Glisse, peterx, Liam Howlett, Mike Rapoport

The first_index/last_index parameters in zap_details are actually only used in
unmap_mapping_range_tree().  Meanwhile, that function is only called once, by
unmap_mapping_pages().  Instead of passing these two variables through the
whole stack of page-zapping code, remove them from zap_details and make them
plain parameters of unmap_mapping_range_tree(), which is inlined.

Reviewed-by: Alistair Popple <apopple@nvidia.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 include/linux/mm.h |  2 --
 mm/memory.c        | 29 ++++++++++++++++-------------
 2 files changed, 16 insertions(+), 15 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 73a52aba448f..d1126f731221 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1722,8 +1722,6 @@ extern void user_shm_unlock(size_t, struct ucounts *);
  */
 struct zap_details {
 	struct address_space *check_mapping;	/* Check page->mapping if set */
-	pgoff_t	first_index;			/* Lowest page->index to unmap */
-	pgoff_t last_index;			/* Highest page->index to unmap */
 	struct page *single_page;		/* Locked page to be unmapped */
 };
 
diff --git a/mm/memory.c b/mm/memory.c
index 7b095f07c4ef..6bba3b9fef7c 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3321,20 +3321,20 @@ static void unmap_mapping_range_vma(struct vm_area_struct *vma,
 }
 
 static inline void unmap_mapping_range_tree(struct rb_root_cached *root,
+					    pgoff_t first_index,
+					    pgoff_t last_index,
 					    struct zap_details *details)
 {
 	struct vm_area_struct *vma;
 	pgoff_t vba, vea, zba, zea;
 
-	vma_interval_tree_foreach(vma, root,
-			details->first_index, details->last_index) {
-
+	vma_interval_tree_foreach(vma, root, first_index, last_index) {
 		vba = vma->vm_pgoff;
 		vea = vba + vma_pages(vma) - 1;
-		zba = details->first_index;
+		zba = first_index;
 		if (zba < vba)
 			zba = vba;
-		zea = details->last_index;
+		zea = last_index;
 		if (zea > vea)
 			zea = vea;
 
@@ -3360,18 +3360,21 @@ void unmap_mapping_page(struct page *page)
 {
 	struct address_space *mapping = page->mapping;
 	struct zap_details details = { };
+	pgoff_t	first_index, last_index;
 
 	VM_BUG_ON(!PageLocked(page));
 	VM_BUG_ON(PageTail(page));
 
+	first_index = page->index;
+	last_index = page->index + thp_nr_pages(page) - 1;
+
 	details.check_mapping = mapping;
-	details.first_index = page->index;
-	details.last_index = page->index + thp_nr_pages(page) - 1;
 	details.single_page = page;
 
 	i_mmap_lock_write(mapping);
 	if (unlikely(!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root)))
-		unmap_mapping_range_tree(&mapping->i_mmap, &details);
+		unmap_mapping_range_tree(&mapping->i_mmap, first_index,
+					 last_index, &details);
 	i_mmap_unlock_write(mapping);
 }
 
@@ -3390,17 +3393,17 @@ void unmap_mapping_page(struct page *page)
 void unmap_mapping_pages(struct address_space *mapping, pgoff_t start,
 		pgoff_t nr, bool even_cows)
 {
+	pgoff_t	first_index = start, last_index = start + nr - 1;
 	struct zap_details details = { };
 
 	details.check_mapping = even_cows ? NULL : mapping;
-	details.first_index = start;
-	details.last_index = start + nr - 1;
-	if (details.last_index < details.first_index)
-		details.last_index = ULONG_MAX;
+	if (last_index < first_index)
+		last_index = ULONG_MAX;
 
 	i_mmap_lock_write(mapping);
 	if (unlikely(!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root)))
-		unmap_mapping_range_tree(&mapping->i_mmap, &details);
+		unmap_mapping_range_tree(&mapping->i_mmap, first_index,
+					 last_index, &details);
 	i_mmap_unlock_write(mapping);
 }
 
-- 
2.31.1



* [PATCH v3 4/5] mm: Add zap_skip_check_mapping() helper
  2021-09-08 16:35 [PATCH v3 0/5] mm: A few cleanup patches around zap, shmem and uffd Peter Xu
                   ` (2 preceding siblings ...)
  2021-09-08 16:36 ` [PATCH v3 3/5] mm: Drop first_index/last_index in zap_details Peter Xu
@ 2021-09-08 16:36 ` Peter Xu
  2021-09-09  1:16   ` Alistair Popple
  2021-09-08 16:36 ` [PATCH v3 5/5] mm: Add ZAP_FLAG_SKIP_SWAP and zap_flags Peter Xu
  2021-09-14 16:37 ` [PATCH v3 0/5] mm: A few cleanup patches around zap, shmem and uffd Peter Xu
  5 siblings, 1 reply; 14+ messages in thread
From: Peter Xu @ 2021-09-08 16:36 UTC (permalink / raw)
  To: linux-kernel, Andrew Morton, Hugh Dickins, linux-mm
  Cc: Miaohe Lin, David Hildenbrand, Andrea Arcangeli, Yang Shi,
	Alistair Popple, Matthew Wilcox, Kirill A . Shutemov,
	Jerome Glisse, peterx, Liam Howlett, Mike Rapoport

Introduce a helper for the checks.  Rename "check_mapping" to "zap_mapping",
because "check_mapping" reads like a bool while it actually stores the mapping
itself.  When it is set, we check the mapping (it must be non-NULL); when it is
cleared we skip the check, which works like the old way.

Move the duplicated comments into the helper too.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 include/linux/mm.h | 16 +++++++++++++++-
 mm/memory.c        | 29 ++++++-----------------------
 2 files changed, 21 insertions(+), 24 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index d1126f731221..ed44f31615d9 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1721,10 +1721,24 @@ extern void user_shm_unlock(size_t, struct ucounts *);
  * Parameter block passed down to zap_pte_range in exceptional cases.
  */
 struct zap_details {
-	struct address_space *check_mapping;	/* Check page->mapping if set */
+	struct address_space *zap_mapping;	/* Check page->mapping if set */
 	struct page *single_page;		/* Locked page to be unmapped */
 };
 
+/*
+ * We set details->zap_mapping when we want to unmap shared but keep private
+ * pages.  Return true if we should skip zapping this page, false otherwise.
+ */
+static inline bool
+zap_skip_check_mapping(struct zap_details *details, struct page *page)
+{
+	if (!details || !page)
+		return false;
+
+	return details->zap_mapping &&
+	    (details->zap_mapping != page_rmapping(page));
+}
+
 struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
 			     pte_t pte);
 struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,
diff --git a/mm/memory.c b/mm/memory.c
index 6bba3b9fef7c..e5ee8399d270 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1333,16 +1333,8 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 			struct page *page;
 
 			page = vm_normal_page(vma, addr, ptent);
-			if (unlikely(details) && page) {
-				/*
-				 * unmap_shared_mapping_pages() wants to
-				 * invalidate cache without truncating:
-				 * unmap shared but keep private pages.
-				 */
-				if (details->check_mapping &&
-				    details->check_mapping != page_rmapping(page))
-					continue;
-			}
+			if (unlikely(zap_skip_check_mapping(details, page)))
+				continue;
 			ptent = ptep_get_and_clear_full(mm, addr, pte,
 							tlb->fullmm);
 			tlb_remove_tlb_entry(tlb, pte, addr);
@@ -1375,17 +1367,8 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 		    is_device_exclusive_entry(entry)) {
 			struct page *page = pfn_swap_entry_to_page(entry);
 
-			if (unlikely(details && details->check_mapping)) {
-				/*
-				 * unmap_shared_mapping_pages() wants to
-				 * invalidate cache without truncating:
-				 * unmap shared but keep private pages.
-				 */
-				if (details->check_mapping !=
-				    page_rmapping(page))
-					continue;
-			}
-
+			if (unlikely(zap_skip_check_mapping(details, page)))
+				continue;
 			pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
 			rss[mm_counter(page)]--;
 
@@ -3368,7 +3351,7 @@ void unmap_mapping_page(struct page *page)
 	first_index = page->index;
 	last_index = page->index + thp_nr_pages(page) - 1;
 
-	details.check_mapping = mapping;
+	details.zap_mapping = mapping;
 	details.single_page = page;
 
 	i_mmap_lock_write(mapping);
@@ -3396,7 +3379,7 @@ void unmap_mapping_pages(struct address_space *mapping, pgoff_t start,
 	pgoff_t	first_index = start, last_index = start + nr - 1;
 	struct zap_details details = { };
 
-	details.check_mapping = even_cows ? NULL : mapping;
+	details.zap_mapping = even_cows ? NULL : mapping;
 	if (last_index < first_index)
 		last_index = ULONG_MAX;
 
-- 
2.31.1



* [PATCH v3 5/5] mm: Add ZAP_FLAG_SKIP_SWAP and zap_flags
  2021-09-08 16:35 [PATCH v3 0/5] mm: A few cleanup patches around zap, shmem and uffd Peter Xu
                   ` (3 preceding siblings ...)
  2021-09-08 16:36 ` [PATCH v3 4/5] mm: Add zap_skip_check_mapping() helper Peter Xu
@ 2021-09-08 16:36 ` Peter Xu
  2021-09-15  2:25   ` Alistair Popple
  2021-09-14 16:37 ` [PATCH v3 0/5] mm: A few cleanup patches around zap, shmem and uffd Peter Xu
  5 siblings, 1 reply; 14+ messages in thread
From: Peter Xu @ 2021-09-08 16:36 UTC (permalink / raw)
  To: linux-kernel, Andrew Morton, Hugh Dickins, linux-mm
  Cc: Miaohe Lin, David Hildenbrand, Andrea Arcangeli, Yang Shi,
	Alistair Popple, Matthew Wilcox, Kirill A . Shutemov,
	Jerome Glisse, peterx, Liam Howlett, Mike Rapoport

Firstly, the comment in zap_pte_range() is misleading: the code checks against
"details" rather than details->check_mapping, so the comment does not match
what the code actually does.

Meanwhile, there is no explicit reason why passing in the details pointer
should mean skipping all swap entries.  New users of zap_details could easily
miss this fact if they don't read all the way down to zap_pte_range(), because
there is no comment in zap_details mentioning it at all, so swap entries could
be erroneously skipped without anyone noticing.

This partly reverts 3e8715fdc03e ("mm: drop zap_details::check_swap_entries"),
but introduces a ZAP_FLAG_SKIP_SWAP flag, which means the opposite of the
previous "details" parameter: the caller must explicitly set it to skip swap
entries, otherwise swap entries are always considered (which should still be
the major case here).

We may want to look into when exactly we need ZAP_FLAG_SKIP_SWAP, and whether
we should handle it consistently; e.g., currently even if ZAP_FLAG_SKIP_SWAP
is set we still look into swap pmds no matter what.  But that should be an
effort separate from this patch.

The flag introduced in this patch is a preparation for more bits to be defined
in the future, e.g., a new flag bit indicating whether to persist the upcoming
uffd-wp bit in the pgtable entries.
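
The opt-in semantics can be sketched in userspace like this (plain types
instead of the kernel's __bitwise zap_flags_t, and "_demo" names to avoid
suggesting this is the kernel code itself):

```c
#include <stdbool.h>

typedef unsigned int zap_flags_t;
#define ZAP_FLAG_SKIP_SWAP ((zap_flags_t)1u << 0)

struct zap_details_demo {
	zap_flags_t zap_flags;
};

/* Swap entries are considered by default; a caller must explicitly set
 * ZAP_FLAG_SKIP_SWAP to skip them.  This is the reverse of the old rule,
 * where any non-NULL details pointer implicitly meant "skip swap". */
static bool zap_skip_swap_demo(const struct zap_details_demo *details)
{
	if (!details)
		return false;
	return details->zap_flags & ZAP_FLAG_SKIP_SWAP;
}
```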

Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 include/linux/mm.h | 16 ++++++++++++++++
 mm/memory.c        |  6 +++---
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index ed44f31615d9..beb784ce35b9 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1717,12 +1717,18 @@ static inline bool can_do_mlock(void) { return false; }
 extern int user_shm_lock(size_t, struct ucounts *);
 extern void user_shm_unlock(size_t, struct ucounts *);
 
+typedef unsigned int __bitwise zap_flags_t;
+
+/* Whether to skip zapping swap entries */
+#define  ZAP_FLAG_SKIP_SWAP  ((__force zap_flags_t) BIT(0))
+
 /*
  * Parameter block passed down to zap_pte_range in exceptional cases.
  */
 struct zap_details {
 	struct address_space *zap_mapping;	/* Check page->mapping if set */
 	struct page *single_page;		/* Locked page to be unmapped */
+	zap_flags_t zap_flags;			/* Extra flags for zapping */
 };
 
 /*
@@ -1739,6 +1745,16 @@ zap_skip_check_mapping(struct zap_details *details, struct page *page)
 	    (details->zap_mapping != page_rmapping(page));
 }
 
+/* Return true if we should skip zapping swap entries, false otherwise */
+static inline bool
+zap_skip_swap(struct zap_details *details)
+{
+	if (!details)
+		return false;
+
+	return details->zap_flags & ZAP_FLAG_SKIP_SWAP;
+}
+
 struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
 			     pte_t pte);
 struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,
diff --git a/mm/memory.c b/mm/memory.c
index e5ee8399d270..26e37bef1888 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1379,8 +1379,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 			continue;
 		}
 
-		/* If details->check_mapping, we leave swap entries. */
-		if (unlikely(details))
+		if (unlikely(zap_skip_swap(details)))
 			continue;
 
 		if (!non_swap_entry(entry))
@@ -3353,6 +3352,7 @@ void unmap_mapping_page(struct page *page)
 
 	details.zap_mapping = mapping;
 	details.single_page = page;
+	details.zap_flags = ZAP_FLAG_SKIP_SWAP;
 
 	i_mmap_lock_write(mapping);
 	if (unlikely(!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root)))
@@ -3377,7 +3377,7 @@ void unmap_mapping_pages(struct address_space *mapping, pgoff_t start,
 		pgoff_t nr, bool even_cows)
 {
 	pgoff_t	first_index = start, last_index = start + nr - 1;
-	struct zap_details details = { };
+	struct zap_details details = { .zap_flags = ZAP_FLAG_SKIP_SWAP };
 
 	details.zap_mapping = even_cows ? NULL : mapping;
 	if (last_index < first_index)
-- 
2.31.1



* Re: [PATCH v3 4/5] mm: Add zap_skip_check_mapping() helper
  2021-09-08 16:36 ` [PATCH v3 4/5] mm: Add zap_skip_check_mapping() helper Peter Xu
@ 2021-09-09  1:16   ` Alistair Popple
  0 siblings, 0 replies; 14+ messages in thread
From: Alistair Popple @ 2021-09-09  1:16 UTC (permalink / raw)
  To: linux-kernel, Andrew Morton, Hugh Dickins, linux-mm, Peter Xu
  Cc: Miaohe Lin, David Hildenbrand, Andrea Arcangeli, Yang Shi,
	Matthew Wilcox, Kirill A . Shutemov, Jerome Glisse, peterx,
	Liam Howlett, Mike Rapoport

Reviewed-by: Alistair Popple <apopple@nvidia.com>

On Thursday, 9 September 2021 2:36:25 AM AEST Peter Xu wrote:
> Introduce a helper for the checks.  Rename "check_mapping" to "zap_mapping",
> because "check_mapping" reads like a bool while it actually stores the mapping
> itself.  When it is set, we check the mapping (it must be non-NULL); when it is
> cleared we skip the check, which works like the old way.
> 
> Move the duplicated comments into the helper too.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  include/linux/mm.h | 16 +++++++++++++++-
>  mm/memory.c        | 29 ++++++-----------------------
>  2 files changed, 21 insertions(+), 24 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index d1126f731221..ed44f31615d9 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1721,10 +1721,24 @@ extern void user_shm_unlock(size_t, struct ucounts *);
>   * Parameter block passed down to zap_pte_range in exceptional cases.
>   */
>  struct zap_details {
> -	struct address_space *check_mapping;	/* Check page->mapping if set */
> +	struct address_space *zap_mapping;	/* Check page->mapping if set */
>  	struct page *single_page;		/* Locked page to be unmapped */
>  };
>  
> +/*
> + * We set details->zap_mapping when we want to unmap shared but keep private
> + * pages.  Return true if we should skip zapping this page, false otherwise.
> + */
> +static inline bool
> +zap_skip_check_mapping(struct zap_details *details, struct page *page)
> +{
> +	if (!details || !page)
> +		return false;
> +
> +	return details->zap_mapping &&
> +	    (details->zap_mapping != page_rmapping(page));
> +}
> +
>  struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
>  			     pte_t pte);
>  struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,
> diff --git a/mm/memory.c b/mm/memory.c
> index 6bba3b9fef7c..e5ee8399d270 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1333,16 +1333,8 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
>  			struct page *page;
>  
>  			page = vm_normal_page(vma, addr, ptent);
> -			if (unlikely(details) && page) {
> -				/*
> -				 * unmap_shared_mapping_pages() wants to
> -				 * invalidate cache without truncating:
> -				 * unmap shared but keep private pages.
> -				 */
> -				if (details->check_mapping &&
> -				    details->check_mapping != page_rmapping(page))
> -					continue;
> -			}
> +			if (unlikely(zap_skip_check_mapping(details, page)))
> +				continue;
>  			ptent = ptep_get_and_clear_full(mm, addr, pte,
>  							tlb->fullmm);
>  			tlb_remove_tlb_entry(tlb, pte, addr);
> @@ -1375,17 +1367,8 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
>  		    is_device_exclusive_entry(entry)) {
>  			struct page *page = pfn_swap_entry_to_page(entry);
>  
> -			if (unlikely(details && details->check_mapping)) {
> -				/*
> -				 * unmap_shared_mapping_pages() wants to
> -				 * invalidate cache without truncating:
> -				 * unmap shared but keep private pages.
> -				 */
> -				if (details->check_mapping !=
> -				    page_rmapping(page))
> -					continue;
> -			}
> -
> +			if (unlikely(zap_skip_check_mapping(details, page)))
> +				continue;
>  			pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
>  			rss[mm_counter(page)]--;
>  
> @@ -3368,7 +3351,7 @@ void unmap_mapping_page(struct page *page)
>  	first_index = page->index;
>  	last_index = page->index + thp_nr_pages(page) - 1;
>  
> -	details.check_mapping = mapping;
> +	details.zap_mapping = mapping;
>  	details.single_page = page;
>  
>  	i_mmap_lock_write(mapping);
> @@ -3396,7 +3379,7 @@ void unmap_mapping_pages(struct address_space *mapping, pgoff_t start,
>  	pgoff_t	first_index = start, last_index = start + nr - 1;
>  	struct zap_details details = { };
>  
> -	details.check_mapping = even_cows ? NULL : mapping;
> +	details.zap_mapping = even_cows ? NULL : mapping;
>  	if (last_index < first_index)
>  		last_index = ULONG_MAX;
>  

* Re: [PATCH v3 3/5] mm: Drop first_index/last_index in zap_details
  2021-09-08 16:36 ` [PATCH v3 3/5] mm: Drop first_index/last_index in zap_details Peter Xu
@ 2021-09-09  2:54   ` Liam Howlett
  2021-09-09 18:13     ` Peter Xu
  0 siblings, 1 reply; 14+ messages in thread
From: Liam Howlett @ 2021-09-09  2:54 UTC (permalink / raw)
  To: Peter Xu
  Cc: linux-kernel, Andrew Morton, Hugh Dickins, linux-mm, Miaohe Lin,
	David Hildenbrand, Andrea Arcangeli, Yang Shi, Alistair Popple,
	Matthew Wilcox, Kirill A . Shutemov, Jerome Glisse,
	Mike Rapoport

* Peter Xu <peterx@redhat.com> [210908 12:36]:
> The first_index/last_index parameters in zap_details are actually only used in
> unmap_mapping_range_tree().  Meanwhile, that function is only called once, by
> unmap_mapping_pages().  Instead of passing these two variables through the
> whole stack of page-zapping code, remove them from zap_details and make them
> plain parameters of unmap_mapping_range_tree(), which is inlined.
> 
> Reviewed-by: Alistair Popple <apopple@nvidia.com>
> Reviewed-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  include/linux/mm.h |  2 --
>  mm/memory.c        | 29 ++++++++++++++++-------------
>  2 files changed, 16 insertions(+), 15 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 73a52aba448f..d1126f731221 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1722,8 +1722,6 @@ extern void user_shm_unlock(size_t, struct ucounts *);
>   */
>  struct zap_details {
>  	struct address_space *check_mapping;	/* Check page->mapping if set */
> -	pgoff_t	first_index;			/* Lowest page->index to unmap */
> -	pgoff_t last_index;			/* Highest page->index to unmap */
>  	struct page *single_page;		/* Locked page to be unmapped */
>  };
>  
> diff --git a/mm/memory.c b/mm/memory.c
> index 7b095f07c4ef..6bba3b9fef7c 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3321,20 +3321,20 @@ static void unmap_mapping_range_vma(struct vm_area_struct *vma,
>  }
>  
>  static inline void unmap_mapping_range_tree(struct rb_root_cached *root,
> +					    pgoff_t first_index,
> +					    pgoff_t last_index,
>  					    struct zap_details *details)
>  {
>  	struct vm_area_struct *vma;
>  	pgoff_t vba, vea, zba, zea;
>  
> -	vma_interval_tree_foreach(vma, root,
> -			details->first_index, details->last_index) {
> -
> +	vma_interval_tree_foreach(vma, root, first_index, last_index) {
>  		vba = vma->vm_pgoff;
>  		vea = vba + vma_pages(vma) - 1;
> -		zba = details->first_index;
> +		zba = first_index;
>  		if (zba < vba)
>  			zba = vba;
> -		zea = details->last_index;
> +		zea = last_index;
>  		if (zea > vea)
>  			zea = vea;
>  
> @@ -3360,18 +3360,21 @@ void unmap_mapping_page(struct page *page)
>  {
>  	struct address_space *mapping = page->mapping;
>  	struct zap_details details = { };
> +	pgoff_t	first_index, last_index;
>  
>  	VM_BUG_ON(!PageLocked(page));
>  	VM_BUG_ON(PageTail(page));
>  
> +	first_index = page->index;
> +	last_index = page->index + thp_nr_pages(page) - 1;
> +
>  	details.check_mapping = mapping;
> -	details.first_index = page->index;
> -	details.last_index = page->index + thp_nr_pages(page) - 1;
>  	details.single_page = page;
>  
>  	i_mmap_lock_write(mapping);
>  	if (unlikely(!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root)))
> -		unmap_mapping_range_tree(&mapping->i_mmap, &details);
> +		unmap_mapping_range_tree(&mapping->i_mmap, first_index,
> +					 last_index, &details);
>  	i_mmap_unlock_write(mapping);
>  }
>  
> @@ -3390,17 +3393,17 @@ void unmap_mapping_page(struct page *page)
>  void unmap_mapping_pages(struct address_space *mapping, pgoff_t start,
>  		pgoff_t nr, bool even_cows)
>  {
> +	pgoff_t	first_index = start, last_index = start + nr - 1;

Nit: If you respin, can first_index and last_index be declared on two separate
lines please?

>  	struct zap_details details = { };
>  
>  	details.check_mapping = even_cows ? NULL : mapping;
> -	details.first_index = start;
> -	details.last_index = start + nr - 1;
> -	if (details.last_index < details.first_index)
> -		details.last_index = ULONG_MAX;

Nit: Maybe throw in a comment about this being an overflow check, if you
respin.

> +	if (last_index < first_index)
> +		last_index = ULONG_MAX;
>  
>  	i_mmap_lock_write(mapping);
>  	if (unlikely(!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root)))
> -		unmap_mapping_range_tree(&mapping->i_mmap, &details);
> +		unmap_mapping_range_tree(&mapping->i_mmap, first_index,
> +					 last_index, &details);
>  	i_mmap_unlock_write(mapping);
>  }
>  
> -- 
> 2.31.1
> 

Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>


* Re: [PATCH v3 3/5] mm: Drop first_index/last_index in zap_details
  2021-09-09  2:54   ` Liam Howlett
@ 2021-09-09 18:13     ` Peter Xu
  0 siblings, 0 replies; 14+ messages in thread
From: Peter Xu @ 2021-09-09 18:13 UTC (permalink / raw)
  To: Liam Howlett
  Cc: linux-kernel, Andrew Morton, Hugh Dickins, linux-mm, Miaohe Lin,
	David Hildenbrand, Andrea Arcangeli, Yang Shi, Alistair Popple,
	Matthew Wilcox, Kirill A . Shutemov, Jerome Glisse,
	Mike Rapoport

On Thu, Sep 09, 2021 at 02:54:37AM +0000, Liam Howlett wrote:
> > @@ -3390,17 +3393,17 @@ void unmap_mapping_page(struct page *page)
> >  void unmap_mapping_pages(struct address_space *mapping, pgoff_t start,
> >  		pgoff_t nr, bool even_cows)
> >  {
> > +	pgoff_t	first_index = start, last_index = start + nr - 1;
> 
> Nit: If you respin, can first_index and last_index be two lines please?

Sure.

> 
> >  	struct zap_details details = { };
> >  
> >  	details.check_mapping = even_cows ? NULL : mapping;
> > -	details.first_index = start;
> > -	details.last_index = start + nr - 1;
> > -	if (details.last_index < details.first_index)
> > -		details.last_index = ULONG_MAX;
> 
> Nit: Maybe throw a comment about this being overflow check, if you
> respin.

It may not be "only" an overflow check; e.g., both unmap_mapping_range() and
unmap_mapping_pages() allow npages to be zero:

For unmap_mapping_range:

 * @holelen: size of prospective hole in bytes.  This will be rounded
 * up to a PAGE_SIZE boundary.  A holelen of zero truncates to the
 * end of the file.

For unmap_mapping_pages:

 * @nr: Number of pages to be unmapped.  0 to unmap to end of file.

So we must set last_index to ULONG_MAX to make sure the nr==0 case works as documented.

I'd rather not add a comment, but if I do add one I'll probably also mention
the nr==0 use case; please let me know if you insist.

> 
> > +	if (last_index < first_index)
> > +		last_index = ULONG_MAX;
> >  
> >  	i_mmap_lock_write(mapping);
> >  	if (unlikely(!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root)))
> > -		unmap_mapping_range_tree(&mapping->i_mmap, &details);
> > +		unmap_mapping_range_tree(&mapping->i_mmap, first_index,
> > +					 last_index, &details);
> >  	i_mmap_unlock_write(mapping);
> >  }
> >  
> > -- 
> > 2.31.1
> > 
> 
> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>

Thanks for reviewing.

-- 
Peter Xu



* Re: [PATCH v3 0/5] mm: A few cleanup patches around zap, shmem and uffd
  2021-09-08 16:35 [PATCH v3 0/5] mm: A few cleanup patches around zap, shmem and uffd Peter Xu
                   ` (4 preceding siblings ...)
  2021-09-08 16:36 ` [PATCH v3 5/5] mm: Add ZAP_FLAG_SKIP_SWAP and zap_flags Peter Xu
@ 2021-09-14 16:37 ` Peter Xu
  5 siblings, 0 replies; 14+ messages in thread
From: Peter Xu @ 2021-09-14 16:37 UTC (permalink / raw)
  To: linux-kernel, linux-mm, Hugh Dickins, Andrew Morton
  Cc: Yang Shi, Miaohe Lin, Alistair Popple, Matthew Wilcox,
	David Hildenbrand, Jerome Glisse, Kirill A . Shutemov,
	Andrea Arcangeli, Liam Howlett, Mike Rapoport

On Wed, Sep 08, 2021 at 12:35:11PM -0400, Peter Xu wrote:
> [Based on linus/master, commit ac08b1c68d1b which should contain the recent -mm
>  pull, if not applicable, I can repost anytime]
> 
> v3:
> - Patch "mm: Add zap_skip_check_mapping() helper"
>   - In zap_skip_check_mapping() check zap_mapping first [Alistair]
> - Patch "mm: Add ZAP_FLAG_SKIP_SWAP and zap_flags":
>   - Fix English errors in commit message [David]
>   - Drop paragraph mentioning commit 22061a1ffabd in commit msg
>   - Set ZAP_FLAG_SKIP_SWAP for unmap_mapping_page() too
> - Pick up r-bs

Ping - patches 1-4 each have at least one ack already; the last patch is still missing one.

The change is quite straightforward: right now we apply SKIP_SWAP to all
"details" users of the zapping process, so no functional change is intended.
It'll be important for uffd-wp in the future because uffd-wp will start to use
"details" without setting SKIP_SWAP.

It would be great if anyone could help have a look at the last patch, thanks.

-- 
Peter Xu



* Re: [PATCH v3 5/5] mm: Add ZAP_FLAG_SKIP_SWAP and zap_flags
  2021-09-08 16:36 ` [PATCH v3 5/5] mm: Add ZAP_FLAG_SKIP_SWAP and zap_flags Peter Xu
@ 2021-09-15  2:25   ` Alistair Popple
  2021-09-15  2:52     ` Peter Xu
  0 siblings, 1 reply; 14+ messages in thread
From: Alistair Popple @ 2021-09-15  2:25 UTC (permalink / raw)
  To: linux-kernel, Andrew Morton, Hugh Dickins, linux-mm, Peter Xu
  Cc: Miaohe Lin, David Hildenbrand, Andrea Arcangeli, Yang Shi,
	Matthew Wilcox, Kirill A . Shutemov, Jerome Glisse, peterx,
	Liam Howlett, Mike Rapoport

On Thursday, 9 September 2021 2:36:28 AM AEST Peter Xu wrote:
> Firstly, the comment in zap_pte_range() is misleading because it checks against
> details rather than check_mapping, so it doesn't match what the code does.
> 
> Meanwhile, there's no explicit reason why passing in the details pointer should
> mean skipping all swap entries.  New users of zap_details could easily miss
> this fact if they don't read all the way down to zap_pte_range(), because
> there's no comment at zap_details mentioning it at all, so swap entries could
> be erroneously skipped without anyone noticing.
> 
> This partly reverts 3e8715fdc03e ("mm: drop zap_details::check_swap_entries"),
> but introduces the ZAP_FLAG_SKIP_SWAP flag, which means the opposite of the
> previous "details" parameter: the caller should explicitly set it to skip swap
> entries; otherwise swap entries will always be considered (which should still
> be the major case here).
> 
> We may want to look into when exactly we need ZAP_FLAG_SKIP_SWAP and we should
> have it in a synchronous manner, e.g., currently even if ZAP_FLAG_SKIP_SWAP is
> set we'll still look into swap pmds no matter what.  But that should be a
> separate effort of this patch.

I didn't really follow what you mean by "synchronous" here, although the
explanation about pmds makes sense so it's probably just terminology.
 
> The flag introduced in this patch will be a preparation for more bits defined
> in the future, e.g., for a new bit in flag to show whether to persist the
> upcoming uffd-wp bit in pgtable entries.

That's kind of the problem. The patch itself looks correct to me; however, as
mentioned, it is mostly reverting a previous cleanup and it's hard to tell why
that's justified without the subsequent patches. Perhaps it makes the usage of
zap_details a bit clearer, but a comment would also do that with less code.

I know you want to try and shrink the uffd-wp series but I think this patch
might be easier to review if it was included as part of that series.

> Cc: Kirill A. Shutemov <kirill@shutemov.name>
> Cc: Hugh Dickins <hughd@google.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  include/linux/mm.h | 16 ++++++++++++++++
>  mm/memory.c        |  6 +++---
>  2 files changed, 19 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index ed44f31615d9..beb784ce35b9 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1717,12 +1717,18 @@ static inline bool can_do_mlock(void) { return false; }
>  extern int user_shm_lock(size_t, struct ucounts *);
>  extern void user_shm_unlock(size_t, struct ucounts *);
>  
> +typedef unsigned int __bitwise zap_flags_t;
> +
> +/* Whether to skip zapping swap entries */
> +#define  ZAP_FLAG_SKIP_SWAP  ((__force zap_flags_t) BIT(0))
> +
>  /*
>   * Parameter block passed down to zap_pte_range in exceptional cases.
>   */
>  struct zap_details {
>  	struct address_space *zap_mapping;	/* Check page->mapping if set */
>  	struct page *single_page;		/* Locked page to be unmapped */
> +	zap_flags_t zap_flags;			/* Extra flags for zapping */
>  };
>  
>  /*
> @@ -1739,6 +1745,16 @@ zap_skip_check_mapping(struct zap_details *details, struct page *page)
>  	    (details->zap_mapping != page_rmapping(page));
>  }
>  
> +/* Return true if skip swap entries, false otherwise */
> +static inline bool
> +zap_skip_swap(struct zap_details *details)
> +{
> +	if (!details)
> +		return false;
> +
> +	return details->zap_flags & ZAP_FLAG_SKIP_SWAP;
> +}
> +
>  struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
>  			     pte_t pte);
>  struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,
> diff --git a/mm/memory.c b/mm/memory.c
> index e5ee8399d270..26e37bef1888 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1379,8 +1379,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
>  			continue;
>  		}
>  
> -		/* If details->check_mapping, we leave swap entries. */
> -		if (unlikely(details))
> +		if (unlikely(zap_skip_swap(details)))
>  			continue;
>  
>  		if (!non_swap_entry(entry))
> @@ -3353,6 +3352,7 @@ void unmap_mapping_page(struct page *page)
>  
>  	details.zap_mapping = mapping;
>  	details.single_page = page;
> +	details.zap_flags = ZAP_FLAG_SKIP_SWAP;
>  
>  	i_mmap_lock_write(mapping);
>  	if (unlikely(!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root)))
> @@ -3377,7 +3377,7 @@ void unmap_mapping_pages(struct address_space *mapping, pgoff_t start,
>  		pgoff_t nr, bool even_cows)
>  {
>  	pgoff_t	first_index = start, last_index = start + nr - 1;
> -	struct zap_details details = { };
> +	struct zap_details details = { .zap_flags = ZAP_FLAG_SKIP_SWAP };
>  
>  	details.zap_mapping = even_cows ? NULL : mapping;
>  	if (last_index < first_index)
> 


* Re: [PATCH v3 5/5] mm: Add ZAP_FLAG_SKIP_SWAP and zap_flags
  2021-09-15  2:25   ` Alistair Popple
@ 2021-09-15  2:52     ` Peter Xu
  2021-09-15  3:21       ` Alistair Popple
  0 siblings, 1 reply; 14+ messages in thread
From: Peter Xu @ 2021-09-15  2:52 UTC (permalink / raw)
  To: Alistair Popple
  Cc: linux-kernel, Andrew Morton, Hugh Dickins, linux-mm, Miaohe Lin,
	David Hildenbrand, Andrea Arcangeli, Yang Shi, Matthew Wilcox,
	Kirill A . Shutemov, Jerome Glisse, Liam Howlett, Mike Rapoport

Hi, Alistair,

On Wed, Sep 15, 2021 at 12:25:07PM +1000, Alistair Popple wrote:
> On Thursday, 9 September 2021 2:36:28 AM AEST Peter Xu wrote:
> > Firstly, the comment in zap_pte_range() is misleading because it checks against
> > details rather than check_mapping, so it doesn't match what the code does.
> > 
> > Meanwhile, there's no explicit reason why passing in the details pointer should
> > mean skipping all swap entries.  New users of zap_details could easily miss
> > this fact if they don't read all the way down to zap_pte_range(), because
> > there's no comment at zap_details mentioning it at all, so swap entries could
> > be erroneously skipped without anyone noticing.
> > 
> > This partly reverts 3e8715fdc03e ("mm: drop zap_details::check_swap_entries"),
> > but introduces the ZAP_FLAG_SKIP_SWAP flag, which means the opposite of the
> > previous "details" parameter: the caller should explicitly set it to skip swap
> > entries; otherwise swap entries will always be considered (which should still
> > be the major case here).
> > 
> > We may want to look into when exactly we need ZAP_FLAG_SKIP_SWAP and we should
> > have it in a synchronous manner, e.g., currently even if ZAP_FLAG_SKIP_SWAP is
> > set we'll still look into swap pmds no matter what.  But that should be a
> > separate effort of this patch.
> 
> I didn't really follow what you mean by "synchronous" here, although the
> explanation about pmds makes sense so it's probably just terminology.

Yes, maybe I should use "aligned manner", or please suggest anything that
sounds better; sorry for my awkward English.

>  
> > The flag introduced in this patch will be a preparation for more bits defined
> > in the future, e.g., for a new bit in flag to show whether to persist the
> > upcoming uffd-wp bit in pgtable entries.
> 
> That's kind of the problem. The patch itself looks correct to me; however, as
> mentioned, it is mostly reverting a previous cleanup and it's hard to tell why
> that's justified without the subsequent patches. Perhaps it makes the usage of
> zap_details a bit clearer, but a comment would also do that with less code.
> 
> I know you want to try and shrink the uffd-wp series but I think this patch
> might be easier to review if it was included as part of that series.

I posted it because I think it's suitable to have it even without uffd-wp.

I tried to explain above the two things this patch wants to fix:

Firstly, the comment is wrong; we've moved back and forth on changing the
zap_details flags, but the comment hasn't changed along the way and no longer
matches the code.

Secondly, I do think we should have a flag showing explicit willingness to skip
swap entries.  Yes, uffd-wp is the planned new one, but my point is that anyone
who introduces a new user of the zap_details pointer could overlook this fact.
The new flag helps make sure they will at least read the flags and know what
will happen.

For the second reason, I also explicitly CCed Kirill, so he can comment if he
disagrees.  For now, I still think we should keep such a flag; otherwise it
could be error-prone.

Could you buy into the above reasoning?

Basically, the above is what I wanted to express in my commit message.  I hope
it justifies this patch (even if extremely simple) being acceptable upstream
even without the uffd-wp series.

If you still think this patch is not suitable for standalone merging, and
especially if other reviewers think the same, I can certainly move it back to
the uffd-wp series.  Then I'll repost this series with 4 patches only.

In all cases, thanks for looking at the series.

-- 
Peter Xu



* Re: [PATCH v3 5/5] mm: Add ZAP_FLAG_SKIP_SWAP and zap_flags
  2021-09-15  2:52     ` Peter Xu
@ 2021-09-15  3:21       ` Alistair Popple
  2021-09-15  4:01         ` Peter Xu
  0 siblings, 1 reply; 14+ messages in thread
From: Alistair Popple @ 2021-09-15  3:21 UTC (permalink / raw)
  To: Peter Xu
  Cc: linux-kernel, Andrew Morton, Hugh Dickins, linux-mm, Miaohe Lin,
	David Hildenbrand, Andrea Arcangeli, Yang Shi, Matthew Wilcox,
	Kirill A . Shutemov, Jerome Glisse, Liam Howlett, Mike Rapoport

On Wednesday, 15 September 2021 12:52:48 PM AEST Peter Xu wrote:

> > > The flag introduced in this patch will be a preparation for more bits defined
> > > in the future, e.g., for a new bit in flag to show whether to persist the
> > > upcoming uffd-wp bit in pgtable entries.
> > 
> > That's kind of the problem. The patch itself looks correct to me; however, as
> > mentioned, it is mostly reverting a previous cleanup and it's hard to tell why
> > that's justified without the subsequent patches. Perhaps it makes the usage of
> > zap_details a bit clearer, but a comment would also do that with less code.
> > 
> > I know you want to try and shrink the uffd-wp series but I think this patch
> > might be easier to review if it was included as part of that series.
> 
> I posted it because I think it's suitable to have it even without uffd-wp.
> 
> I tried to explain above the two things this patch wants to fix:
> 
> Firstly, the comment is wrong; we've moved back and forth on changing the
> zap_details flags, but the comment hasn't changed along the way and no longer
> matches the code.
> 
> Secondly, I do think we should have a flag showing explicit willingness to skip
> swap entries.  Yes, uffd-wp is the planned new one, but my point is that anyone
> who introduces a new user of the zap_details pointer could overlook this fact.
> The new flag helps make sure they will at least read the flags and know what
> will happen.
> 
> For the second reason, I also explicitly CCed Kirill, so he can comment if he
> disagrees.  For now, I still think we should keep such a flag; otherwise it
> could be error-prone.
> 
> Could you buy into the above reasoning?

Kind of. I do think it makes the usage of details a bit clearer, or at least
harder to miss. It's just that if that were the sole aim of this patch, I think
there might be simpler (less code) ways of doing so.

> Basically, the above is what I wanted to express in my commit message.  I hope
> it justifies this patch (even if extremely simple) being acceptable upstream
> even without the uffd-wp series.
> 
> If you still think this patch is not suitable for standalone merging, and
> especially if other reviewers think the same, I can certainly move it back to
> the uffd-wp series.  Then I'll repost this series with 4 patches only.

I won't insist; the code looks correct and it doesn't make things any less
clear, so you can put my Reviewed-by on it and perhaps leave it to Andrew or
another reviewer to determine whether this should be taken in this series or as
part of a future uffd-wp series.

 - Alistair

> In all cases, thanks for looking at the series.
> 
> 


* Re: [PATCH v3 5/5] mm: Add ZAP_FLAG_SKIP_SWAP and zap_flags
  2021-09-15  3:21       ` Alistair Popple
@ 2021-09-15  4:01         ` Peter Xu
  0 siblings, 0 replies; 14+ messages in thread
From: Peter Xu @ 2021-09-15  4:01 UTC (permalink / raw)
  To: Alistair Popple
  Cc: linux-kernel, Andrew Morton, Hugh Dickins, linux-mm, Miaohe Lin,
	David Hildenbrand, Andrea Arcangeli, Yang Shi, Matthew Wilcox,
	Kirill A . Shutemov, Jerome Glisse, Liam Howlett, Mike Rapoport

On Wed, Sep 15, 2021 at 01:21:30PM +1000, Alistair Popple wrote:
> On Wednesday, 15 September 2021 12:52:48 PM AEST Peter Xu wrote:
> 
> > > > The flag introduced in this patch will be a preparation for more bits defined
> > > > in the future, e.g., for a new bit in flag to show whether to persist the
> > > > upcoming uffd-wp bit in pgtable entries.
> > > 
> > > That's kind of the problem. The patch itself looks correct to me; however, as
> > > mentioned, it is mostly reverting a previous cleanup and it's hard to tell why
> > > that's justified without the subsequent patches. Perhaps it makes the usage of
> > > zap_details a bit clearer, but a comment would also do that with less code.
> > > 
> > > I know you want to try and shrink the uffd-wp series but I think this patch
> > > might be easier to review if it was included as part of that series.
> > 
> > I posted it because I think it's suitable to have it even without uffd-wp.
> > 
> > I tried to explain above the two things this patch wants to fix:
> > 
> > Firstly, the comment is wrong; we've moved back and forth on changing the
> > zap_details flags, but the comment hasn't changed along the way and no longer
> > matches the code.
> > 
> > Secondly, I do think we should have a flag showing explicit willingness to skip
> > swap entries.  Yes, uffd-wp is the planned new one, but my point is that anyone
> > who introduces a new user of the zap_details pointer could overlook this fact.
> > The new flag helps make sure they will at least read the flags and know what
> > will happen.
> > 
> > For the second reason, I also explicitly CCed Kirill, so he can comment if he
> > disagrees.  For now, I still think we should keep such a flag; otherwise it
> > could be error-prone.
> > 
> > Could you buy into the above reasoning?
> 
> Kind of. I do think it makes the usage of details a bit clearer, or at least
> harder to miss. It's just that if that were the sole aim of this patch, I think
> there might be simpler (less code) ways of doing so.

Yes, you're right, we could add a big enough comment above zap_details to state
that, but then it would be reverted when adding the uffd-wp flag in the other
series, because uffd-wp will still need a way to specify !SKIP_SWAP and
KEEP_UFFD_WP.  Then it would make the "series split" less sensible, as you said.

I split the series only because I hoped it would ease review, and also because
that's probably the only thing I can do now to smooth the process of getting a
complete uffd-wp properly reviewed and merged.

> 
> > Basically, the above is what I wanted to express in my commit message.  I hope
> > it justifies this patch (even if extremely simple) being acceptable upstream
> > even without the uffd-wp series.
> > 
> > If you still think this patch is not suitable for standalone merging, and
> > especially if other reviewers think the same, I can certainly move it back to
> > the uffd-wp series.  Then I'll repost this series with 4 patches only.
> 
> I won't insist; the code looks correct and it doesn't make things any less
> clear, so you can put my Reviewed-by on it and perhaps leave it to Andrew or
> another reviewer to determine whether this should be taken in this series or as
> part of a future uffd-wp series.

Will do; thanks.

-- 
Peter Xu



end of thread, other threads:[~2021-09-15  4:01 UTC | newest]

Thread overview: 14+ messages
2021-09-08 16:35 [PATCH v3 0/5] mm: A few cleanup patches around zap, shmem and uffd Peter Xu
2021-09-08 16:35 ` [PATCH v3 1/5] mm/shmem: Unconditionally set pte dirty in mfill_atomic_install_pte Peter Xu
2021-09-08 16:36 ` [PATCH v3 2/5] mm: Clear vmf->pte after pte_unmap_same() returns Peter Xu
2021-09-08 16:36 ` [PATCH v3 3/5] mm: Drop first_index/last_index in zap_details Peter Xu
2021-09-09  2:54   ` Liam Howlett
2021-09-09 18:13     ` Peter Xu
2021-09-08 16:36 ` [PATCH v3 4/5] mm: Add zap_skip_check_mapping() helper Peter Xu
2021-09-09  1:16   ` Alistair Popple
2021-09-08 16:36 ` [PATCH v3 5/5] mm: Add ZAP_FLAG_SKIP_SWAP and zap_flags Peter Xu
2021-09-15  2:25   ` Alistair Popple
2021-09-15  2:52     ` Peter Xu
2021-09-15  3:21       ` Alistair Popple
2021-09-15  4:01         ` Peter Xu
2021-09-14 16:37 ` [PATCH v3 0/5] mm: A few cleanup patches around zap, shmem and uffd Peter Xu
