* [RFC PATCH] mm: remove zap_page_range and change callers to use zap_vma_page_range
@ 2022-12-16 19:20 ` Mike Kravetz
  0 siblings, 0 replies; 27+ messages in thread
From: Mike Kravetz @ 2022-12-16 19:20 UTC (permalink / raw)
  To: linux-mm, linux-kernel, linuxppc-dev, linux-riscv, linux-s390, netdev
  Cc: David Hildenbrand, Michal Hocko, Peter Xu, Nadav Amit,
	Matthew Wilcox, Vlastimil Babka, Rik van Riel, Will Deacon,
	Michael Ellerman, Palmer Dabbelt, Christian Borntraeger,
	Dave Hansen, Christian Brauner, Eric Dumazet, Andrew Morton,
	Mike Kravetz

zap_page_range was originally designed to unmap pages within an address
range that could span multiple vmas.  While working on [1], it was
discovered that all callers of zap_page_range pass a range entirely within
a single vma.  In addition, the mmu notification call within
zap_page_range does not correctly handle ranges that span multiple vmas,
as such calls should be vma specific.
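
To make this concrete, here is a minimal excerpt from the zap_page_range
body removed below: the mmu notifier range is initialized against the
first vma only, while the unmap loop may walk into subsequent vmas.

	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm,
				start, start + size);
	...
	mmu_notifier_invalidate_range_start(&range);
	do {
		/* vma may advance past the vma the notifier was set up with */
		unmap_single_vma(&tlb, vma, start, range.end, NULL);
	} while ((vma = mas_find(&mas, end - 1)) != NULL);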

Instead of fixing zap_page_range, change all callers to use the new
routine zap_vma_page_range.  zap_vma_page_range is just a wrapper around
zap_page_range_single passing in NULL zap details.  The name is also
more in line with other exported routines that operate within a vma.
We can then remove zap_page_range.

Also, change madvise_dontneed_single_vma to use this new routine.
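
For reference, the new wrapper (added to include/linux/mm.h below) is
simply:

	static inline void zap_vma_page_range(struct vm_area_struct *vma,
					 unsigned long address, unsigned long size)
	{
		zap_page_range_single(vma, address, size, NULL);
	}

so that zapping e.g. a whole vma becomes:

	zap_vma_page_range(vma, vma->vm_start, vma->vm_end - vma->vm_start);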

[1] https://lore.kernel.org/linux-mm/20221114235507.294320-2-mike.kravetz@oracle.com/
Suggested-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 arch/arm64/kernel/vdso.c                |  4 ++--
 arch/powerpc/kernel/vdso.c              |  2 +-
 arch/powerpc/platforms/book3s/vas-api.c |  2 +-
 arch/powerpc/platforms/pseries/vas.c    |  2 +-
 arch/riscv/kernel/vdso.c                |  4 ++--
 arch/s390/kernel/vdso.c                 |  2 +-
 arch/s390/mm/gmap.c                     |  2 +-
 arch/x86/entry/vdso/vma.c               |  2 +-
 drivers/android/binder_alloc.c          |  2 +-
 include/linux/mm.h                      |  7 ++++--
 mm/madvise.c                            |  4 ++--
 mm/memory.c                             | 30 -------------------------
 mm/page-writeback.c                     |  2 +-
 net/ipv4/tcp.c                          |  6 ++---
 14 files changed, 22 insertions(+), 49 deletions(-)

diff --git a/arch/arm64/kernel/vdso.c b/arch/arm64/kernel/vdso.c
index e59a32aa0c49..a7b10e182f78 100644
--- a/arch/arm64/kernel/vdso.c
+++ b/arch/arm64/kernel/vdso.c
@@ -141,10 +141,10 @@ int vdso_join_timens(struct task_struct *task, struct time_namespace *ns)
 		unsigned long size = vma->vm_end - vma->vm_start;
 
 		if (vma_is_special_mapping(vma, vdso_info[VDSO_ABI_AA64].dm))
-			zap_page_range(vma, vma->vm_start, size);
+			zap_vma_page_range(vma, vma->vm_start, size);
 #ifdef CONFIG_COMPAT_VDSO
 		if (vma_is_special_mapping(vma, vdso_info[VDSO_ABI_AA32].dm))
-			zap_page_range(vma, vma->vm_start, size);
+			zap_vma_page_range(vma, vma->vm_start, size);
 #endif
 	}
 
diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
index 507f8228f983..479d70fe8c55 100644
--- a/arch/powerpc/kernel/vdso.c
+++ b/arch/powerpc/kernel/vdso.c
@@ -123,7 +123,7 @@ int vdso_join_timens(struct task_struct *task, struct time_namespace *ns)
 		unsigned long size = vma->vm_end - vma->vm_start;
 
 		if (vma_is_special_mapping(vma, &vvar_spec))
-			zap_page_range(vma, vma->vm_start, size);
+			zap_vma_page_range(vma, vma->vm_start, size);
 	}
 	mmap_read_unlock(mm);
 
diff --git a/arch/powerpc/platforms/book3s/vas-api.c b/arch/powerpc/platforms/book3s/vas-api.c
index eb5bed333750..8f57388b760b 100644
--- a/arch/powerpc/platforms/book3s/vas-api.c
+++ b/arch/powerpc/platforms/book3s/vas-api.c
@@ -414,7 +414,7 @@ static vm_fault_t vas_mmap_fault(struct vm_fault *vmf)
 	/*
 	 * When the LPAR lost credits due to core removal or during
 	 * migration, invalidate the existing mapping for the current
-	 * paste addresses and set windows in-active (zap_page_range in
+	 * paste addresses and set windows in-active (zap_vma_page_range in
 	 * reconfig_close_windows()).
 	 * New mapping will be done later after migration or new credits
 	 * available. So continue to receive faults if the user space
diff --git a/arch/powerpc/platforms/pseries/vas.c b/arch/powerpc/platforms/pseries/vas.c
index 4ad6e510d405..2aef8d9295a2 100644
--- a/arch/powerpc/platforms/pseries/vas.c
+++ b/arch/powerpc/platforms/pseries/vas.c
@@ -760,7 +760,7 @@ static int reconfig_close_windows(struct vas_caps *vcap, int excess_creds,
 		 * is done before the original mmap() and after the ioctl.
 		 */
 		if (vma)
-			zap_page_range(vma, vma->vm_start,
+			zap_vma_page_range(vma, vma->vm_start,
 					vma->vm_end - vma->vm_start);
 
 		mmap_write_unlock(task_ref->mm);
diff --git a/arch/riscv/kernel/vdso.c b/arch/riscv/kernel/vdso.c
index e410275918ac..a405119da2c0 100644
--- a/arch/riscv/kernel/vdso.c
+++ b/arch/riscv/kernel/vdso.c
@@ -127,10 +127,10 @@ int vdso_join_timens(struct task_struct *task, struct time_namespace *ns)
 		unsigned long size = vma->vm_end - vma->vm_start;
 
 		if (vma_is_special_mapping(vma, vdso_info.dm))
-			zap_page_range(vma, vma->vm_start, size);
+			zap_vma_page_range(vma, vma->vm_start, size);
 #ifdef CONFIG_COMPAT
 		if (vma_is_special_mapping(vma, compat_vdso_info.dm))
-			zap_page_range(vma, vma->vm_start, size);
+			zap_vma_page_range(vma, vma->vm_start, size);
 #endif
 	}
 
diff --git a/arch/s390/kernel/vdso.c b/arch/s390/kernel/vdso.c
index ff7bf4432229..eccfcd505403 100644
--- a/arch/s390/kernel/vdso.c
+++ b/arch/s390/kernel/vdso.c
@@ -63,7 +63,7 @@ int vdso_join_timens(struct task_struct *task, struct time_namespace *ns)
 
 		if (!vma_is_special_mapping(vma, &vvar_mapping))
 			continue;
-		zap_page_range(vma, vma->vm_start, size);
+		zap_vma_page_range(vma, vma->vm_start, size);
 		break;
 	}
 	mmap_read_unlock(mm);
diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index 74e1d873dce0..67d998152142 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -722,7 +722,7 @@ void gmap_discard(struct gmap *gmap, unsigned long from, unsigned long to)
 		if (is_vm_hugetlb_page(vma))
 			continue;
 		size = min(to - gaddr, PMD_SIZE - (gaddr & ~PMD_MASK));
-		zap_page_range(vma, vmaddr, size);
+		zap_vma_page_range(vma, vmaddr, size);
 	}
 	mmap_read_unlock(gmap->mm);
 }
diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index b8f3f9b9e53c..5aafbd19e869 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -116,7 +116,7 @@ int vdso_join_timens(struct task_struct *task, struct time_namespace *ns)
 		unsigned long size = vma->vm_end - vma->vm_start;
 
 		if (vma_is_special_mapping(vma, &vvar_mapping))
-			zap_page_range(vma, vma->vm_start, size);
+			zap_vma_page_range(vma, vma->vm_start, size);
 	}
 	mmap_read_unlock(mm);
 
diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index 4ad42b0f75cd..f7f10248c742 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -1019,7 +1019,7 @@ enum lru_status binder_alloc_free_page(struct list_head *item,
 	if (vma) {
 		trace_binder_unmap_user_start(alloc, index);
 
-		zap_page_range(vma, page_addr, PAGE_SIZE);
+		zap_vma_page_range(vma, page_addr, PAGE_SIZE);
 
 		trace_binder_unmap_user_end(alloc, index);
 	}
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 6b28eb9c6ea2..706efaf95783 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1980,10 +1980,13 @@ struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,
 
 void zap_vma_ptes(struct vm_area_struct *vma, unsigned long address,
 		  unsigned long size);
-void zap_page_range(struct vm_area_struct *vma, unsigned long address,
-		    unsigned long size);
 void zap_page_range_single(struct vm_area_struct *vma, unsigned long address,
 			   unsigned long size, struct zap_details *details);
+static inline void zap_vma_page_range(struct vm_area_struct *vma,
+				 unsigned long address, unsigned long size)
+{
+	zap_page_range_single(vma, address, size, NULL);
+}
 void unmap_vmas(struct mmu_gather *tlb, struct maple_tree *mt,
 		struct vm_area_struct *start_vma, unsigned long start,
 		unsigned long end);
diff --git a/mm/madvise.c b/mm/madvise.c
index 87703a19bbef..3c4d9829d4e1 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -787,7 +787,7 @@ static int madvise_free_single_vma(struct vm_area_struct *vma,
  * Application no longer needs these pages.  If the pages are dirty,
  * it's OK to just throw them away.  The app will be more careful about
  * data it wants to keep.  Be sure to free swap resources too.  The
- * zap_page_range_single call sets things up for shrink_active_list to actually
+ * zap_vma_page_range call sets things up for shrink_active_list to actually
  * free these pages later if no one else has touched them in the meantime,
  * although we could add these pages to a global reuse list for
  * shrink_active_list to pick up before reclaiming other pages.
@@ -805,7 +805,7 @@ static int madvise_free_single_vma(struct vm_area_struct *vma,
 static long madvise_dontneed_single_vma(struct vm_area_struct *vma,
 					unsigned long start, unsigned long end)
 {
-	zap_page_range_single(vma, start, end - start, NULL);
+	zap_vma_page_range(vma, start, end - start);
 	return 0;
 }
 
diff --git a/mm/memory.c b/mm/memory.c
index 5b2c137dfb2a..e953a0108278 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1687,36 +1687,6 @@ void unmap_vmas(struct mmu_gather *tlb, struct maple_tree *mt,
 	mmu_notifier_invalidate_range_end(&range);
 }
 
-/**
- * zap_page_range - remove user pages in a given range
- * @vma: vm_area_struct holding the applicable pages
- * @start: starting address of pages to zap
- * @size: number of bytes to zap
- *
- * Caller must protect the VMA list
- */
-void zap_page_range(struct vm_area_struct *vma, unsigned long start,
-		unsigned long size)
-{
-	struct maple_tree *mt = &vma->vm_mm->mm_mt;
-	unsigned long end = start + size;
-	struct mmu_notifier_range range;
-	struct mmu_gather tlb;
-	MA_STATE(mas, mt, vma->vm_end, vma->vm_end);
-
-	lru_add_drain();
-	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm,
-				start, start + size);
-	tlb_gather_mmu(&tlb, vma->vm_mm);
-	update_hiwater_rss(vma->vm_mm);
-	mmu_notifier_invalidate_range_start(&range);
-	do {
-		unmap_single_vma(&tlb, vma, start, range.end, NULL);
-	} while ((vma = mas_find(&mas, end - 1)) != NULL);
-	mmu_notifier_invalidate_range_end(&range);
-	tlb_finish_mmu(&tlb);
-}
-
 /**
  * zap_page_range_single - remove user pages in a given range
  * @vma: vm_area_struct holding the applicable pages
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index ad608ef2a243..bd9fe6ff6557 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2713,7 +2713,7 @@ void folio_account_cleaned(struct folio *folio, struct bdi_writeback *wb)
  *
  * The caller must hold lock_page_memcg().  Most callers have the folio
  * locked.  A few have the folio blocked from truncation through other
- * means (eg zap_page_range() has it mapped and is holding the page table
+ * means (eg zap_vma_page_range() has it mapped and is holding the page table
  * lock).  This can also be called from mark_buffer_dirty(), which I
  * cannot prove is always protected against truncate.
  */
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index c567d5e8053e..afaad3cfed00 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2092,7 +2092,7 @@ static int tcp_zerocopy_vm_insert_batch_error(struct vm_area_struct *vma,
 		maybe_zap_len = total_bytes_to_map -  /* All bytes to map */
 				*length + /* Mapped or pending */
 				(pages_remaining * PAGE_SIZE); /* Failed map. */
-		zap_page_range(vma, *address, maybe_zap_len);
+		zap_vma_page_range(vma, *address, maybe_zap_len);
 		err = 0;
 	}
 
@@ -2100,7 +2100,7 @@ static int tcp_zerocopy_vm_insert_batch_error(struct vm_area_struct *vma,
 		unsigned long leftover_pages = pages_remaining;
 		int bytes_mapped;
 
-		/* We called zap_page_range, try to reinsert. */
+		/* We called zap_vma_page_range, try to reinsert. */
 		err = vm_insert_pages(vma, *address,
 				      pending_pages,
 				      &pages_remaining);
@@ -2234,7 +2234,7 @@ static int tcp_zerocopy_receive(struct sock *sk,
 	total_bytes_to_map = avail_len & ~(PAGE_SIZE - 1);
 	if (total_bytes_to_map) {
 		if (!(zc->flags & TCP_RECEIVE_ZEROCOPY_FLAG_TLB_CLEAN_HINT))
-			zap_page_range(vma, address, total_bytes_to_map);
+			zap_vma_page_range(vma, address, total_bytes_to_map);
 		zc->length = total_bytes_to_map;
 		zc->recv_skip_hint = 0;
 	} else {
-- 
2.38.1



* Re: [RFC PATCH] mm: remove zap_page_range and change callers to use zap_vma_page_range
  2022-12-16 19:20 ` Mike Kravetz
@ 2022-12-19 12:06   ` Michal Hocko
  0 siblings, 0 replies; 27+ messages in thread
From: Michal Hocko @ 2022-12-19 12:06 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: linux-mm, linux-kernel, linuxppc-dev, linux-riscv, linux-s390,
	netdev, David Hildenbrand, Peter Xu, Nadav Amit, Matthew Wilcox,
	Vlastimil Babka, Rik van Riel, Will Deacon, Michael Ellerman,
	Palmer Dabbelt, Christian Borntraeger, Dave Hansen,
	Christian Brauner, Eric Dumazet, Andrew Morton

On Fri 16-12-22 11:20:12, Mike Kravetz wrote:
> zap_page_range was originally designed to unmap pages within an address
> range that could span multiple vmas.  While working on [1], it was
> discovered that all callers of zap_page_range pass a range entirely within
> a single vma.  In addition, the mmu notification call within
> zap_page_range does not correctly handle ranges that span multiple vmas,
> as such calls should be vma specific.

Could you spend a sentence or two explaining what is wrong here?

> Instead of fixing zap_page_range, change all callers to use the new
> routine zap_vma_page_range.  zap_vma_page_range is just a wrapper around
> zap_page_range_single passing in NULL zap details.  The name is also
> more in line with other exported routines that operate within a vma.
> We can then remove zap_page_range.

I would stick with zap_page_range_single rather than adding a new
wrapper, but nothing really critical.
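
(That is, callers such as madvise_dontneed_single_vma would open-code the
NULL details themselves:

	zap_page_range_single(vma, start, end - start, NULL);

rather than hide them behind a wrapper.)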

> Also, change madvise_dontneed_single_vma to use this new routine.
> 
> [1] https://lore.kernel.org/linux-mm/20221114235507.294320-2-mike.kravetz@oracle.com/
> Suggested-by: Peter Xu <peterx@redhat.com>
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>

Other than that LGTM
Acked-by: Michal Hocko <mhocko@suse.com>

Thanks!

> ---
>  arch/arm64/kernel/vdso.c                |  4 ++--
>  arch/powerpc/kernel/vdso.c              |  2 +-
>  arch/powerpc/platforms/book3s/vas-api.c |  2 +-
>  arch/powerpc/platforms/pseries/vas.c    |  2 +-
>  arch/riscv/kernel/vdso.c                |  4 ++--
>  arch/s390/kernel/vdso.c                 |  2 +-
>  arch/s390/mm/gmap.c                     |  2 +-
>  arch/x86/entry/vdso/vma.c               |  2 +-
>  drivers/android/binder_alloc.c          |  2 +-
>  include/linux/mm.h                      |  7 ++++--
>  mm/madvise.c                            |  4 ++--
>  mm/memory.c                             | 30 -------------------------
>  mm/page-writeback.c                     |  2 +-
>  net/ipv4/tcp.c                          |  6 ++---
>  14 files changed, 22 insertions(+), 49 deletions(-)
> 
> diff --git a/arch/arm64/kernel/vdso.c b/arch/arm64/kernel/vdso.c
> index e59a32aa0c49..a7b10e182f78 100644
> --- a/arch/arm64/kernel/vdso.c
> +++ b/arch/arm64/kernel/vdso.c
> @@ -141,10 +141,10 @@ int vdso_join_timens(struct task_struct *task, struct time_namespace *ns)
>  		unsigned long size = vma->vm_end - vma->vm_start;
>  
>  		if (vma_is_special_mapping(vma, vdso_info[VDSO_ABI_AA64].dm))
> -			zap_page_range(vma, vma->vm_start, size);
> +			zap_vma_page_range(vma, vma->vm_start, size);
>  #ifdef CONFIG_COMPAT_VDSO
>  		if (vma_is_special_mapping(vma, vdso_info[VDSO_ABI_AA32].dm))
> -			zap_page_range(vma, vma->vm_start, size);
> +			zap_vma_page_range(vma, vma->vm_start, size);
>  #endif
>  	}
>  
> diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
> index 507f8228f983..479d70fe8c55 100644
> --- a/arch/powerpc/kernel/vdso.c
> +++ b/arch/powerpc/kernel/vdso.c
> @@ -123,7 +123,7 @@ int vdso_join_timens(struct task_struct *task, struct time_namespace *ns)
>  		unsigned long size = vma->vm_end - vma->vm_start;
>  
>  		if (vma_is_special_mapping(vma, &vvar_spec))
> -			zap_page_range(vma, vma->vm_start, size);
> +			zap_vma_page_range(vma, vma->vm_start, size);
>  	}
>  	mmap_read_unlock(mm);
>  
> diff --git a/arch/powerpc/platforms/book3s/vas-api.c b/arch/powerpc/platforms/book3s/vas-api.c
> index eb5bed333750..8f57388b760b 100644
> --- a/arch/powerpc/platforms/book3s/vas-api.c
> +++ b/arch/powerpc/platforms/book3s/vas-api.c
> @@ -414,7 +414,7 @@ static vm_fault_t vas_mmap_fault(struct vm_fault *vmf)
>  	/*
>  	 * When the LPAR lost credits due to core removal or during
>  	 * migration, invalidate the existing mapping for the current
> -	 * paste addresses and set windows in-active (zap_page_range in
> +	 * paste addresses and set windows in-active (zap_vma_page_range in
>  	 * reconfig_close_windows()).
>  	 * New mapping will be done later after migration or new credits
>  	 * available. So continue to receive faults if the user space
> diff --git a/arch/powerpc/platforms/pseries/vas.c b/arch/powerpc/platforms/pseries/vas.c
> index 4ad6e510d405..2aef8d9295a2 100644
> --- a/arch/powerpc/platforms/pseries/vas.c
> +++ b/arch/powerpc/platforms/pseries/vas.c
> @@ -760,7 +760,7 @@ static int reconfig_close_windows(struct vas_caps *vcap, int excess_creds,
>  		 * is done before the original mmap() and after the ioctl.
>  		 */
>  		if (vma)
> -			zap_page_range(vma, vma->vm_start,
> +			zap_vma_page_range(vma, vma->vm_start,
>  					vma->vm_end - vma->vm_start);
>  
>  		mmap_write_unlock(task_ref->mm);
> diff --git a/arch/riscv/kernel/vdso.c b/arch/riscv/kernel/vdso.c
> index e410275918ac..a405119da2c0 100644
> --- a/arch/riscv/kernel/vdso.c
> +++ b/arch/riscv/kernel/vdso.c
> @@ -127,10 +127,10 @@ int vdso_join_timens(struct task_struct *task, struct time_namespace *ns)
>  		unsigned long size = vma->vm_end - vma->vm_start;
>  
>  		if (vma_is_special_mapping(vma, vdso_info.dm))
> -			zap_page_range(vma, vma->vm_start, size);
> +			zap_vma_page_range(vma, vma->vm_start, size);
>  #ifdef CONFIG_COMPAT
>  		if (vma_is_special_mapping(vma, compat_vdso_info.dm))
> -			zap_page_range(vma, vma->vm_start, size);
> +			zap_vma_page_range(vma, vma->vm_start, size);
>  #endif
>  	}
>  
> diff --git a/arch/s390/kernel/vdso.c b/arch/s390/kernel/vdso.c
> index ff7bf4432229..eccfcd505403 100644
> --- a/arch/s390/kernel/vdso.c
> +++ b/arch/s390/kernel/vdso.c
> @@ -63,7 +63,7 @@ int vdso_join_timens(struct task_struct *task, struct time_namespace *ns)
>  
>  		if (!vma_is_special_mapping(vma, &vvar_mapping))
>  			continue;
> -		zap_page_range(vma, vma->vm_start, size);
> +		zap_vma_page_range(vma, vma->vm_start, size);
>  		break;
>  	}
>  	mmap_read_unlock(mm);
> diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
> index 74e1d873dce0..67d998152142 100644
> --- a/arch/s390/mm/gmap.c
> +++ b/arch/s390/mm/gmap.c
> @@ -722,7 +722,7 @@ void gmap_discard(struct gmap *gmap, unsigned long from, unsigned long to)
>  		if (is_vm_hugetlb_page(vma))
>  			continue;
>  		size = min(to - gaddr, PMD_SIZE - (gaddr & ~PMD_MASK));
> -		zap_page_range(vma, vmaddr, size);
> +		zap_vma_page_range(vma, vmaddr, size);
>  	}
>  	mmap_read_unlock(gmap->mm);
>  }
> diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
> index b8f3f9b9e53c..5aafbd19e869 100644
> --- a/arch/x86/entry/vdso/vma.c
> +++ b/arch/x86/entry/vdso/vma.c
> @@ -116,7 +116,7 @@ int vdso_join_timens(struct task_struct *task, struct time_namespace *ns)
>  		unsigned long size = vma->vm_end - vma->vm_start;
>  
>  		if (vma_is_special_mapping(vma, &vvar_mapping))
> -			zap_page_range(vma, vma->vm_start, size);
> +			zap_vma_page_range(vma, vma->vm_start, size);
>  	}
>  	mmap_read_unlock(mm);
>  
> diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
> index 4ad42b0f75cd..f7f10248c742 100644
> --- a/drivers/android/binder_alloc.c
> +++ b/drivers/android/binder_alloc.c
> @@ -1019,7 +1019,7 @@ enum lru_status binder_alloc_free_page(struct list_head *item,
>  	if (vma) {
>  		trace_binder_unmap_user_start(alloc, index);
>  
> -		zap_page_range(vma, page_addr, PAGE_SIZE);
> +		zap_vma_page_range(vma, page_addr, PAGE_SIZE);
>  
>  		trace_binder_unmap_user_end(alloc, index);
>  	}
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 6b28eb9c6ea2..706efaf95783 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1980,10 +1980,13 @@ struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,
>  
>  void zap_vma_ptes(struct vm_area_struct *vma, unsigned long address,
>  		  unsigned long size);
> -void zap_page_range(struct vm_area_struct *vma, unsigned long address,
> -		    unsigned long size);
>  void zap_page_range_single(struct vm_area_struct *vma, unsigned long address,
>  			   unsigned long size, struct zap_details *details);
> +static inline void zap_vma_page_range(struct vm_area_struct *vma,
> +				 unsigned long address, unsigned long size)
> +{
> +	zap_page_range_single(vma, address, size, NULL);
> +}
>  void unmap_vmas(struct mmu_gather *tlb, struct maple_tree *mt,
>  		struct vm_area_struct *start_vma, unsigned long start,
>  		unsigned long end);
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 87703a19bbef..3c4d9829d4e1 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -787,7 +787,7 @@ static int madvise_free_single_vma(struct vm_area_struct *vma,
>   * Application no longer needs these pages.  If the pages are dirty,
>   * it's OK to just throw them away.  The app will be more careful about
>   * data it wants to keep.  Be sure to free swap resources too.  The
> - * zap_page_range_single call sets things up for shrink_active_list to actually
> + * zap_vma_page_range call sets things up for shrink_active_list to actually
>   * free these pages later if no one else has touched them in the meantime,
>   * although we could add these pages to a global reuse list for
>   * shrink_active_list to pick up before reclaiming other pages.
> @@ -805,7 +805,7 @@ static int madvise_free_single_vma(struct vm_area_struct *vma,
>  static long madvise_dontneed_single_vma(struct vm_area_struct *vma,
>  					unsigned long start, unsigned long end)
>  {
> -	zap_page_range_single(vma, start, end - start, NULL);
> +	zap_vma_page_range(vma, start, end - start);
>  	return 0;
>  }
>  
> diff --git a/mm/memory.c b/mm/memory.c
> index 5b2c137dfb2a..e953a0108278 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1687,36 +1687,6 @@ void unmap_vmas(struct mmu_gather *tlb, struct maple_tree *mt,
>  	mmu_notifier_invalidate_range_end(&range);
>  }
>  
> -/**
> - * zap_page_range - remove user pages in a given range
> - * @vma: vm_area_struct holding the applicable pages
> - * @start: starting address of pages to zap
> - * @size: number of bytes to zap
> - *
> - * Caller must protect the VMA list
> - */
> -void zap_page_range(struct vm_area_struct *vma, unsigned long start,
> -		unsigned long size)
> -{
> -	struct maple_tree *mt = &vma->vm_mm->mm_mt;
> -	unsigned long end = start + size;
> -	struct mmu_notifier_range range;
> -	struct mmu_gather tlb;
> -	MA_STATE(mas, mt, vma->vm_end, vma->vm_end);
> -
> -	lru_add_drain();
> -	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm,
> -				start, start + size);
> -	tlb_gather_mmu(&tlb, vma->vm_mm);
> -	update_hiwater_rss(vma->vm_mm);
> -	mmu_notifier_invalidate_range_start(&range);
> -	do {
> -		unmap_single_vma(&tlb, vma, start, range.end, NULL);
> -	} while ((vma = mas_find(&mas, end - 1)) != NULL);
> -	mmu_notifier_invalidate_range_end(&range);
> -	tlb_finish_mmu(&tlb);
> -}
> -
>  /**
>   * zap_page_range_single - remove user pages in a given range
>   * @vma: vm_area_struct holding the applicable pages
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index ad608ef2a243..bd9fe6ff6557 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -2713,7 +2713,7 @@ void folio_account_cleaned(struct folio *folio, struct bdi_writeback *wb)
>   *
>   * The caller must hold lock_page_memcg().  Most callers have the folio
>   * locked.  A few have the folio blocked from truncation through other
> - * means (eg zap_page_range() has it mapped and is holding the page table
> + * means (eg zap_vma_page_range() has it mapped and is holding the page table
>   * lock).  This can also be called from mark_buffer_dirty(), which I
>   * cannot prove is always protected against truncate.
>   */
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index c567d5e8053e..afaad3cfed00 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -2092,7 +2092,7 @@ static int tcp_zerocopy_vm_insert_batch_error(struct vm_area_struct *vma,
>  		maybe_zap_len = total_bytes_to_map -  /* All bytes to map */
>  				*length + /* Mapped or pending */
>  				(pages_remaining * PAGE_SIZE); /* Failed map. */
> -		zap_page_range(vma, *address, maybe_zap_len);
> +		zap_vma_page_range(vma, *address, maybe_zap_len);
>  		err = 0;
>  	}
>  
> @@ -2100,7 +2100,7 @@ static int tcp_zerocopy_vm_insert_batch_error(struct vm_area_struct *vma,
>  		unsigned long leftover_pages = pages_remaining;
>  		int bytes_mapped;
>  
> -		/* We called zap_page_range, try to reinsert. */
> +		/* We called zap_vma_page_range, try to reinsert. */
>  		err = vm_insert_pages(vma, *address,
>  				      pending_pages,
>  				      &pages_remaining);
> @@ -2234,7 +2234,7 @@ static int tcp_zerocopy_receive(struct sock *sk,
>  	total_bytes_to_map = avail_len & ~(PAGE_SIZE - 1);
>  	if (total_bytes_to_map) {
>  		if (!(zc->flags & TCP_RECEIVE_ZEROCOPY_FLAG_TLB_CLEAN_HINT))
> -			zap_page_range(vma, address, total_bytes_to_map);
> +			zap_vma_page_range(vma, address, total_bytes_to_map);
>  		zc->length = total_bytes_to_map;
>  		zc->recv_skip_hint = 0;
>  	} else {
> -- 
> 2.38.1

-- 
Michal Hocko
SUSE Labs

>  		struct vm_area_struct *start_vma, unsigned long start,
>  		unsigned long end);
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 87703a19bbef..3c4d9829d4e1 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -787,7 +787,7 @@ static int madvise_free_single_vma(struct vm_area_struct *vma,
>   * Application no longer needs these pages.  If the pages are dirty,
>   * it's OK to just throw them away.  The app will be more careful about
>   * data it wants to keep.  Be sure to free swap resources too.  The
> - * zap_page_range_single call sets things up for shrink_active_list to actually
> + * zap_vma_page_range call sets things up for shrink_active_list to actually
>   * free these pages later if no one else has touched them in the meantime,
>   * although we could add these pages to a global reuse list for
>   * shrink_active_list to pick up before reclaiming other pages.
> @@ -805,7 +805,7 @@ static int madvise_free_single_vma(struct vm_area_struct *vma,
>  static long madvise_dontneed_single_vma(struct vm_area_struct *vma,
>  					unsigned long start, unsigned long end)
>  {
> -	zap_page_range_single(vma, start, end - start, NULL);
> +	zap_vma_page_range(vma, start, end - start);
>  	return 0;
>  }
>  
> diff --git a/mm/memory.c b/mm/memory.c
> index 5b2c137dfb2a..e953a0108278 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1687,36 +1687,6 @@ void unmap_vmas(struct mmu_gather *tlb, struct maple_tree *mt,
>  	mmu_notifier_invalidate_range_end(&range);
>  }
>  
> -/**
> - * zap_page_range - remove user pages in a given range
> - * @vma: vm_area_struct holding the applicable pages
> - * @start: starting address of pages to zap
> - * @size: number of bytes to zap
> - *
> - * Caller must protect the VMA list
> - */
> -void zap_page_range(struct vm_area_struct *vma, unsigned long start,
> -		unsigned long size)
> -{
> -	struct maple_tree *mt = &vma->vm_mm->mm_mt;
> -	unsigned long end = start + size;
> -	struct mmu_notifier_range range;
> -	struct mmu_gather tlb;
> -	MA_STATE(mas, mt, vma->vm_end, vma->vm_end);
> -
> -	lru_add_drain();
> -	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm,
> -				start, start + size);
> -	tlb_gather_mmu(&tlb, vma->vm_mm);
> -	update_hiwater_rss(vma->vm_mm);
> -	mmu_notifier_invalidate_range_start(&range);
> -	do {
> -		unmap_single_vma(&tlb, vma, start, range.end, NULL);
> -	} while ((vma = mas_find(&mas, end - 1)) != NULL);
> -	mmu_notifier_invalidate_range_end(&range);
> -	tlb_finish_mmu(&tlb);
> -}
> -
>  /**
>   * zap_page_range_single - remove user pages in a given range
>   * @vma: vm_area_struct holding the applicable pages
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index ad608ef2a243..bd9fe6ff6557 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -2713,7 +2713,7 @@ void folio_account_cleaned(struct folio *folio, struct bdi_writeback *wb)
>   *
>   * The caller must hold lock_page_memcg().  Most callers have the folio
>   * locked.  A few have the folio blocked from truncation through other
> - * means (eg zap_page_range() has it mapped and is holding the page table
> + * means (eg zap_vma_page_range() has it mapped and is holding the page table
>   * lock).  This can also be called from mark_buffer_dirty(), which I
>   * cannot prove is always protected against truncate.
>   */
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index c567d5e8053e..afaad3cfed00 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -2092,7 +2092,7 @@ static int tcp_zerocopy_vm_insert_batch_error(struct vm_area_struct *vma,
>  		maybe_zap_len = total_bytes_to_map -  /* All bytes to map */
>  				*length + /* Mapped or pending */
>  				(pages_remaining * PAGE_SIZE); /* Failed map. */
> -		zap_page_range(vma, *address, maybe_zap_len);
> +		zap_vma_page_range(vma, *address, maybe_zap_len);
>  		err = 0;
>  	}
>  
> @@ -2100,7 +2100,7 @@ static int tcp_zerocopy_vm_insert_batch_error(struct vm_area_struct *vma,
>  		unsigned long leftover_pages = pages_remaining;
>  		int bytes_mapped;
>  
> -		/* We called zap_page_range, try to reinsert. */
> +		/* We called zap_vma_page_range, try to reinsert. */
>  		err = vm_insert_pages(vma, *address,
>  				      pending_pages,
>  				      &pages_remaining);
> @@ -2234,7 +2234,7 @@ static int tcp_zerocopy_receive(struct sock *sk,
>  	total_bytes_to_map = avail_len & ~(PAGE_SIZE - 1);
>  	if (total_bytes_to_map) {
>  		if (!(zc->flags & TCP_RECEIVE_ZEROCOPY_FLAG_TLB_CLEAN_HINT))
> -			zap_page_range(vma, address, total_bytes_to_map);
> +			zap_vma_page_range(vma, address, total_bytes_to_map);
>  		zc->length = total_bytes_to_map;
>  		zc->recv_skip_hint = 0;
>  	} else {
> -- 
> 2.38.1

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH] mm: remove zap_page_range and change callers to use zap_vma_page_range
  2022-12-19 12:06   ` Michal Hocko
@ 2022-12-19 19:22     ` Mike Kravetz
  -1 siblings, 0 replies; 27+ messages in thread
From: Mike Kravetz @ 2022-12-19 19:22 UTC (permalink / raw)
  To: Michal Hocko, Jérôme Glisse
  Cc: linux-mm, linux-kernel, linuxppc-dev, linux-riscv, linux-s390,
	netdev, David Hildenbrand, Peter Xu, Nadav Amit, Matthew Wilcox,
	Vlastimil Babka, Rik van Riel, Will Deacon, Michael Ellerman,
	Palmer Dabbelt, Christian Borntraeger, Dave Hansen,
	Christian Brauner, Eric Dumazet, Andrew Morton

On 12/19/22 13:06, Michal Hocko wrote:
> On Fri 16-12-22 11:20:12, Mike Kravetz wrote:
> > zap_page_range was originally designed to unmap pages within an address
> > range that could span multiple vmas.  While working on [1], it was
> > discovered that all callers of zap_page_range pass a range entirely within
> > a single vma.  In addition, the mmu notification call within zap_page
> > range does not correctly handle ranges that span multiple vmas as calls
> > should be vma specific.
> 
> Could you spend a sentence or two explaining what is wrong here?

Hmmmm?  My assumption was that the range passed to mmu_notifier_range_init()
was supposed to be within the specified vma.  When looking into the notifier
routines, I could not find any documentation about the usage of the vma within
the mmu_notifier_range structure.  It was introduced with commit bf198b2b34bf
"mm/mmu_notifier: pass down vma and reasons why mmu notifier is happening".
However, I do not see this being used today.
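
To illustrate my reading of it, this is the heart of the removed
zap_page_range (trimmed): the notifier range, including the vma stored
in it, is initialized once from the first vma, yet the loop can walk
into later vmas:

	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma,
				vma->vm_mm, start, start + size);
	mmu_notifier_invalidate_range_start(&range);
	do {
		/* vma advances here, but range still names the first vma */
		unmap_single_vma(&tlb, vma, start, range.end, NULL);
	} while ((vma = mas_find(&mas, end - 1)) != NULL);

So a notifier consulting the vma in the range would see one that does
not describe the entire invalidated span.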

Of course, I could be missing something, so adding Jérôme.

> 
> > Instead of fixing zap_page_range, change all callers to use the new
> > routine zap_vma_page_range.  zap_vma_page_range is just a wrapper around
> > zap_page_range_single passing in NULL zap details.  The name is also
> > more in line with other exported routines that operate within a vma.
> > We can then remove zap_page_range.
> 
> I would stick with zap_page_range_single rather than adding a new
> wrapper but nothing really critical.

I am fine with doing that as well.  My only reason for the wrapper is that all 
callers outside mm/memory.c would pass in NULL zap details.
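
That is, without the wrapper every such call site would read something
like:

	zap_page_range_single(vma, vma->vm_start,
			      vma->vm_end - vma->vm_start, NULL);

with a NULL details argument that means nothing to the reader at the
call site.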

> 
> > Also, change madvise_dontneed_single_vma to use this new routine.
> > 
> > [1] https://lore.kernel.org/linux-mm/20221114235507.294320-2-mike.kravetz@oracle.com/
> > Suggested-by: Peter Xu <peterx@redhat.com>
> > Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
> 
> Other than that LGTM
> Acked-by: Michal Hocko <mhocko@suse.com>
> 
> Thanks!

Thanks for taking a look.
-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH] mm: remove zap_page_range and change callers to use zap_vma_page_range
  2022-12-16 19:20 ` Mike Kravetz
@ 2022-12-20 17:38   ` Peter Xu
  -1 siblings, 0 replies; 27+ messages in thread
From: Peter Xu @ 2022-12-20 17:38 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: linux-mm, linux-kernel, linuxppc-dev, linux-riscv, linux-s390,
	netdev, David Hildenbrand, Michal Hocko, Nadav Amit,
	Matthew Wilcox, Vlastimil Babka, Rik van Riel, Will Deacon,
	Michael Ellerman, Palmer Dabbelt, Christian Borntraeger,
	Dave Hansen, Christian Brauner, Eric Dumazet, Andrew Morton

On Fri, Dec 16, 2022 at 11:20:12AM -0800, Mike Kravetz wrote:
> zap_page_range was originally designed to unmap pages within an address
> range that could span multiple vmas.  While working on [1], it was
> discovered that all callers of zap_page_range pass a range entirely within
> a single vma.  In addition, the mmu notification call within zap_page
> range does not correctly handle ranges that span multiple vmas as calls
> should be vma specific.
> 
> Instead of fixing zap_page_range, change all callers to use the new
> routine zap_vma_page_range.  zap_vma_page_range is just a wrapper around
> zap_page_range_single passing in NULL zap details.  The name is also
> more in line with other exported routines that operate within a vma.
> We can then remove zap_page_range.
> 
> Also, change madvise_dontneed_single_vma to use this new routine.
> 
> [1] https://lore.kernel.org/linux-mm/20221114235507.294320-2-mike.kravetz@oracle.com/
> Suggested-by: Peter Xu <peterx@redhat.com>
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>

Acked-by: Peter Xu <peterx@redhat.com>

Thanks!

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH] mm: remove zap_page_range and change callers to use zap_vma_page_range
  2022-12-16 19:20 ` Mike Kravetz
@ 2022-12-21  3:15   ` Michael Ellerman
  -1 siblings, 0 replies; 27+ messages in thread
From: Michael Ellerman @ 2022-12-21  3:15 UTC (permalink / raw)
  To: Mike Kravetz, linux-mm, linux-kernel, linuxppc-dev, linux-riscv,
	linux-s390, netdev
  Cc: David Hildenbrand, Michal Hocko, Peter Xu, Nadav Amit,
	Matthew Wilcox, Vlastimil Babka, Rik van Riel, Will Deacon,
	Palmer Dabbelt, Christian Borntraeger, Dave Hansen,
	Christian Brauner, Eric Dumazet, Andrew Morton, Mike Kravetz

Mike Kravetz <mike.kravetz@oracle.com> writes:
> zap_page_range was originally designed to unmap pages within an address
> range that could span multiple vmas.  While working on [1], it was
> discovered that all callers of zap_page_range pass a range entirely within
> a single vma.  In addition, the mmu notification call within zap_page
> range does not correctly handle ranges that span multiple vmas as calls
> should be vma specific.
>
> Instead of fixing zap_page_range, change all callers to use the new
> routine zap_vma_page_range.  zap_vma_page_range is just a wrapper around
> zap_page_range_single passing in NULL zap details.  The name is also
> more in line with other exported routines that operate within a vma.
> We can then remove zap_page_range.
>
> Also, change madvise_dontneed_single_vma to use this new routine.
>
> [1] https://lore.kernel.org/linux-mm/20221114235507.294320-2-mike.kravetz@oracle.com/
> Suggested-by: Peter Xu <peterx@redhat.com>
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
> ---
>  arch/arm64/kernel/vdso.c                |  4 ++--
>  arch/powerpc/kernel/vdso.c              |  2 +-
>  arch/powerpc/platforms/book3s/vas-api.c |  2 +-
>  arch/powerpc/platforms/pseries/vas.c    |  2 +-
  
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)

cheers

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH] mm: remove zap_page_range and change callers to use zap_vma_page_range
  2022-12-16 19:20 ` Mike Kravetz
@ 2022-12-23 16:27   ` Christoph Hellwig
  -1 siblings, 0 replies; 27+ messages in thread
From: Christoph Hellwig @ 2022-12-23 16:27 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: linux-mm, linux-kernel, linuxppc-dev, linux-riscv, linux-s390,
	netdev, David Hildenbrand, Michal Hocko, Peter Xu, Nadav Amit,
	Matthew Wilcox, Vlastimil Babka, Rik van Riel, Will Deacon,
	Michael Ellerman, Palmer Dabbelt, Christian Borntraeger,
	Dave Hansen, Christian Brauner, Eric Dumazet, Andrew Morton

>  		unsigned long size = vma->vm_end - vma->vm_start;
>  
>  		if (vma_is_special_mapping(vma, vdso_info[VDSO_ABI_AA64].dm))
> -			zap_page_range(vma, vma->vm_start, size);
> +			zap_vma_page_range(vma, vma->vm_start, size);
>  #ifdef CONFIG_COMPAT_VDSO
>  		if (vma_is_special_mapping(vma, vdso_info[VDSO_ABI_AA32].dm))
> -			zap_page_range(vma, vma->vm_start, size);
> +			zap_vma_page_range(vma, vma->vm_start, size);
>  #endif

So for something called zap_vma_page_range I'd expect to just pass
the vma and zap all of it, which this and many other callers want
anyway.
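
Something like this minimal sketch (the signature is just for
illustration, not a worked-out patch):

	static inline void zap_vma_page_range(struct vm_area_struct *vma)
	{
		zap_page_range_single(vma, vma->vm_start,
				      vma->vm_end - vma->vm_start, NULL);
	}

so the vdso callers above collapse to zap_vma_page_range(vma).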

> +++ b/arch/s390/mm/gmap.c
> @@ -722,7 +722,7 @@ void gmap_discard(struct gmap *gmap, unsigned long from, unsigned long to)
>  		if (is_vm_hugetlb_page(vma))
>  			continue;
>  		size = min(to - gaddr, PMD_SIZE - (gaddr & ~PMD_MASK));
> -		zap_page_range(vma, vmaddr, size);
> +		zap_vma_page_range(vma, vmaddr, size);

And then just call zap_page_range_single directly for those that
don't want to zap the entire vma.
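
e.g. the gmap case above would simply become:

	zap_page_range_single(vma, vmaddr, size, NULL);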

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH] mm: remove zap_page_range and change callers to use zap_vma_page_range
  2022-12-23 16:27   ` Christoph Hellwig
@ 2022-12-23 21:22     ` Mike Kravetz
  -1 siblings, 0 replies; 27+ messages in thread
From: Mike Kravetz @ 2022-12-23 21:22 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-mm, linux-kernel, linuxppc-dev, linux-riscv, linux-s390,
	netdev, David Hildenbrand, Michal Hocko, Peter Xu, Nadav Amit,
	Matthew Wilcox, Vlastimil Babka, Rik van Riel, Will Deacon,
	Michael Ellerman, Palmer Dabbelt, Christian Borntraeger,
	Dave Hansen, Christian Brauner, Eric Dumazet, Andrew Morton

On 12/23/22 08:27, Christoph Hellwig wrote:
> >  		unsigned long size = vma->vm_end - vma->vm_start;
> >  
> >  		if (vma_is_special_mapping(vma, vdso_info[VDSO_ABI_AA64].dm))
> > -			zap_page_range(vma, vma->vm_start, size);
> > +			zap_vma_page_range(vma, vma->vm_start, size);
> >  #ifdef CONFIG_COMPAT_VDSO
> >  		if (vma_is_special_mapping(vma, vdso_info[VDSO_ABI_AA32].dm))
> > -			zap_page_range(vma, vma->vm_start, size);
> > +			zap_vma_page_range(vma, vma->vm_start, size);
> >  #endif
> 
> So for something called zap_vma_page_range I'd expect to just pass
> the vma and zap all of it, which this and many other callers want
> anyway.
> 
> > +++ b/arch/s390/mm/gmap.c
> > @@ -722,7 +722,7 @@ void gmap_discard(struct gmap *gmap, unsigned long from, unsigned long to)
> >  		if (is_vm_hugetlb_page(vma))
> >  			continue;
> >  		size = min(to - gaddr, PMD_SIZE - (gaddr & ~PMD_MASK));
> > -		zap_page_range(vma, vmaddr, size);
> > +		zap_vma_page_range(vma, vmaddr, size);
> 
> And then just call zap_page_range_single directly for those that
> don't want to zap the entire vma.

Thanks!

This sounds like a good idea and I will incorporate it into a new patch.
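
Concretely, something like this (sketch only, final naming still to be
worked out in the new patch):

	/* whole-vma callers, e.g. the vdso code: */
	- zap_vma_page_range(vma, vma->vm_start, vma->vm_end - vma->vm_start);
	+ zap_vma_page_range(vma);

	/* partial-range callers, e.g. gmap_discard: */
	- zap_vma_page_range(vma, vmaddr, size);
	+ zap_page_range_single(vma, vmaddr, size, NULL);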

-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH] mm: remove zap_page_range and change callers to use zap_vma_page_range
  2022-12-16 19:20 ` Mike Kravetz
@ 2022-12-29 16:22   ` Palmer Dabbelt
  -1 siblings, 0 replies; 27+ messages in thread
From: Palmer Dabbelt @ 2022-12-29 16:22 UTC (permalink / raw)
  To: mike.kravetz
  Cc: linux-mm, linux-kernel, linuxppc-dev, linux-riscv, linux-s390,
	netdev, david, mhocko, peterx, nadav.amit, willy, vbabka, riel,
	Will Deacon, mpe, borntraeger, dave.hansen, brauner, edumazet,
	akpm, mike.kravetz

On Fri, 16 Dec 2022 11:20:12 PST (-0800), mike.kravetz@oracle.com wrote:
> zap_page_range was originally designed to unmap pages within an address
> range that could span multiple vmas.  While working on [1], it was
> discovered that all callers of zap_page_range pass a range entirely within
> a single vma.  In addition, the mmu notification call within zap_page
> range does not correctly handle ranges that span multiple vmas as calls
> should be vma specific.
>
> Instead of fixing zap_page_range, change all callers to use the new
> routine zap_vma_page_range.  zap_vma_page_range is just a wrapper around
> zap_page_range_single passing in NULL zap details.  The name is also
> more in line with other exported routines that operate within a vma.
> We can then remove zap_page_range.
>
> Also, change madvise_dontneed_single_vma to use this new routine.
>
> [1] https://lore.kernel.org/linux-mm/20221114235507.294320-2-mike.kravetz@oracle.com/
> Suggested-by: Peter Xu <peterx@redhat.com>
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
> ---
>  arch/arm64/kernel/vdso.c                |  4 ++--
>  arch/powerpc/kernel/vdso.c              |  2 +-
>  arch/powerpc/platforms/book3s/vas-api.c |  2 +-
>  arch/powerpc/platforms/pseries/vas.c    |  2 +-
>  arch/riscv/kernel/vdso.c                |  4 ++--
>  arch/s390/kernel/vdso.c                 |  2 +-
>  arch/s390/mm/gmap.c                     |  2 +-
>  arch/x86/entry/vdso/vma.c               |  2 +-
>  drivers/android/binder_alloc.c          |  2 +-
>  include/linux/mm.h                      |  7 ++++--
>  mm/madvise.c                            |  4 ++--
>  mm/memory.c                             | 30 -------------------------
>  mm/page-writeback.c                     |  2 +-
>  net/ipv4/tcp.c                          |  6 ++---
>  14 files changed, 22 insertions(+), 49 deletions(-)

[snip]

> diff --git a/arch/riscv/kernel/vdso.c b/arch/riscv/kernel/vdso.c
> index e410275918ac..a405119da2c0 100644
> --- a/arch/riscv/kernel/vdso.c
> +++ b/arch/riscv/kernel/vdso.c
> @@ -127,10 +127,10 @@ int vdso_join_timens(struct task_struct *task, struct time_namespace *ns)
>  		unsigned long size = vma->vm_end - vma->vm_start;
>
>  		if (vma_is_special_mapping(vma, vdso_info.dm))
> -			zap_page_range(vma, vma->vm_start, size);
> +			zap_vma_page_range(vma, vma->vm_start, size);
>  #ifdef CONFIG_COMPAT
>  		if (vma_is_special_mapping(vma, compat_vdso_info.dm))
> -			zap_page_range(vma, vma->vm_start, size);
> +			zap_vma_page_range(vma, vma->vm_start, size);
>  #endif
>  	}

Acked-by: Palmer Dabbelt <palmer@rivosinc.com> # RISC-V

Thanks!

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH] mm: remove zap_page_range and change callers to use zap_vma_page_range
  2022-12-19 19:22     ` Mike Kravetz
@ 2023-01-05  1:14       ` Alistair Popple
  0 siblings, 0 replies; 27+ messages in thread
From: Alistair Popple @ 2023-01-05  1:14 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: Michal Hocko, Jérôme Glisse, linux-mm, linux-kernel,
	linuxppc-dev, linux-riscv, linux-s390, netdev, David Hildenbrand,
	Peter Xu, Nadav Amit, Matthew Wilcox, Vlastimil Babka,
	Rik van Riel, Will Deacon, Michael Ellerman, Palmer Dabbelt,
	Christian Borntraeger, Dave Hansen, Christian Brauner,
	Eric Dumazet, Andrew Morton


Mike Kravetz <mike.kravetz@oracle.com> writes:

> On 12/19/22 13:06, Michal Hocko wrote:
>> On Fri 16-12-22 11:20:12, Mike Kravetz wrote:
>> > zap_page_range was originally designed to unmap pages within an address
>> > range that could span multiple vmas.  While working on [1], it was
>> > discovered that all callers of zap_page_range pass a range entirely within
>> > a single vma.  In addition, the mmu notification call within
>> > zap_page_range does not correctly handle ranges that span multiple
>> > vmas as calls should be vma specific.
>> 
>> Could you spend a sentence or two explaining what is wrong here?
>
> Hmmmm?  My assumption was that the range passed to mmu_notifier_range_init()
> was supposed to be within the specified vma.  When looking into the notifier
> routines, I could not find any documentation about the usage of the vma within
> the mmu_notifier_range structure.  It was introduced with commit bf198b2b34bf
> "mm/mmu_notifier: pass down vma and reasons why mmu notifier is happening".
> However, I do not see this being used today.
>
> Of course, I could be missing something, so adding Jérôme.

The only use for mmu_notifier_range->vma I can find is in
mmu_notifier_range_update_to_read_only(), which was introduced in
c6d23413f81b ("mm/mmu_notifier: mmu_notifier_range_update_to_read_only()
helper"). However, there are no users of that symbol, so I think we can
remove it along with the mmu_notifier_range->vma field.

I will put together a patch to do that.
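
For reference, the shape being criticized is roughly the following,
recalled from the 6.1-era mm/memory.c and trimmed, so the details may
not be exact:

	void zap_page_range(struct vm_area_struct *vma, unsigned long start,
			unsigned long size)
	{
		struct mmu_notifier_range range;
		struct mmu_gather tlb;
		/* maple tree iterator (mas) setup elided */

		/* one notifier init for the whole span, pinned to the first vma */
		mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma,
					vma->vm_mm, start, start + size);
		tlb_gather_mmu(&tlb, vma->vm_mm);
		mmu_notifier_invalidate_range_start(&range);
		do {
			unmap_single_vma(&tlb, vma, start, range.end, NULL);
			/* ...but the loop may walk into later vmas */
		} while ((vma = mas_find(&mas, start + size - 1)) != NULL);
		mmu_notifier_invalidate_range_end(&range);
		tlb_finish_mmu(&tlb);
	}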

>> 
>> > Instead of fixing zap_page_range, change all callers to use the new
>> > routine zap_vma_page_range.  zap_vma_page_range is just a wrapper around
>> > zap_page_range_single passing in NULL zap details.  The name is also
>> > more in line with other exported routines that operate within a vma.
>> > We can then remove zap_page_range.
>> 
>> I would stick with zap_page_range_single rather than adding a new
>> wrapper but nothing really critical.
>
> I am fine with doing that as well.  My only reason for the wrapper is that all 
> callers outside mm/memory.c would pass in NULL zap details.
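
Concretely, the two spellings being weighed would look like this at a
typical call site (sketch only; the zap_page_range_single argument
order is assumed from the patch context):

	/* option a: the thin wrapper proposed in the RFC */
	zap_vma_page_range(vma, vma->vm_start, size);

	/* option b: the single-vma primitive called directly */
	zap_page_range_single(vma, vma->vm_start, size, NULL);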
>
>> 
>> > Also, change madvise_dontneed_single_vma to use this new routine.
>> > 
>> > [1] https://lore.kernel.org/linux-mm/20221114235507.294320-2-mike.kravetz@oracle.com/
>> > Suggested-by: Peter Xu <peterx@redhat.com>
>> > Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
>> 
>> Other than that LGTM
>> Acked-by: Michal Hocko <mhocko@suse.com>
>> 
>> Thanks!
>
> Thanks for taking a look.


^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2023-01-05 18:42 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-12-16 19:20 [RFC PATCH] mm: remove zap_page_range and change callers to use zap_vma_page_range Mike Kravetz
2022-12-19 12:06 ` Michal Hocko
2022-12-19 19:22   ` Mike Kravetz
2023-01-05  1:14     ` Alistair Popple
2022-12-20 17:38 ` Peter Xu
2022-12-21  3:15 ` Michael Ellerman
2022-12-23 16:27 ` Christoph Hellwig
2022-12-23 21:22   ` Mike Kravetz
2022-12-29 16:22 ` Palmer Dabbelt
