linux-mm.kvack.org archive mirror
* [PATCH v3 0/4] mm: Rework zap ptes on swap entries
@ 2022-01-28  4:54 Peter Xu
  2022-01-28  4:54 ` [PATCH v3 1/4] mm: Don't skip swap entry even if zap_details specified Peter Xu
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Peter Xu @ 2022-01-28  4:54 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: peterx, Alistair Popple, Andrew Morton, Andrea Arcangeli,
	David Hildenbrand, Matthew Wilcox, John Hubbard, Hugh Dickins,
	Vlastimil Babka, Yang Shi, Kirill A . Shutemov

v3:
- Patch 1:
  - Fix !non_swap_entry() case and hwpoison case too [Hugh]
  - Added reproducer program in commit message
  - Introduced should_zap_cows() helper
- Added patch 2, "mm: Rename zap_skip_check_mapping() to should_zap_page()"
- Added patch 3, "mm: Change zap_details.zap_mapping into even_cows"

RFC V2: https://lore.kernel.org/lkml/20211115134951.85286-1-peterx@redhat.com
RFC V1: https://lore.kernel.org/lkml/20211110082952.19266-1-peterx@redhat.com

Thanks to Hugh's help, we're pretty clear on the history of zap_details and
swap skipping behavior, hence dropping the RFC tag.

Patch 1 should fix a long-standing bug in zap_pte_range()'s use of
zap_details.  The risk is that some swap entries could be skipped when we
should have zapped them.

Migration entries are not the major concern, because file-backed memory is
always zapped in the pattern "first zap without the page lock, then re-zap
with the page lock held", hence the second zap will always make sure all
migration entries have already been recovered.

However, genuine swap entries can be skipped erroneously.  A reproducer is
provided in the commit message of patch 1.

Patches 2-4 are cleanups based on patch 1.  After the whole patchset is
applied, we should have a very clean view of zap_pte_range().

Only patch 1 needs to be backported to stable.

Please review, thanks.

Peter Xu (4):
  mm: Don't skip swap entry even if zap_details specified
  mm: Rename zap_skip_check_mapping() to should_zap_page()
  mm: Change zap_details.zap_mapping into even_cows
  mm: Rework swap handling of zap_pte_range

 mm/memory.c | 85 +++++++++++++++++++++++++++++++----------------------
 1 file changed, 50 insertions(+), 35 deletions(-)

-- 
2.32.0



^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v3 1/4] mm: Don't skip swap entry even if zap_details specified
  2022-01-28  4:54 [PATCH v3 0/4] mm: Rework zap ptes on swap entries Peter Xu
@ 2022-01-28  4:54 ` Peter Xu
  2022-01-28  4:54 ` [PATCH v3 2/4] mm: Rename zap_skip_check_mapping() to should_zap_page() Peter Xu
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 10+ messages in thread
From: Peter Xu @ 2022-01-28  4:54 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: peterx, Alistair Popple, Andrew Morton, Andrea Arcangeli,
	David Hildenbrand, Matthew Wilcox, John Hubbard, Hugh Dickins,
	Vlastimil Babka, Yang Shi, Kirill A . Shutemov

The "details" pointer shouldn't be the token to decide whether we should skip
swap entries.  For example, when the caller specifies details->zap_mapping==NULL,
it means the caller wants to zap all the pages (including COWed pages), so we
need to look into swap entries too, because there can be private COWed pages
that were swapped out.

Skipping swap entries whenever details is non-NULL may wrongly leave behind
swap entries that we should have zapped.

A reproducer of the problem:

===8<===
        #define _GNU_SOURCE         /* See feature_test_macros(7) */
        #include <stdio.h>
        #include <assert.h>
        #include <unistd.h>
        #include <sys/mman.h>
        #include <sys/types.h>

        int page_size;
        int shmem_fd;
        char *buffer;

        int main(void)
        {
                int ret;
                char val;

                page_size = getpagesize();
                shmem_fd = memfd_create("test", 0);
                assert(shmem_fd >= 0);

                ret = ftruncate(shmem_fd, page_size * 2);
                assert(ret == 0);

                buffer = mmap(NULL, page_size * 2, PROT_READ | PROT_WRITE,
                                MAP_PRIVATE, shmem_fd, 0);
                assert(buffer != MAP_FAILED);

                /* Write private page, swap it out */
                buffer[page_size] = 1;
                madvise(buffer, page_size * 2, MADV_PAGEOUT);

                /* This should drop private buffer[page_size] already */
                ret = ftruncate(shmem_fd, page_size);
                assert(ret == 0);
                /* Recover the size */
                ret = ftruncate(shmem_fd, page_size * 2);
                assert(ret == 0);

                /* Re-read the data, it should be all zero */
                val = buffer[page_size];
                if (val == 0)
                        printf("Good\n");
                else
                        printf("BUG\n");
        }
===8<===

We don't need to touch the pmd path, because the pmd level never had an issue
with swap entries.  For example, shmem pmd migration entries will always be
split into pte-level entries first, and the same applies to anonymous swapping.

Add another helper should_zap_cows() so that we can also check whether we
should zap private mappings when there's no page pointer specified.

This patch drops the blanket "skip swap entries when details is non-NULL"
shortcut, so we handle swap ptes coherently.  Meanwhile, we apply the same
check to migration entries, hwpoison entries and genuine swap entries alike.
To be explicit: we still keep the private entries if even_cows==false, and
always zap them when even_cows==true.
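
As a rough userspace sketch of the decision above (all names here are
hypothetical stand-ins; at this point in the series the kernel code still
keys off details->zap_mapping rather than an explicit even_cows flag):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of struct zap_details, reduced to the one field
 * that matters for this decision. */
struct zap_details_model {
	void *zap_mapping;	/* stand-in for struct address_space * */
};

/* Whether we should zap all COWed (private) pages too */
static bool model_should_zap_cows(const struct zap_details_model *details)
{
	if (!details)
		return true;	/* by default, zap all pages */
	/* Or, zap COWed pages only if the caller asked for it */
	return details->zap_mapping == NULL;
}

/*
 * A genuine swap entry (and likewise a hwpoison entry) can only belong
 * to a private anon page, so it is zapped exactly when COWed pages are.
 */
static bool model_should_zap_swap_entry(const struct zap_details_model *details)
{
	return model_should_zap_cows(details);
}
```

So a non-NULL details with a mapping set still keeps private swap entries,
while details==NULL or zap_mapping==NULL zaps them, which is the behavior the
old "if (unlikely(details)) continue;" shortcut got wrong.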

The issue seems to exist starting from the initial commit of git.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/memory.c | 45 ++++++++++++++++++++++++++++++++++++---------
 1 file changed, 36 insertions(+), 9 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index c125c4969913..4bfeaca7cbc7 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1313,6 +1313,17 @@ struct zap_details {
 	struct folio *single_folio;	/* Locked folio to be unmapped */
 };
 
+/* Whether we should zap all COWed (private) pages too */
+static inline bool should_zap_cows(struct zap_details *details)
+{
+	/* By default, zap all pages */
+	if (!details)
+		return true;
+
+	/* Or, we zap COWed pages only if the caller wants to */
+	return !details->zap_mapping;
+}
+
 /*
  * We set details->zap_mapping when we want to unmap shared but keep private
  * pages. Return true if skip zapping this page, false otherwise.
@@ -1320,11 +1331,15 @@ struct zap_details {
 static inline bool
 zap_skip_check_mapping(struct zap_details *details, struct page *page)
 {
-	if (!details || !page)
+	/* If we can make a decision without *page.. */
+	if (should_zap_cows(details))
 		return false;
 
-	return details->zap_mapping &&
-		(details->zap_mapping != page_rmapping(page));
+	/* E.g. zero page */
+	if (!page)
+		return false;
+
+	return details->zap_mapping != page_rmapping(page);
 }
 
 static unsigned long zap_pte_range(struct mmu_gather *tlb,
@@ -1405,17 +1420,29 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 			continue;
 		}
 
-		/* If details->check_mapping, we leave swap entries. */
-		if (unlikely(details))
-			continue;
-
-		if (!non_swap_entry(entry))
+		if (!non_swap_entry(entry)) {
+			/*
+			 * If this is a genuine swap entry, then it must be a
+			 * private anon page.  If the caller wants to skip
+			 * COWed pages, ignore it.
+			 */
+			if (!should_zap_cows(details))
+				continue;
 			rss[MM_SWAPENTS]--;
-		else if (is_migration_entry(entry)) {
+		} else if (is_migration_entry(entry)) {
 			struct page *page;
 
 			page = pfn_swap_entry_to_page(entry);
+			if (zap_skip_check_mapping(details, page))
+				continue;
 			rss[mm_counter(page)]--;
+		} else if (is_hwpoison_entry(entry)) {
+			/* If the caller wants to skip COWed pages, ignore it */
+			if (!should_zap_cows(details))
+				continue;
+		} else {
+			/* We should have covered all the swap entry types */
+			WARN_ON_ONCE(1);
 		}
 		if (unlikely(!free_swap_and_cache(entry)))
 			print_bad_pte(vma, addr, ptent, NULL);
-- 
2.32.0




* [PATCH v3 2/4] mm: Rename zap_skip_check_mapping() to should_zap_page()
  2022-01-28  4:54 [PATCH v3 0/4] mm: Rework zap ptes on swap entries Peter Xu
  2022-01-28  4:54 ` [PATCH v3 1/4] mm: Don't skip swap entry even if zap_details specified Peter Xu
@ 2022-01-28  4:54 ` Peter Xu
  2022-01-28  8:16   ` David Hildenbrand
  2022-01-28  4:54 ` [PATCH v3 3/4] mm: Change zap_details.zap_mapping into even_cows Peter Xu
  2022-01-28  4:54 ` [PATCH v3 4/4] mm: Rework swap handling of zap_pte_range Peter Xu
  3 siblings, 1 reply; 10+ messages in thread
From: Peter Xu @ 2022-01-28  4:54 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: peterx, Alistair Popple, Andrew Morton, Andrea Arcangeli,
	David Hildenbrand, Matthew Wilcox, John Hubbard, Hugh Dickins,
	Vlastimil Babka, Yang Shi, Kirill A . Shutemov

The previous name is against the natural way people think.  Invert the meaning
and also the return value.  No functional change intended.

Suggested-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/memory.c | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 4bfeaca7cbc7..14d8428ff4db 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1326,20 +1326,19 @@ static inline bool should_zap_cows(struct zap_details *details)
 
 /*
  * We set details->zap_mapping when we want to unmap shared but keep private
- * pages. Return true if skip zapping this page, false otherwise.
+ * pages. Return true if we should zap this page, false otherwise.
  */
-static inline bool
-zap_skip_check_mapping(struct zap_details *details, struct page *page)
+static inline bool should_zap_page(struct zap_details *details, struct page *page)
 {
 	/* If we can make a decision without *page.. */
 	if (should_zap_cows(details))
-		return false;
+		return true;
 
 	/* E.g. zero page */
 	if (!page)
-		return false;
+		return true;
 
-	return details->zap_mapping != page_rmapping(page);
+	return details->zap_mapping == page_rmapping(page);
 }
 
 static unsigned long zap_pte_range(struct mmu_gather *tlb,
@@ -1374,7 +1373,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 			struct page *page;
 
 			page = vm_normal_page(vma, addr, ptent);
-			if (unlikely(zap_skip_check_mapping(details, page)))
+			if (unlikely(!should_zap_page(details, page)))
 				continue;
 			ptent = ptep_get_and_clear_full(mm, addr, pte,
 							tlb->fullmm);
@@ -1408,7 +1407,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 		    is_device_exclusive_entry(entry)) {
 			struct page *page = pfn_swap_entry_to_page(entry);
 
-			if (unlikely(zap_skip_check_mapping(details, page)))
+			if (unlikely(!should_zap_page(details, page)))
 				continue;
 			pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
 			rss[mm_counter(page)]--;
@@ -1433,7 +1432,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 			struct page *page;
 
 			page = pfn_swap_entry_to_page(entry);
-			if (zap_skip_check_mapping(details, page))
+			if (!should_zap_page(details, page))
 				continue;
 			rss[mm_counter(page)]--;
 		} else if (is_hwpoison_entry(entry)) {
-- 
2.32.0




* [PATCH v3 3/4] mm: Change zap_details.zap_mapping into even_cows
  2022-01-28  4:54 [PATCH v3 0/4] mm: Rework zap ptes on swap entries Peter Xu
  2022-01-28  4:54 ` [PATCH v3 1/4] mm: Don't skip swap entry even if zap_details specified Peter Xu
  2022-01-28  4:54 ` [PATCH v3 2/4] mm: Rename zap_skip_check_mapping() to should_zap_page() Peter Xu
@ 2022-01-28  4:54 ` Peter Xu
  2022-01-28  9:03   ` David Hildenbrand
  2022-01-28  4:54 ` [PATCH v3 4/4] mm: Rework swap handling of zap_pte_range Peter Xu
  3 siblings, 1 reply; 10+ messages in thread
From: Peter Xu @ 2022-01-28  4:54 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: peterx, Alistair Popple, Andrew Morton, Andrea Arcangeli,
	David Hildenbrand, Matthew Wilcox, John Hubbard, Hugh Dickins,
	Vlastimil Babka, Yang Shi, Kirill A . Shutemov

Currently we have a zap_mapping pointer maintained in zap_details, when it is
specified we only want to zap the pages that has the same mapping with what the
caller has specified.

But what we want to do is actually simpler: we want to skip zapping
private (COW-ed) pages in some cases.  We can refer to unmap_mapping_pages()
callers where we could have passed in different even_cows values.  The other
user is unmap_mapping_folio() where we always want to skip private pages.

According to Hugh, we used a mapping pointer for historical reason, as
explained here:

  https://lore.kernel.org/lkml/391aa58d-ce84-9d4-d68d-d98a9c533255@google.com/

Quotting partly from Hugh:

  Which raises the question again of why I did not just use a boolean flag
  there originally: aah, I think I've found why.  In those days there was a
  horrible "optimization", for better performance on some benchmark I guess,
  which when you read from /dev/zero into a private mapping, would map the zero
  page there (look up read_zero_pagealigned() and zeromap_page_range() if you
  dare).  So there was another category of page to be skipped along with the
  anon COWs, and I didn't want multiple tests in the zap loop, so checking
  check_mapping against page->mapping did both.  I think nowadays you could do
  it by checking for PageAnon page (or genuine swap entry) instead.

This patch replaced the zap_details.zap_mapping pointer into the even_cows
boolean, then we check it against PageAnon.
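
The reworked predicate pair can be sketched as a userspace model (hypothetical
names; the page state is collapsed into two booleans purely for illustration,
where have_page stands in for "page != NULL" and page_is_anon for PageAnon()):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the reworked struct zap_details. */
struct zap_details_model {
	bool even_cows;		/* Zap COWed private pages too? */
};

/* Whether we should zap all COWed (private) pages too */
static bool model_should_zap_cows(const struct zap_details_model *details)
{
	if (!details)
		return true;	/* by default, zap all pages */
	return details->even_cows;
}

/*
 * Mirrors should_zap_page(): zap everything when COWs are wanted,
 * always zap when there is no page (e.g. the zero page), otherwise
 * zap only non-anon (file-backed) pages.
 */
static bool model_should_zap_page(const struct zap_details_model *details,
				  bool have_page, bool page_is_anon)
{
	if (model_should_zap_cows(details))
		return true;
	if (!have_page)
		return true;
	return !page_is_anon;
}
```

With even_cows==false only anon pages survive the zap, which is exactly the
PageAnon-based check Hugh suggested in place of the mapping comparison.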

Suggested-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/memory.c | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 14d8428ff4db..ffa8c7dfe9ad 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1309,8 +1309,8 @@ copy_page_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma)
  * Parameter block passed down to zap_pte_range in exceptional cases.
  */
 struct zap_details {
-	struct address_space *zap_mapping;	/* Check page->mapping if set */
 	struct folio *single_folio;	/* Locked folio to be unmapped */
+	bool even_cows;			/* Zap COWed private pages too? */
 };
 
 /* Whether we should zap all COWed (private) pages too */
@@ -1321,13 +1321,10 @@ static inline bool should_zap_cows(struct zap_details *details)
 		return true;
 
 	/* Or, we zap COWed pages only if the caller wants to */
-	return !details->zap_mapping;
+	return details->even_cows;
 }
 
-/*
- * We set details->zap_mapping when we want to unmap shared but keep private
- * pages. Return true if we should zap this page, false otherwise.
- */
+/* Decides whether we should zap this page with the page pointer specified */
 static inline bool should_zap_page(struct zap_details *details, struct page *page)
 {
 	/* If we can make a decision without *page.. */
@@ -1338,7 +1335,8 @@ static inline bool should_zap_page(struct zap_details *details, struct page *pag
 	if (!page)
 		return true;
 
-	return details->zap_mapping == page_rmapping(page);
+	/* Otherwise we should only zap non-anon pages */
+	return !PageAnon(page);
 }
 
 static unsigned long zap_pte_range(struct mmu_gather *tlb,
@@ -3403,7 +3401,7 @@ void unmap_mapping_folio(struct folio *folio)
 	first_index = folio->index;
 	last_index = folio->index + folio_nr_pages(folio) - 1;
 
-	details.zap_mapping = mapping;
+	details.even_cows = false;
 	details.single_folio = folio;
 
 	i_mmap_lock_write(mapping);
@@ -3432,7 +3430,7 @@ void unmap_mapping_pages(struct address_space *mapping, pgoff_t start,
 	pgoff_t	first_index = start;
 	pgoff_t	last_index = start + nr - 1;
 
-	details.zap_mapping = even_cows ? NULL : mapping;
+	details.even_cows = even_cows;
 	if (last_index < first_index)
 		last_index = ULONG_MAX;
 
-- 
2.32.0




* [PATCH v3 4/4] mm: Rework swap handling of zap_pte_range
  2022-01-28  4:54 [PATCH v3 0/4] mm: Rework zap ptes on swap entries Peter Xu
                   ` (2 preceding siblings ...)
  2022-01-28  4:54 ` [PATCH v3 3/4] mm: Change zap_details.zap_mapping into even_cows Peter Xu
@ 2022-01-28  4:54 ` Peter Xu
  3 siblings, 0 replies; 10+ messages in thread
From: Peter Xu @ 2022-01-28  4:54 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: peterx, Alistair Popple, Andrew Morton, Andrea Arcangeli,
	David Hildenbrand, Matthew Wilcox, John Hubbard, Hugh Dickins,
	Vlastimil Babka, Yang Shi, Kirill A . Shutemov

Clean the code up by merging the device private/exclusive swap entry handling
with the rest, and merge the pte clear operation too.

struct page *page is defined in multiple places in the function; move the
definition upward.

free_swap_and_cache() is only useful for the !non_swap_entry() case, so move
it into that branch.

No functional change intended.
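
The resulting control flow can be sketched in userspace as one dispatch over
the non-present pte cases, with a single shared pte clear at the bottom
(hypothetical enum and names; should_zap abstracts the should_zap_page() /
should_zap_cows() checks):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classification of the non-present pte cases handled in
 * the reworked loop, in the order they are tested. */
enum model_entry_kind {
	MODEL_DEVICE_ENTRY,	/* device private/exclusive */
	MODEL_GENUINE_SWAP,
	MODEL_MIGRATION_ENTRY,
	MODEL_HWPOISON_ENTRY,
	MODEL_UNKNOWN_ENTRY,
};

/*
 * Returns true if the iteration falls through to the single
 * pte_clear_not_present_full() at the bottom of the loop, false if it
 * 'continue's because the relevant should_zap_*() check said to skip.
 */
static bool model_clears_pte(enum model_entry_kind kind, bool should_zap)
{
	switch (kind) {
	case MODEL_DEVICE_ENTRY:
	case MODEL_GENUINE_SWAP:
	case MODEL_MIGRATION_ENTRY:
	case MODEL_HWPOISON_ENTRY:
		if (!should_zap)
			return false;
		break;
	case MODEL_UNKNOWN_ENTRY:
		/* WARN_ON_ONCE(1) in the real code; pte still cleared */
		break;
	}
	return true;	/* every non-skipped entry shares one pte clear */
}
```

Every entry type either skips early or reaches the same clear, which is why
the per-branch pte_clear_not_present_full() calls could be merged.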

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/memory.c | 21 ++++++---------------
 1 file changed, 6 insertions(+), 15 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index ffa8c7dfe9ad..cade96024349 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1361,6 +1361,8 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 	arch_enter_lazy_mmu_mode();
 	do {
 		pte_t ptent = *pte;
+		struct page *page;
+
 		if (pte_none(ptent))
 			continue;
 
@@ -1368,8 +1370,6 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 			break;
 
 		if (pte_present(ptent)) {
-			struct page *page;
-
 			page = vm_normal_page(vma, addr, ptent);
 			if (unlikely(!should_zap_page(details, page)))
 				continue;
@@ -1403,21 +1403,14 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 		entry = pte_to_swp_entry(ptent);
 		if (is_device_private_entry(entry) ||
 		    is_device_exclusive_entry(entry)) {
-			struct page *page = pfn_swap_entry_to_page(entry);
-
+			page = pfn_swap_entry_to_page(entry);
 			if (unlikely(!should_zap_page(details, page)))
 				continue;
-			pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
 			rss[mm_counter(page)]--;
-
 			if (is_device_private_entry(entry))
 				page_remove_rmap(page, false);
-
 			put_page(page);
-			continue;
-		}
-
-		if (!non_swap_entry(entry)) {
+		} else if (!non_swap_entry(entry)) {
 			/*
			 * If this is a genuine swap entry, then it must be a
			 * private anon page.  If the caller wants to skip
@@ -1426,9 +1419,9 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 			if (!should_zap_cows(details))
 				continue;
 			rss[MM_SWAPENTS]--;
+			if (unlikely(!free_swap_and_cache(entry)))
+				print_bad_pte(vma, addr, ptent, NULL);
 		} else if (is_migration_entry(entry)) {
-			struct page *page;
-
 			page = pfn_swap_entry_to_page(entry);
 			if (!should_zap_page(details, page))
 				continue;
@@ -1441,8 +1434,6 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 			/* We should have covered all the swap entry types */
 			WARN_ON_ONCE(1);
 		}
-		if (unlikely(!free_swap_and_cache(entry)))
-			print_bad_pte(vma, addr, ptent, NULL);
 		pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
 	} while (pte++, addr += PAGE_SIZE, addr != end);
 
-- 
2.32.0




* Re: [PATCH v3 2/4] mm: Rename zap_skip_check_mapping() to should_zap_page()
  2022-01-28  4:54 ` [PATCH v3 2/4] mm: Rename zap_skip_check_mapping() to should_zap_page() Peter Xu
@ 2022-01-28  8:16   ` David Hildenbrand
  2022-01-28  8:53     ` Peter Xu
  0 siblings, 1 reply; 10+ messages in thread
From: David Hildenbrand @ 2022-01-28  8:16 UTC (permalink / raw)
  To: Peter Xu, linux-mm, linux-kernel
  Cc: Alistair Popple, Andrew Morton, Andrea Arcangeli, Matthew Wilcox,
	John Hubbard, Hugh Dickins, Vlastimil Babka, Yang Shi,
	Kirill A . Shutemov

On 28.01.22 05:54, Peter Xu wrote:
> The previous name is against the natural way people think.  Invert the meaning
> and also the return value.  No functional change intended.
> 
> Suggested-by: Hugh Dickins <hughd@google.com>

Could have sworn it was me :P

> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Thanks,

David / dhildenb




* Re: [PATCH v3 2/4] mm: Rename zap_skip_check_mapping() to should_zap_page()
  2022-01-28  8:16   ` David Hildenbrand
@ 2022-01-28  8:53     ` Peter Xu
  0 siblings, 0 replies; 10+ messages in thread
From: Peter Xu @ 2022-01-28  8:53 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-kernel, Alistair Popple, Andrew Morton,
	Andrea Arcangeli, Matthew Wilcox, John Hubbard, Hugh Dickins,
	Vlastimil Babka, Yang Shi, Kirill A . Shutemov

On Fri, Jan 28, 2022 at 09:16:07AM +0100, David Hildenbrand wrote:
> On 28.01.22 05:54, Peter Xu wrote:
> > The previous name is against the natural way people think.  Invert the meaning
> > and also the return value.  No functional change intended.
> > 
> > Suggested-by: Hugh Dickins <hughd@google.com>
> 
> Could have sworn it was me :P

Yeah it's possible. :)

I'll add both of you in the next version if there is.

> 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> 
> Reviewed-by: David Hildenbrand <david@redhat.com>

Thanks,

-- 
Peter Xu




* Re: [PATCH v3 3/4] mm: Change zap_details.zap_mapping into even_cows
  2022-01-28  4:54 ` [PATCH v3 3/4] mm: Change zap_details.zap_mapping into even_cows Peter Xu
@ 2022-01-28  9:03   ` David Hildenbrand
  2022-01-28  9:17     ` Peter Xu
  0 siblings, 1 reply; 10+ messages in thread
From: David Hildenbrand @ 2022-01-28  9:03 UTC (permalink / raw)
  To: Peter Xu, linux-mm, linux-kernel
  Cc: Alistair Popple, Andrew Morton, Andrea Arcangeli, Matthew Wilcox,
	John Hubbard, Hugh Dickins, Vlastimil Babka, Yang Shi,
	Kirill A . Shutemov

On 28.01.22 05:54, Peter Xu wrote:
> Currently we have a zap_mapping pointer maintained in zap_details, when it is
> specified we only want to zap the pages that has the same mapping with what the
> caller has specified.
> 
> But what we want to do is actually simpler: we want to skip zapping
> private (COW-ed) pages in some cases.  We can refer to unmap_mapping_pages()
> callers where we could have passed in different even_cows values.  The other
> user is unmap_mapping_folio() where we always want to skip private pages.
> 
> According to Hugh, we used a mapping pointer for historical reason, as
> explained here:
> 
>   https://lore.kernel.org/lkml/391aa58d-ce84-9d4-d68d-d98a9c533255@google.com/
> 
> Quotting partly from Hugh:

s/Quotting/Quoting/

> 
>   Which raises the question again of why I did not just use a boolean flag
>   there originally: aah, I think I've found why.  In those days there was a
>   horrible "optimization", for better performance on some benchmark I guess,
>   which when you read from /dev/zero into a private mapping, would map the zero
>   page there (look up read_zero_pagealigned() and zeromap_page_range() if you
>   dare).  So there was another category of page to be skipped along with the
>   anon COWs, and I didn't want multiple tests in the zap loop, so checking
>   check_mapping against page->mapping did both.  I think nowadays you could do
>   it by checking for PageAnon page (or genuine swap entry) instead.
> 
> This patch replaced the zap_details.zap_mapping pointer into the even_cows
> boolean, then we check it against PageAnon.
> 
> Suggested-by: Hugh Dickins <hughd@google.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  mm/memory.c | 16 +++++++---------
>  1 file changed, 7 insertions(+), 9 deletions(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index 14d8428ff4db..ffa8c7dfe9ad 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1309,8 +1309,8 @@ copy_page_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma)
>   * Parameter block passed down to zap_pte_range in exceptional cases.
>   */
>  struct zap_details {
> -	struct address_space *zap_mapping;	/* Check page->mapping if set */
>  	struct folio *single_folio;	/* Locked folio to be unmapped */
> +	bool even_cows;			/* Zap COWed private pages too? */
>  };
>  
>  /* Whether we should zap all COWed (private) pages too */
> @@ -1321,13 +1321,10 @@ static inline bool should_zap_cows(struct zap_details *details)
>  		return true;
>  
>  	/* Or, we zap COWed pages only if the caller wants to */
> -	return !details->zap_mapping;
> +	return details->even_cows;
>  }
>  
> -/*
> - * We set details->zap_mapping when we want to unmap shared but keep private
> - * pages. Return true if we should zap this page, false otherwise.
> - */
> +/* Decides whether we should zap this page with the page pointer specified */
>  static inline bool should_zap_page(struct zap_details *details, struct page *page)
>  {
>  	/* If we can make a decision without *page.. */
> @@ -1338,7 +1335,8 @@ static inline bool should_zap_page(struct zap_details *details, struct page *pag
>  	if (!page)
>  		return true;
>  
> -	return details->zap_mapping == page_rmapping(page);
> +	/* Otherwise we should only zap non-anon pages */
> +	return !PageAnon(page);
>  }
>  
>  static unsigned long zap_pte_range(struct mmu_gather *tlb,
> @@ -3403,7 +3401,7 @@ void unmap_mapping_folio(struct folio *folio)
>  	first_index = folio->index;
>  	last_index = folio->index + folio_nr_pages(folio) - 1;
>  
> -	details.zap_mapping = mapping;
> +	details.even_cows = false;

Already initialized to 0 via struct zap_details details = { };

We could think about

struct zap_details details = {
	.single_folio = folio,
};

>  	details.single_folio = folio;
>  
>  	i_mmap_lock_write(mapping);
> @@ -3432,7 +3430,7 @@ void unmap_mapping_pages(struct address_space *mapping, pgoff_t start,
>  	pgoff_t	first_index = start;
>  	pgoff_t	last_index = start + nr - 1;
>  
> -	details.zap_mapping = even_cows ? NULL : mapping;
> +	details.even_cows = even_cows;
>  	if (last_index < first_index)
>  		last_index = ULONG_MAX;
>  

Eventually

struct zap_details details = {
	.even_cows = even_cows,
};

-- 
Thanks,

David / dhildenb




* Re: [PATCH v3 3/4] mm: Change zap_details.zap_mapping into even_cows
  2022-01-28  9:03   ` David Hildenbrand
@ 2022-01-28  9:17     ` Peter Xu
  2022-01-28  9:18       ` David Hildenbrand
  0 siblings, 1 reply; 10+ messages in thread
From: Peter Xu @ 2022-01-28  9:17 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-kernel, Alistair Popple, Andrew Morton,
	Andrea Arcangeli, Matthew Wilcox, John Hubbard, Hugh Dickins,
	Vlastimil Babka, Yang Shi, Kirill A . Shutemov

On Fri, Jan 28, 2022 at 10:03:20AM +0100, David Hildenbrand wrote:
> On 28.01.22 05:54, Peter Xu wrote:
> > Currently we have a zap_mapping pointer maintained in zap_details, when it is
> > specified we only want to zap the pages that has the same mapping with what the
> > caller has specified.
> > 
> > But what we want to do is actually simpler: we want to skip zapping
> > private (COW-ed) pages in some cases.  We can refer to unmap_mapping_pages()
> > callers where we could have passed in different even_cows values.  The other
> > user is unmap_mapping_folio() where we always want to skip private pages.
> > 
> > According to Hugh, we used a mapping pointer for historical reason, as
> > explained here:
> > 
> >   https://lore.kernel.org/lkml/391aa58d-ce84-9d4-d68d-d98a9c533255@google.com/
> > 
> > Quotting partly from Hugh:
> 
> s/Quotting/Quoting/

Will fix.

> 
> > 
> >   Which raises the question again of why I did not just use a boolean flag
> >   there originally: aah, I think I've found why.  In those days there was a
> >   horrible "optimization", for better performance on some benchmark I guess,
> >   which when you read from /dev/zero into a private mapping, would map the zero
> >   page there (look up read_zero_pagealigned() and zeromap_page_range() if you
> >   dare).  So there was another category of page to be skipped along with the
> >   anon COWs, and I didn't want multiple tests in the zap loop, so checking
> >   check_mapping against page->mapping did both.  I think nowadays you could do
> >   it by checking for PageAnon page (or genuine swap entry) instead.
> > 
> > This patch replaced the zap_details.zap_mapping pointer into the even_cows
> > boolean, then we check it against PageAnon.
> > 
> > Suggested-by: Hugh Dickins <hughd@google.com>
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  mm/memory.c | 16 +++++++---------
> >  1 file changed, 7 insertions(+), 9 deletions(-)
> > 
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 14d8428ff4db..ffa8c7dfe9ad 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -1309,8 +1309,8 @@ copy_page_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma)
> >   * Parameter block passed down to zap_pte_range in exceptional cases.
> >   */
> >  struct zap_details {
> > -	struct address_space *zap_mapping;	/* Check page->mapping if set */
> >  	struct folio *single_folio;	/* Locked folio to be unmapped */
> > +	bool even_cows;			/* Zap COWed private pages too? */
> >  };
> >  
> >  /* Whether we should zap all COWed (private) pages too */
> > @@ -1321,13 +1321,10 @@ static inline bool should_zap_cows(struct zap_details *details)
> >  		return true;
> >  
> >  	/* Or, we zap COWed pages only if the caller wants to */
> > -	return !details->zap_mapping;
> > +	return details->even_cows;
> >  }
> >  
> > -/*
> > - * We set details->zap_mapping when we want to unmap shared but keep private
> > - * pages. Return true if we should zap this page, false otherwise.
> > - */
> > +/* Decides whether we should zap this page with the page pointer specified */
> >  static inline bool should_zap_page(struct zap_details *details, struct page *page)
> >  {
> >  	/* If we can make a decision without *page.. */
> > @@ -1338,7 +1335,8 @@ static inline bool should_zap_page(struct zap_details *details, struct page *pag
> >  	if (!page)
> >  		return true;
> >  
> > -	return details->zap_mapping == page_rmapping(page);
> > +	/* Otherwise we should only zap non-anon pages */
> > +	return !PageAnon(page);
> >  }
> >  
> >  static unsigned long zap_pte_range(struct mmu_gather *tlb,
> > @@ -3403,7 +3401,7 @@ void unmap_mapping_folio(struct folio *folio)
> >  	first_index = folio->index;
> >  	last_index = folio->index + folio_nr_pages(folio) - 1;
> >  
> > -	details.zap_mapping = mapping;
> > +	details.even_cows = false;
> 
> Already initialized to 0 via struct zap_details details = { };
> 
> We could think about
> 
> struct zap_details details = {
> 	.single_folio = folio,
> };
> 
> >  	details.single_folio = folio;
> >  
> >  	i_mmap_lock_write(mapping);
> > @@ -3432,7 +3430,7 @@ void unmap_mapping_pages(struct address_space *mapping, pgoff_t start,
> >  	pgoff_t	first_index = start;
> >  	pgoff_t	last_index = start + nr - 1;
> >  
> > -	details.zap_mapping = even_cows ? NULL : mapping;
> > +	details.even_cows = even_cows;
> >  	if (last_index < first_index)
> >  		last_index = ULONG_MAX;
> >  
> 
> Eventually
> 
> struct zap_details details = {
> 	.even_cows = even_cows,
> };

I think in the very initial version I have had that C99 init format but I
dropped it for some reason, perhaps when rebasing to the single_page work to
avoid touching the existing code.

Since as you mentioned single_folio is another.. let's do the cleanup on top?

Thanks,

-- 
Peter Xu




* Re: [PATCH v3 3/4] mm: Change zap_details.zap_mapping into even_cows
  2022-01-28  9:17     ` Peter Xu
@ 2022-01-28  9:18       ` David Hildenbrand
  0 siblings, 0 replies; 10+ messages in thread
From: David Hildenbrand @ 2022-01-28  9:18 UTC (permalink / raw)
  To: Peter Xu
  Cc: linux-mm, linux-kernel, Alistair Popple, Andrew Morton,
	Andrea Arcangeli, Matthew Wilcox, John Hubbard, Hugh Dickins,
	Vlastimil Babka, Yang Shi, Kirill A . Shutemov

On 28.01.22 10:17, Peter Xu wrote:
> On Fri, Jan 28, 2022 at 10:03:20AM +0100, David Hildenbrand wrote:
>> On 28.01.22 05:54, Peter Xu wrote:
>>> Currently we have a zap_mapping pointer maintained in zap_details, when it is
>>> specified we only want to zap the pages that has the same mapping with what the
>>> caller has specified.
>>>
>>> But what we want to do is actually simpler: we want to skip zapping
>>> private (COW-ed) pages in some cases.  We can refer to unmap_mapping_pages()
>>> callers where we could have passed in different even_cows values.  The other
>>> user is unmap_mapping_folio() where we always want to skip private pages.
>>>
>>> According to Hugh, we used a mapping pointer for historical reason, as
>>> explained here:
>>>
>>>   https://lore.kernel.org/lkml/391aa58d-ce84-9d4-d68d-d98a9c533255@google.com/
>>>
>>> Quotting partly from Hugh:
>>
>> s/Quotting/Quoting/
> 
> Will fix.
> 
>>
>>>
>>>   Which raises the question again of why I did not just use a boolean flag
>>>   there originally: aah, I think I've found why.  In those days there was a
>>>   horrible "optimization", for better performance on some benchmark I guess,
>>>   which when you read from /dev/zero into a private mapping, would map the zero
>>>   page there (look up read_zero_pagealigned() and zeromap_page_range() if you
>>>   dare).  So there was another category of page to be skipped along with the
>>>   anon COWs, and I didn't want multiple tests in the zap loop, so checking
>>>   check_mapping against page->mapping did both.  I think nowadays you could do
>>>   it by checking for PageAnon page (or genuine swap entry) instead.
>>>
>>> This patch replaced the zap_details.zap_mapping pointer into the even_cows
>>> boolean, then we check it against PageAnon.
>>>
>>> Suggested-by: Hugh Dickins <hughd@google.com>
>>> Signed-off-by: Peter Xu <peterx@redhat.com>
>>> ---
>>>  mm/memory.c | 16 +++++++---------
>>>  1 file changed, 7 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/mm/memory.c b/mm/memory.c
>>> index 14d8428ff4db..ffa8c7dfe9ad 100644
>>> --- a/mm/memory.c
>>> +++ b/mm/memory.c
>>> @@ -1309,8 +1309,8 @@ copy_page_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma)
>>>   * Parameter block passed down to zap_pte_range in exceptional cases.
>>>   */
>>>  struct zap_details {
>>> -	struct address_space *zap_mapping;	/* Check page->mapping if set */
>>>  	struct folio *single_folio;	/* Locked folio to be unmapped */
>>> +	bool even_cows;			/* Zap COWed private pages too? */
>>>  };
>>>  
>>>  /* Whether we should zap all COWed (private) pages too */
>>> @@ -1321,13 +1321,10 @@ static inline bool should_zap_cows(struct zap_details *details)
>>>  		return true;
>>>  
>>>  	/* Or, we zap COWed pages only if the caller wants to */
>>> -	return !details->zap_mapping;
>>> +	return details->even_cows;
>>>  }
>>>  
>>> -/*
>>> - * We set details->zap_mapping when we want to unmap shared but keep private
>>> - * pages. Return true if we should zap this page, false otherwise.
>>> - */
>>> +/* Decides whether we should zap this page with the page pointer specified */
>>>  static inline bool should_zap_page(struct zap_details *details, struct page *page)
>>>  {
>>>  	/* If we can make a decision without *page.. */
>>> @@ -1338,7 +1335,8 @@ static inline bool should_zap_page(struct zap_details *details, struct page *pag
>>>  	if (!page)
>>>  		return true;
>>>  
>>> -	return details->zap_mapping == page_rmapping(page);
>>> +	/* Otherwise we should only zap non-anon pages */
>>> +	return !PageAnon(page);
>>>  }
>>>  
>>>  static unsigned long zap_pte_range(struct mmu_gather *tlb,
>>> @@ -3403,7 +3401,7 @@ void unmap_mapping_folio(struct folio *folio)
>>>  	first_index = folio->index;
>>>  	last_index = folio->index + folio_nr_pages(folio) - 1;
>>>  
>>> -	details.zap_mapping = mapping;
>>> +	details.even_cows = false;
>>
>> Already initialized to 0 via struct zap_details details = { };
>>
>> We could think about
>>
>> struct zap_details details = {
>> 	.single_folio = folio,
>> };
>>
>>>  	details.single_folio = folio;
>>>  
>>>  	i_mmap_lock_write(mapping);
>>> @@ -3432,7 +3430,7 @@ void unmap_mapping_pages(struct address_space *mapping, pgoff_t start,
>>>  	pgoff_t	first_index = start;
>>>  	pgoff_t	last_index = start + nr - 1;
>>>  
>>> -	details.zap_mapping = even_cows ? NULL : mapping;
>>> +	details.even_cows = even_cows;
>>>  	if (last_index < first_index)
>>>  		last_index = ULONG_MAX;
>>>  
>>
>> Eventually
>>
>> struct zap_details details = {
>> 	.even_cows = even_cows,
>> };
> 
> I think the very initial version had that C99 init format, but I dropped it
> at some point, perhaps when rebasing onto the single_page work to avoid
> touching the existing code.
> 
> Since, as you mentioned, single_folio is another candidate, let's do the
> cleanup on top?

Sure, why not.


-- 
Thanks,

David / dhildenb
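The designated-initializer cleanup David suggests leans on C99 semantics:
members not named in the initializer are zeroed, so the explicit
"details.even_cows = false;" is redundant. A standalone illustration follows;
the struct mirrors the kernel's zap_details, but struct folio is opaque here
and the two helper functions are hypothetical wrappers for the call sites, not
kernel code:

```c
#include <stdbool.h>
#include <stddef.h>

struct folio;	/* opaque stand-in for the kernel's struct folio */

struct zap_details {
	struct folio *single_folio;	/* locked folio to be unmapped */
	bool even_cows;			/* zap COWed private pages too? */
};

/* Models the unmap_mapping_folio() site: even_cows is implicitly false */
static struct zap_details folio_details(struct folio *folio)
{
	struct zap_details details = {
		.single_folio = folio,
	};
	return details;
}

/* Models the unmap_mapping_pages() site: single_folio is implicitly NULL */
static struct zap_details pages_details(bool even_cows)
{
	struct zap_details details = {
		.even_cows = even_cows,
	};
	return details;
}
```

Both call sites then collapse to a one-line initialization, with every
unnamed member reliably zeroed by the compiler.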




end of thread, other threads:[~2022-01-28  9:18 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-28  4:54 [PATCH v3 0/4] mm: Rework zap ptes on swap entries Peter Xu
2022-01-28  4:54 ` [PATCH v3 1/4] mm: Don't skip swap entry even if zap_details specified Peter Xu
2022-01-28  4:54 ` [PATCH v3 2/4] mm: Rename zap_skip_check_mapping() to should_zap_page() Peter Xu
2022-01-28  8:16   ` David Hildenbrand
2022-01-28  8:53     ` Peter Xu
2022-01-28  4:54 ` [PATCH v3 3/4] mm: Change zap_details.zap_mapping into even_cows Peter Xu
2022-01-28  9:03   ` David Hildenbrand
2022-01-28  9:17     ` Peter Xu
2022-01-28  9:18       ` David Hildenbrand
2022-01-28  4:54 ` [PATCH v3 4/4] mm: Rework swap handling of zap_pte_range Peter Xu
