From: John Hubbard <jhubbard@nvidia.com>
To: Peter Xu <peterx@redhat.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Andrew Morton <akpm@linux-foundation.org>,
	"Kirill A . Shutemov" <kirill@shutemov.name>,
	Matthew Wilcox <willy@infradead.org>,
	Yang Shi <shy828301@gmail.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Alistair Popple <apopple@nvidia.com>,
	David Hildenbrand <david@redhat.com>,
	Vlastimil Babka <vbabka@suse.cz>, Hugh Dickins <hughd@google.com>
Subject: Re: [PATCH v4 1/4] mm: Don't skip swap entry even if zap_details specified
Date: Wed, 16 Feb 2022 19:15:30 -0800
Message-ID: <da98141f-c8d8-f796-d8ec-352d9b2abd6a@nvidia.com>
In-Reply-To: <20220216094810.60572-2-peterx@redhat.com>

On 2/16/22 1:48 AM, Peter Xu wrote:
> The "details" pointer shouldn't be the token to decide whether we should skip
> swap entries.  For example, when the user specified details->zap_mapping==NULL,
> it means the user wants to zap all the pages (including COWed pages), then we
> need to look into swap entries because there can be private COWed pages that
> were swapped out.

Hi Peter,

The changes look good, just some minor readability suggestions below:

(btw, hch is going to ask you to reflow all of the commit descriptions
to 72 cols, so you might as well do it in advance. :)

> 
> Skipping some swap entries when details is non-NULL may lead to wrongly leaving
> some of the swap entries while we should have zapped them.
> 
> A reproducer of the problem:
> 
> ===8<===
>         #define _GNU_SOURCE         /* See feature_test_macros(7) */
>         #include <stdio.h>
>         #include <assert.h>
>         #include <unistd.h>
>         #include <sys/mman.h>
>         #include <sys/types.h>
> 
>         int page_size;
>         int shmem_fd;
>         char *buffer;
> 
>         int main(void)
>         {
>                 int ret;
>                 char val;
> 
>                 page_size = getpagesize();
>                 shmem_fd = memfd_create("test", 0);
>                 assert(shmem_fd >= 0);
> 
>                 ret = ftruncate(shmem_fd, page_size * 2);
>                 assert(ret == 0);
> 
>                 buffer = mmap(NULL, page_size * 2, PROT_READ | PROT_WRITE,
>                                 MAP_PRIVATE, shmem_fd, 0);
>                 assert(buffer != MAP_FAILED);
> 
>                 /* Write private page, swap it out */
>                 buffer[page_size] = 1;
>                 madvise(buffer, page_size * 2, MADV_PAGEOUT);
> 
>                 /* This should drop private buffer[page_size] already */
>                 ret = ftruncate(shmem_fd, page_size);
>                 assert(ret == 0);
>                 /* Recover the size */
>                 ret = ftruncate(shmem_fd, page_size * 2);
>                 assert(ret == 0);
> 
>                 /* Re-read the data, it should be all zero */
>                 val = buffer[page_size];
>                 if (val == 0)
>                         printf("Good\n");
>                 else
>                         printf("BUG\n");
>         }
> ===8<===
> 
> We don't need to touch up the pmd path, because pmd never had an issue with swap
> entries.  For example, shmem pmd migration will always be split into pte level,
> and the same goes for swapping of anonymous memory.
> 
> Add another helper should_zap_cows() so that we can also check whether we
> should zap private mappings when there's no page pointer specified.
> 
> This patch drops the old details-based skip, so we handle swap ptes coherently.  Meanwhile we
> should do the same check upon migration entry, hwpoison entry and genuine swap
> entries too.  To be explicit, we should still remember to keep the private
> entries if even_cows==false, and always zap them when even_cows==true.
> 
> The issue seems to exist starting from the initial commit of git.
> 
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  mm/memory.c | 45 ++++++++++++++++++++++++++++++++++++---------
>  1 file changed, 36 insertions(+), 9 deletions(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index c125c4969913..4bfeaca7cbc7 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1313,6 +1313,17 @@ struct zap_details {
>  	struct folio *single_folio;	/* Locked folio to be unmapped */
>  };
>  
> +/* Whether we should zap all COWed (private) pages too */
> +static inline bool should_zap_cows(struct zap_details *details)
> +{
> +	/* By default, zap all pages */
> +	if (!details)
> +		return true;
> +
> +	/* Or, we zap COWed pages only if the caller wants to */
> +	return !details->zap_mapping;
> +}
> +
>  /*
>   * We set details->zap_mapping when we want to unmap shared but keep private
>   * pages. Return true if skip zapping this page, false otherwise.
> @@ -1320,11 +1331,15 @@ struct zap_details {
>  static inline bool
>  zap_skip_check_mapping(struct zap_details *details, struct page *page)
>  {
> -	if (!details || !page)
> +	/* If we can make a decision without *page.. */
> +	if (should_zap_cows(details))
>  		return false;
>  
> -	return details->zap_mapping &&
> -		(details->zap_mapping != page_rmapping(page));
> +	/* E.g. zero page */

It's a bit confusing to see a comment that "this could be the zero page", if 
the value is NULL. Maybe, "the caller passes NULL for the case of a zero 
page", or something along those lines? 
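
To make the suggested comment concrete, here is a rough user-space sketch
of the helper with the reworded comment in place. It is not kernel code:
the struct layouts, mock_page_rmapping() and example_usage() are stand-ins
invented for illustration, and only the control flow mirrors the hunk above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the kernel types; the real layouts are far larger. */
struct address_space { int unused; };
struct page { struct address_space *mapping; };
struct zap_details { struct address_space *zap_mapping; };

/* Hypothetical stand-in for page_rmapping(): just returns the field. */
static struct address_space *mock_page_rmapping(struct page *page)
{
	return page->mapping;
}

/* Whether we should zap all COWed (private) pages too */
static bool should_zap_cows(struct zap_details *details)
{
	/* By default, zap all pages */
	if (!details)
		return true;

	/* Or, we zap COWed pages only if the caller wants to */
	return !details->zap_mapping;
}

/* Return true if zapping of this page should be skipped. */
static bool zap_skip_check_mapping(struct zap_details *details,
				   struct page *page)
{
	/* If we can make a decision without *page.. */
	if (should_zap_cows(details))
		return false;

	/* The caller passes NULL for the case of a zero page */
	if (!page)
		return false;

	return details->zap_mapping != mock_page_rmapping(page);
}

/* Small usage check covering the possible outcomes. */
void example_usage(void)
{
	struct address_space shared = { 0 };
	struct page file_page = { &shared };
	struct page cowed_page = { NULL };	/* mapping differs from &shared */
	struct zap_details keep_private = { &shared };

	assert(!zap_skip_check_mapping(NULL, &cowed_page));	    /* zap all */
	assert(!zap_skip_check_mapping(&keep_private, NULL));	    /* zero page */
	assert(!zap_skip_check_mapping(&keep_private, &file_page)); /* same mapping */
	assert(zap_skip_check_mapping(&keep_private, &cowed_page)); /* kept private */
}
```

Only the last case skips the page: a non-NULL zap_mapping combined with a
page that belongs to some other mapping.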


> +	if (!page)
> +		return false;
> +
> +	return details->zap_mapping != page_rmapping(page);
>  }
>  
>  static unsigned long zap_pte_range(struct mmu_gather *tlb,
> @@ -1405,17 +1420,29 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
>  			continue;
>  		}
>  
> -		/* If details->check_mapping, we leave swap entries. */
> -		if (unlikely(details))
> -			continue;
> -
> -		if (!non_swap_entry(entry))
> +		if (!non_swap_entry(entry)) {
> +			/*
> +			 * If this is a genuine swap entry, then it must be an
> +			 * private anon page.  If the caller wants to skip
> +			 * COWed pages, ignore it.
> +			 */

How about this instead:

			/* Genuine swap entry, and therefore a private anon page. */

> +			if (!should_zap_cows(details))
> +				continue;
>  			rss[MM_SWAPENTS]--;
> -		else if (is_migration_entry(entry)) {

Can we put a newline here, and before each "else" block? Because now it
is getting very dense, and the visual separation really helps.

> +		} else if (is_migration_entry(entry)) {
>  			struct page *page;
>  
>  			page = pfn_swap_entry_to_page(entry);
> +			if (zap_skip_check_mapping(details, page))
> +				continue;
>  			rss[mm_counter(page)]--;

Newline here.

> +		} else if (is_hwpoison_entry(entry)) {
> +			/* If the caller wants to skip COWed pages, ignore it */

Likewise, I'd prefer we delete that comment, because it exactly matches 
what the following line of code says.

> +			if (!should_zap_cows(details))
> +				continue;

And newline here too.
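
Taken together, the spacing and comment nits might end up looking roughly
like the sketch below. This is a user-space mock rather than the real
zap_pte_range(): the enum, struct mock_entry, mock_zap_nonpresent() and the
two counters are hypothetical stand-ins, a "return false" stands in for the
loop's "continue", and the unexpected counter stands in for WARN_ON_ONCE(1).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the kernel types and swap entry classification. */
struct address_space { int unused; };
struct page { struct address_space *mapping; };
struct zap_details { struct address_space *zap_mapping; };

enum mock_entry_type { MOCK_SWAP, MOCK_MIGRATION, MOCK_HWPOISON, MOCK_OTHER };

struct mock_entry {
	enum mock_entry_type type;
	struct page *page;	/* only meaningful for MOCK_MIGRATION */
};

static bool should_zap_cows(struct zap_details *details)
{
	if (!details)
		return true;

	return !details->zap_mapping;
}

static bool zap_skip_check_mapping(struct zap_details *details,
				   struct page *page)
{
	if (should_zap_cows(details))
		return false;

	if (!page)
		return false;

	/* The mock compares the field directly instead of page_rmapping(). */
	return details->zap_mapping != page->mapping;
}

/* Returns true if the entry is zapped, false if it is skipped. */
bool mock_zap_nonpresent(struct zap_details *details,
			 const struct mock_entry *entry,
			 int *swapents, int *unexpected)
{
	if (entry->type == MOCK_SWAP) {
		/* Genuine swap entry, and therefore a private anon page. */
		if (!should_zap_cows(details))
			return false;
		(*swapents)--;

	} else if (entry->type == MOCK_MIGRATION) {
		if (zap_skip_check_mapping(details, entry->page))
			return false;
		/* The real code decrements rss[mm_counter(page)] here. */

	} else if (entry->type == MOCK_HWPOISON) {
		if (!should_zap_cows(details))
			return false;

	} else {
		/* We should have covered all the swap entry types */
		(*unexpected)++;
	}

	return true;
}

/* Small usage check: a private swap entry survives a mapping-only zap. */
void example_usage(void)
{
	struct address_space shared = { 0 };
	struct zap_details keep_private = { &shared };
	struct mock_entry swp = { MOCK_SWAP, NULL };
	int swapents = 1, unexpected = 0;

	assert(!mock_zap_nonpresent(&keep_private, &swp, &swapents, &unexpected));
	assert(swapents == 1);
	assert(mock_zap_nonpresent(NULL, &swp, &swapents, &unexpected));
	assert(swapents == 0);
}
```

The blank line before each "else" is the only layout change relative to
the hunk; the decisions themselves follow the patch.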

> +		} else {
> +			/* We should have covered all the swap entry types */
> +			WARN_ON_ONCE(1);
>  		}
>  		if (unlikely(!free_swap_and_cache(entry)))
>  			print_bad_pte(vma, addr, ptent, NULL);

Those are all just nits, and as I mentioned, the actual changes look good
to me, so:

Reviewed-by: John Hubbard <jhubbard@nvidia.com>

thanks,
-- 
John Hubbard
NVIDIA

