From: Dave Hansen <dave.hansen@linux.intel.com>
To: linux-kernel@vger.kernel.org
Cc: Dave Hansen <dave.hansen@linux.intel.com>,
	mhocko@suse.com, jannh@google.com, vbabka@suse.cz,
	minchan@kernel.org, dancol@google.com, joel@joelfernandes.org,
	akpm@linux-foundation.org
Subject: [PATCH 2/2] mm/madvise: skip MADV_PAGEOUT on shared swap cache pages
Date: Mon, 23 Mar 2020 16:41:51 -0700
Message-ID: <20200323234151.10AF5617@viggo.jf.intel.com>
In-Reply-To: <20200323234147.558EBA81@viggo.jf.intel.com>


From: Dave Hansen <dave.hansen@linux.intel.com>

MADV_PAGEOUT might interfere with other processes if it is
allowed to reclaim pages shared with other processes.  A
previous patch tried to avoid this for anonymous pages
which were shared by a fork().  It did this by checking
page_mapcount().

That works great for mapped pages.  But, it cannot detect
unmapped swap cache pages.  This was not a problem until
the previous patch, which added the ability for
MADV_PAGEOUT to *find* swap cache pages.

A process doing MADV_PAGEOUT which finds an unmapped swap
cache page and evicts it might interfere with another process
which had the same page mapped.  But, such a page would have
a page_mapcount() of 1 since the page is only actually mapped
in the *other* process.  The page_mapcount() test would fail
to detect the situation.

Thankfully, there is a reference count for swap entries.
To fix this, simply consult both page_mapcount() and the swap
reference count via page_swapcount().

I rigged up a little test program to try to create these
situations.  Basically, if the parent "reader" RSS changes
in response to MADV_PAGEOUT actions in the child, there is
a problem.

	https://www.sr71.net/~dave/intel/madv-pageout.c

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Jann Horn <jannh@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Daniel Colascione <dancol@google.com>
Cc: "Joel Fernandes (Google)" <joel@joelfernandes.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
---

 b/mm/madvise.c |   37 +++++++++++++++++++++++++++++--------
 1 file changed, 29 insertions(+), 8 deletions(-)

diff -puN mm/madvise.c~madv-pageout-ignore-shared-swap-cache mm/madvise.c
--- a/mm/madvise.c~madv-pageout-ignore-shared-swap-cache	2020-03-23 16:30:52.022385888 -0700
+++ b/mm/madvise.c	2020-03-23 16:41:15.448384333 -0700
@@ -261,6 +261,7 @@ static struct page *pte_get_reclaim_page
 {
 	swp_entry_t entry;
 	struct page *page;
+	int nr_page_references = 0;
 
 	/* Totally empty PTE: */
 	if (pte_none(ptent))
@@ -271,7 +272,7 @@ static struct page *pte_get_reclaim_page
 		page = vm_normal_page(vma, addr, ptent);
 		if (page)
 			get_page(page);
-		return page;
+		goto got_page;
 	}
 
 	/*
@@ -292,7 +293,33 @@ static struct page *pte_get_reclaim_page
 	 * The PTE was a true swap entry.  The page may be in
 	 * the swap cache.
 	 */
-	return lookup_swap_cache(entry, vma, addr);
+	page = lookup_swap_cache(entry, vma, addr);
+	if (!page)
+		return NULL;
+got_page:
+	/*
+	 * Account for references to the swap entry.  These
+	 * might be "upgraded" to a normal mapping at any
+	 * time.
+	 */
+	if (PageSwapCache(page))
+		nr_page_references += page_swapcount(page);
+
+	/*
+	 * Account for all mappings of the page, including
+	 * when it is in the swap cache.  This ensures that
+	 * MADV_PAGEOUT does not interfere with anything shared
+	 * with another process.
+	 */
+	nr_page_references += page_mapcount(page);
+
+	/* Any extra references?  Do not reclaim it. */
+	if (nr_page_references > 1) {
+		put_page(page);
+		return NULL;
+	}
+
+	return page;
 }
 
 /*
@@ -477,12 +504,6 @@ regular_page:
 			continue;
 		}
 
-		/* Do not interfere with other mappings of this page */
-		if (page_mapcount(page) != 1) {
-			put_page(page);
-			continue;
-		}
-
 		VM_BUG_ON_PAGE(PageTransCompound(page), page);
 
 		if (!is_swap_pte(ptent) && pte_young(ptent)) {
_
