* [patch 4/6] mm, madvise: ensure poisoned pages are removed from per-cpu lists
@ 2017-08-31 23:15 akpm
From: akpm @ 2017-08-31 23:15 UTC
  To: torvalds, mm-commits, akpm, mgorman, dave.hansen, nao.horiguchi,
	rientjes, stable, tony.luck, vbabka, wendy.wang

From: Mel Gorman <mgorman@techsingularity.net>
Subject: mm, madvise: ensure poisoned pages are removed from per-cpu lists

Wendy Wang reported off-list that a RAS HWPOISON-SOFT test case failed and
bisected it to the commit 479f854a207c ("mm, page_alloc: defer debugging
checks of pages allocated from the PCP").  The problem is that a page that
was poisoned with madvise() is reused.  The commit removed a check that
would trigger if DEBUG_VM was enabled but re-enabling the check only fixes
the problem as a side-effect by printing a bad_page warning and
recovering.

The root of the problem is that an madvise() can leave a poisoned page on
the per-cpu list.  This patch drains all per-cpu lists after pages are
poisoned so that they will not be reused.  Wendy reports that the test
case in question passes with this patch applied.  While this could be done
in a targeted fashion, doing so would be over-complicated for such a rare
operation.

Link: http://lkml.kernel.org/r/20170828133414.7qro57jbepdcyz5x@techsingularity.net
Fixes: 479f854a207c ("mm, page_alloc: defer debugging checks of pages allocated from the PCP")
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reported-by: Wang, Wendy <wendy.wang@intel.com>
Tested-by: Wang, Wendy <wendy.wang@intel.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: "Hansen, Dave" <dave.hansen@intel.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/madvise.c |    6 ++++++
 1 file changed, 6 insertions(+)

diff -puN mm/madvise.c~mm-madvise-ensure-poisoned-pages-are-removed-from-per-cpu-lists mm/madvise.c
--- a/mm/madvise.c~mm-madvise-ensure-poisoned-pages-are-removed-from-per-cpu-lists
+++ a/mm/madvise.c
@@ -613,6 +613,7 @@ static int madvise_inject_error(int beha
 		unsigned long start, unsigned long end)
 {
 	struct page *page;
+	struct zone *zone;
 
 	if (!capable(CAP_SYS_ADMIN))
 		return -EPERM;
@@ -646,6 +647,11 @@ static int madvise_inject_error(int beha
 		if (ret)
 			return ret;
 	}
+
+	/* Ensure that all poisoned pages are removed from per-cpu lists */
+	for_each_populated_zone(zone)
+		drain_all_pages(zone);
+
 	return 0;
 }
 #endif
_
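
For readers who want to exercise the path this patch touches, the
following is a minimal user-space sketch, not the RAS test case Wendy ran
(that test is not shown in this thread).  It assumes a kernel built with
CONFIG_MEMORY_FAILURE and a caller with CAP_SYS_ADMIN; without either,
madvise() is expected to fail and the helper simply returns the negative
errno.  The helper name soft_offline_one_page() is hypothetical, and the
fallback value 101 for MADV_SOFT_OFFLINE is the asm-generic definition,
used here only if the libc headers do not already provide it.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Assumption: asm-generic value, only used if the libc headers lack it. */
#ifndef MADV_SOFT_OFFLINE
#define MADV_SOFT_OFFLINE 101
#endif

/*
 * Hypothetical helper: map one anonymous page, fault it in, and ask the
 * kernel to soft-offline it.  In the kernel this request is handled by
 * madvise_inject_error(), the function the patch modifies.
 *
 * Returns 0 on success or a negative errno value (for example -EPERM
 * without CAP_SYS_ADMIN, -EINVAL without CONFIG_MEMORY_FAILURE).
 */
int soft_offline_one_page(void)
{
	long page_size = sysconf(_SC_PAGESIZE);
	unsigned char *buf;
	int ret = 0;

	if (page_size <= 0)
		return -EINVAL;

	buf = mmap(NULL, (size_t)page_size, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -errno;

	/* Touch the page so that a physical page is actually allocated. */
	memset(buf, 0xaa, (size_t)page_size);

	if (madvise(buf, (size_t)page_size, MADV_SOFT_OFFLINE) != 0)
		ret = -errno;

	/*
	 * Soft offlining migrates the contents to a new physical page, so
	 * the data should still be readable through the same mapping.
	 */
	if (ret == 0 && buf[0] != 0xaa)
		ret = -EIO;

	munmap(buf, (size_t)page_size);
	return ret;
}
```

A return value of 0 only shows that the madvise() call succeeded and the
data survived; it does not by itself prove that the poisoned page was kept
off the per-cpu lists.  Checking that would need a loop that keeps
allocating and offlining pages, which is beyond this sketch.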
