* [PATCH v1] mm: bad_page() checks bad_flags instead of page->flags for hwpoison page @ 2016-05-17 7:42 Naoya Horiguchi 2016-05-18 7:30 ` Vlastimil Babka 2016-05-18 9:21 ` Mel Gorman 0 siblings, 2 replies; 9+ messages in thread From: Naoya Horiguchi @ 2016-05-17 7:42 UTC (permalink / raw) To: Andrew Morton Cc: Vlastimil Babka, Mel Gorman, linux-mm, linux-kernel, Naoya Horiguchi, Naoya Horiguchi There's a race window between checking page->flags and unpoisoning, which taints kernel with "BUG: Bad page state". That's overkill. It's safer to use bad_flags to detect hwpoisoned page. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> --- mm/page_alloc.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git tmp/mm/page_alloc.c tmp_patched/mm/page_alloc.c index 5b269bc..4e0fa37 100644 --- tmp/mm/page_alloc.c +++ tmp_patched/mm/page_alloc.c @@ -522,8 +522,8 @@ static void bad_page(struct page *page, const char *reason, static unsigned long nr_shown; static unsigned long nr_unshown; - /* Don't complain about poisoned pages */ - if (PageHWPoison(page)) { + /* Don't complain about hwpoisoned pages */ + if (bad_flags == __PG_HWPOISON) { page_mapcount_reset(page); /* remove PageBuddy */ return; } -- 2.7.0 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH v1] mm: bad_page() checks bad_flags instead of page->flags for hwpoison page 2016-05-17 7:42 [PATCH v1] mm: bad_page() checks bad_flags instead of page->flags for hwpoison page Naoya Horiguchi @ 2016-05-18 7:30 ` Vlastimil Babka 2016-05-18 9:21 ` Mel Gorman 1 sibling, 0 replies; 9+ messages in thread From: Vlastimil Babka @ 2016-05-18 7:30 UTC (permalink / raw) To: Naoya Horiguchi, Andrew Morton Cc: Mel Gorman, linux-mm, linux-kernel, Naoya Horiguchi On 05/17/2016 09:42 AM, Naoya Horiguchi wrote: > There's a race window between checking page->flags and unpoisoning, which > taints kernel with "BUG: Bad page state". That's overkill. It's safer to > use bad_flags to detect hwpoisoned page. > > Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> > --- > mm/page_alloc.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git tmp/mm/page_alloc.c tmp_patched/mm/page_alloc.c > index 5b269bc..4e0fa37 100644 > --- tmp/mm/page_alloc.c > +++ tmp_patched/mm/page_alloc.c > @@ -522,8 +522,8 @@ static void bad_page(struct page *page, const char *reason, > static unsigned long nr_shown; > static unsigned long nr_unshown; > > - /* Don't complain about poisoned pages */ > - if (PageHWPoison(page)) { > + /* Don't complain about hwpoisoned pages */ > + if (bad_flags == __PG_HWPOISON) { This will wrongly return prematurely on !CONFIG_MEMORY_FAILURE where __PG_HWPOISON == 0 and bad_page() called for other reasons than bad flags? > page_mapcount_reset(page); /* remove PageBuddy */ > return; > } > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v1] mm: bad_page() checks bad_flags instead of page->flags for hwpoison page 2016-05-17 7:42 [PATCH v1] mm: bad_page() checks bad_flags instead of page->flags for hwpoison page Naoya Horiguchi 2016-05-18 7:30 ` Vlastimil Babka @ 2016-05-18 9:21 ` Mel Gorman 2016-05-18 9:31 ` Vlastimil Babka 1 sibling, 1 reply; 9+ messages in thread From: Mel Gorman @ 2016-05-18 9:21 UTC (permalink / raw) To: Naoya Horiguchi Cc: Andrew Morton, Vlastimil Babka, linux-mm, linux-kernel, Naoya Horiguchi On Tue, May 17, 2016 at 04:42:55PM +0900, Naoya Horiguchi wrote: > There's a race window between checking page->flags and unpoisoning, which > taints kernel with "BUG: Bad page state". That's overkill. It's safer to > use bad_flags to detect hwpoisoned page. > I'm not quite getting this one. Minimally, instead of = __PG_HWPOISON, it should have been (bad_flags & __PG_POISON). As Vlastimil already pointed out, __PG_HWPOISON can be 0. What I'm not getting is why this fixes the race. The current race is 1. Check poison, set bad_flags 2. poison clears in parallel 3. Check page->flag state in bad_page and trigger warning The code changes it to 1. Check poison, set bad_flags 2. poison clears in parallel 3. Check bad_flags and trigger warning There is warning either way. What did I miss? -- Mel Gorman SUSE Labs ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v1] mm: bad_page() checks bad_flags instead of page->flags for hwpoison page 2016-05-18 9:21 ` Mel Gorman @ 2016-05-18 9:31 ` Vlastimil Babka 2016-05-18 9:52 ` Mel Gorman 2016-05-18 10:09 ` [PATCH v2] mm: check_new_page_bad() directly returns in __PG_HWPOISON case Naoya Horiguchi 0 siblings, 2 replies; 9+ messages in thread From: Vlastimil Babka @ 2016-05-18 9:31 UTC (permalink / raw) To: Mel Gorman, Naoya Horiguchi Cc: Andrew Morton, linux-mm, linux-kernel, Naoya Horiguchi On 05/18/2016 11:21 AM, Mel Gorman wrote: > On Tue, May 17, 2016 at 04:42:55PM +0900, Naoya Horiguchi wrote: >> There's a race window between checking page->flags and unpoisoning, which >> taints kernel with "BUG: Bad page state". That's overkill. It's safer to >> use bad_flags to detect hwpoisoned page. >> > > I'm not quite getting this one. Minimally, instead of = __PG_HWPOISON, it > should have been (bad_flags & __PG_POISON). As Vlastimil already pointed > out, __PG_HWPOISON can be 0. What I'm not getting is why this fixes the > race. The current race is > > 1. Check poison, set bad_flags > 2. poison clears in parallel > 3. Check page->flag state in bad_page and trigger warning > > The code changes it to > > 1. Check poison, set bad_flags > 2. poison clears in parallel > 3. Check bad_flags and trigger warning I think you got step 3 here wrong. It's "skip the warning since we have set bad_flags to hwpoison and bad_flags didn't change due to parallel unpoison". Perhaps the question is why do we need to split the handling between check_new_page_bad() and bad_page() like this? It might have been different in the past, but seems like at this point we only look for hwpoison from check_new_page_bad(). But a cleanup can come later. > There is warning either way. What did I miss? > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v1] mm: bad_page() checks bad_flags instead of page->flags for hwpoison page 2016-05-18 9:31 ` Vlastimil Babka @ 2016-05-18 9:52 ` Mel Gorman 2016-05-18 10:17 ` Naoya Horiguchi 2016-05-18 10:09 ` [PATCH v2] mm: check_new_page_bad() directly returns in __PG_HWPOISON case Naoya Horiguchi 1 sibling, 1 reply; 9+ messages in thread From: Mel Gorman @ 2016-05-18 9:52 UTC (permalink / raw) To: Vlastimil Babka Cc: Naoya Horiguchi, Andrew Morton, linux-mm, linux-kernel, Naoya Horiguchi On Wed, May 18, 2016 at 11:31:07AM +0200, Vlastimil Babka wrote: > On 05/18/2016 11:21 AM, Mel Gorman wrote: > >On Tue, May 17, 2016 at 04:42:55PM +0900, Naoya Horiguchi wrote: > >>There's a race window between checking page->flags and unpoisoning, which > >>taints kernel with "BUG: Bad page state". That's overkill. It's safer to > >>use bad_flags to detect hwpoisoned page. > >> > > > >I'm not quite getting this one. Minimally, instead of = __PG_HWPOISON, it > >should have been (bad_flags & __PG_POISON). As Vlastimil already pointed > >out, __PG_HWPOISON can be 0. What I'm not getting is why this fixes the > >race. The current race is > > > >1. Check poison, set bad_flags > >2. poison clears in parallel > >3. Check page->flag state in bad_page and trigger warning > > > >The code changes it to > > > >1. Check poison, set bad_flags > >2. poison clears in parallel > >3. Check bad_flags and trigger warning > > I think you got step 3 here wrong. It's "skip the warning since we have set > bad_flags to hwpoison and bad_flags didn't change due to parallel unpoison". > I think the benefit is marginal. The race means that the patch will trigger a warning that might have been missed before due to a parallel unpoison but that's not necessary a Good Thing. It's inherently race-prone. Naoya, if you fix the check to (bad_flags & __PG_POISON) then I'll add my ack but I'm not convinced it's a real problem. -- Mel Gorman SUSE Labs ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v1] mm: bad_page() checks bad_flags instead of page->flags for hwpoison page 2016-05-18 9:52 ` Mel Gorman @ 2016-05-18 10:17 ` Naoya Horiguchi 0 siblings, 0 replies; 9+ messages in thread From: Naoya Horiguchi @ 2016-05-18 10:17 UTC (permalink / raw) To: Mel Gorman Cc: Vlastimil Babka, Andrew Morton, linux-mm, linux-kernel, Naoya Horiguchi On Wed, May 18, 2016 at 10:52:51AM +0100, Mel Gorman wrote: > On Wed, May 18, 2016 at 11:31:07AM +0200, Vlastimil Babka wrote: > > On 05/18/2016 11:21 AM, Mel Gorman wrote: > > >On Tue, May 17, 2016 at 04:42:55PM +0900, Naoya Horiguchi wrote: > > >>There's a race window between checking page->flags and unpoisoning, which > > >>taints kernel with "BUG: Bad page state". That's overkill. It's safer to > > >>use bad_flags to detect hwpoisoned page. > > >> > > > > > >I'm not quite getting this one. Minimally, instead of = __PG_HWPOISON, it > > >should have been (bad_flags & __PG_POISON). As Vlastimil already pointed > > >out, __PG_HWPOISON can be 0. What I'm not getting is why this fixes the > > >race. The current race is > > > > > >1. Check poison, set bad_flags > > >2. poison clears in parallel > > >3. Check page->flag state in bad_page and trigger warning > > > > > >The code changes it to > > > > > >1. Check poison, set bad_flags > > >2. poison clears in parallel > > >3. Check bad_flags and trigger warning > > > > I think you got step 3 here wrong. It's "skip the warning since we have set > > bad_flags to hwpoison and bad_flags didn't change due to parallel unpoison". > > > > I think the benefit is marginal. The race means that the patch will trigger > a warning that might have been missed before due to a parallel unpoison > but that's not necessary a Good Thing. It's inherently race-prone. > > Naoya, if you fix the check to (bad_flags & __PG_POISON) then I'll add my > ack but I'm not convinced it's a real problem. This v1 had the wrong operator issue as you mentioned. I posted v2 a while ago, which has no such issue and is a better fix hopefully. Thanks, Naoya Horiguchi ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH v2] mm: check_new_page_bad() directly returns in __PG_HWPOISON case 2016-05-18 9:31 ` Vlastimil Babka 2016-05-18 9:52 ` Mel Gorman @ 2016-05-18 10:09 ` Naoya Horiguchi 2016-05-18 14:03 ` Mel Gorman 1 sibling, 1 reply; 9+ messages in thread From: Naoya Horiguchi @ 2016-05-18 10:09 UTC (permalink / raw) To: Vlastimil Babka Cc: Mel Gorman, Andrew Morton, linux-mm, linux-kernel, Naoya Horiguchi On Wed, May 18, 2016 at 11:31:07AM +0200, Vlastimil Babka wrote: > On 05/18/2016 11:21 AM, Mel Gorman wrote: > > On Tue, May 17, 2016 at 04:42:55PM +0900, Naoya Horiguchi wrote: > > > There's a race window between checking page->flags and unpoisoning, which > > > taints kernel with "BUG: Bad page state". That's overkill. It's safer to > > > use bad_flags to detect hwpoisoned page. > > > > > > > I'm not quite getting this one. Minimally, instead of = __PG_HWPOISON, it > > should have been (bad_flags & __PG_POISON). As Vlastimil already pointed > > out, __PG_HWPOISON can be 0. What I'm not getting is why this fixes the > > race. The current race is > > > > 1. Check poison, set bad_flags > > 2. poison clears in parallel > > 3. Check page->flag state in bad_page and trigger warning > > > > The code changes it to > > > > 1. Check poison, set bad_flags > > 2. poison clears in parallel > > 3. Check bad_flags and trigger warning > > I think you got step 3 here wrong. It's "skip the warning since we have set > bad_flags to hwpoison and bad_flags didn't change due to parallel unpoison". > > Perhaps the question is why do we need to split the handling between > check_new_page_bad() and bad_page() like this? It might have been different > in the past, but seems like at this point we only look for hwpoison from > check_new_page_bad(). But a cleanup can come later. Thanks for clarification. check_new_page_bad() is the only function interested in hwpoison flag, so we had better move the hwpoison related code in bad_page() to check_new_page_bad(). Thanks, Naoya Horiguchi --- >From c600b1ee6c36b3df6973f5365b4179c92f3c08e3 Mon Sep 17 00:00:00 2001 From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Date: Wed, 18 May 2016 18:42:57 +0900 Subject: [PATCH v2] mm: check_new_page_bad() directly returns in __PG_HWPOISON case Currently we check page->flags twice for "HWPoisoned" case of check_new_page_bad(), which can cause a race with unpoisoning. This race unnecessarily taints kernel with "BUG: Bad page state". check_new_page_bad() is the only caller of bad_page() which is interested in __PG_HWPOISON, so let's move the hwpoison related code in bad_page() to it. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> --- mm/page_alloc.c | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 5b269bc3eca7..59b938ddfb2d 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -522,12 +522,6 @@ static void bad_page(struct page *page, const char *reason, static unsigned long nr_shown; static unsigned long nr_unshown; - /* Don't complain about poisoned pages */ - if (PageHWPoison(page)) { - page_mapcount_reset(page); /* remove PageBuddy */ - return; - } - /* * Allow a burst of 60 reports, then keep quiet for that minute; * or allow a steady drip of one report per second. @@ -1654,6 +1648,9 @@ static void check_new_page_bad(struct page *page) if (unlikely(page->flags & __PG_HWPOISON)) { bad_reason = "HWPoisoned (hardware-corrupted)"; bad_flags = __PG_HWPOISON; + /* Don't complain about hwpoisoned pages */ + page_mapcount_reset(page); /* remove PageBuddy */ + return; } if (unlikely(page->flags & PAGE_FLAGS_CHECK_AT_PREP)) { bad_reason = "PAGE_FLAGS_CHECK_AT_PREP flag set"; -- 2.5.5 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH v2] mm: check_new_page_bad() directly returns in __PG_HWPOISON case 2016-05-18 10:09 ` [PATCH v2] mm: check_new_page_bad() directly returns in __PG_HWPOISON case Naoya Horiguchi @ 2016-05-18 14:03 ` Mel Gorman 2016-05-20 14:27 ` Vlastimil Babka 0 siblings, 1 reply; 9+ messages in thread From: Mel Gorman @ 2016-05-18 14:03 UTC (permalink / raw) To: Naoya Horiguchi Cc: Vlastimil Babka, Andrew Morton, linux-mm, linux-kernel, Naoya Horiguchi On Wed, May 18, 2016 at 10:09:50AM +0000, Naoya Horiguchi wrote: > From c600b1ee6c36b3df6973f5365b4179c92f3c08e3 Mon Sep 17 00:00:00 2001 > From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> > Date: Wed, 18 May 2016 18:42:57 +0900 > Subject: [PATCH v2] mm: check_new_page_bad() directly returns in __PG_HWPOISON > case > > Currently we check page->flags twice for "HWPoisoned" case of > check_new_page_bad(), which can cause a race with unpoisoning. > This race unnecessarily taints kernel with "BUG: Bad page state". > check_new_page_bad() is the only caller of bad_page() which is interested > in __PG_HWPOISON, so let's move the hwpoison related code in bad_page() > to it. > > Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> -- Mel Gorman SUSE Labs ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2] mm: check_new_page_bad() directly returns in __PG_HWPOISON case 2016-05-18 14:03 ` Mel Gorman @ 2016-05-20 14:27 ` Vlastimil Babka 0 siblings, 0 replies; 9+ messages in thread From: Vlastimil Babka @ 2016-05-20 14:27 UTC (permalink / raw) To: Mel Gorman, Naoya Horiguchi Cc: Andrew Morton, linux-mm, linux-kernel, Naoya Horiguchi On 05/18/2016 04:03 PM, Mel Gorman wrote: > On Wed, May 18, 2016 at 10:09:50AM +0000, Naoya Horiguchi wrote: >> From c600b1ee6c36b3df6973f5365b4179c92f3c08e3 Mon Sep 17 00:00:00 2001 >> From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> >> Date: Wed, 18 May 2016 18:42:57 +0900 >> Subject: [PATCH v2] mm: check_new_page_bad() directly returns in __PG_HWPOISON >> case >> >> Currently we check page->flags twice for "HWPoisoned" case of >> check_new_page_bad(), which can cause a race with unpoisoning. >> This race unnecessarily taints kernel with "BUG: Bad page state". >> check_new_page_bad() is the only caller of bad_page() which is interested >> in __PG_HWPOISON, so let's move the hwpoison related code in bad_page() >> to it. >> >> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> > > Acked-by: Mel Gorman <mgorman@techsingularity.net> > Acked-by: Vlastimil Babka <vbabka@suse.cz> ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2016-05-20 14:27 UTC | newest] Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2016-05-17 7:42 [PATCH v1] mm: bad_page() checks bad_flags instead of page->flags for hwpoison page Naoya Horiguchi 2016-05-18 7:30 ` Vlastimil Babka 2016-05-18 9:21 ` Mel Gorman 2016-05-18 9:31 ` Vlastimil Babka 2016-05-18 9:52 ` Mel Gorman 2016-05-18 10:17 ` Naoya Horiguchi 2016-05-18 10:09 ` [PATCH v2] mm: check_new_page_bad() directly returns in __PG_HWPOISON case Naoya Horiguchi 2016-05-18 14:03 ` Mel Gorman 2016-05-20 14:27 ` Vlastimil Babka
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).