linux-kernel.vger.kernel.org archive mirror
* [PATCH v1] mm/hwpoison: set PageHWPoison after taking page lock in memory_failure_hugetlb()
@ 2022-03-09  9:14 Naoya Horiguchi
  2022-03-09 21:30 ` Andrew Morton
  2022-03-09 21:55 ` Yang Shi
  0 siblings, 2 replies; 10+ messages in thread
From: Naoya Horiguchi @ 2022-03-09  9:14 UTC
  To: linux-mm
  Cc: Andrew Morton, Mike Kravetz, Miaohe Lin, Yang Shi,
	Naoya Horiguchi, linux-kernel

From: Naoya Horiguchi <naoya.horiguchi@nec.com>

There is a race condition between memory_failure_hugetlb() and hugetlb
free/demotion, which causes the PageHWPoison flag to be set on the wrong
page (one that was a hugetlb page when memory_failure() was called, but
had been removed or demoted by the time memory_failure_hugetlb() ran).
This results in killing the wrong processes.  So set the PageHWPoison flag
while holding the page lock.

Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
---
 mm/memory-failure.c | 27 ++++++++++++---------------
 1 file changed, 12 insertions(+), 15 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index ac6492e36978..fe25eee8f9d6 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1494,24 +1494,11 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
 	int res;
 	unsigned long page_flags;
 
-	if (TestSetPageHWPoison(head)) {
-		pr_err("Memory failure: %#lx: already hardware poisoned\n",
-		       pfn);
-		res = -EHWPOISON;
-		if (flags & MF_ACTION_REQUIRED)
-			res = kill_accessing_process(current, page_to_pfn(head), flags);
-		return res;
-	}
-
-	num_poisoned_pages_inc();
-
 	if (!(flags & MF_COUNT_INCREASED)) {
 		res = get_hwpoison_page(p, flags);
 		if (!res) {
 			lock_page(head);
 			if (hwpoison_filter(p)) {
-				if (TestClearPageHWPoison(head))
-					num_poisoned_pages_dec();
 				unlock_page(head);
 				return -EOPNOTSUPP;
 			}
@@ -1544,13 +1531,16 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
 	page_flags = head->flags;
 
 	if (hwpoison_filter(p)) {
-		if (TestClearPageHWPoison(head))
-			num_poisoned_pages_dec();
 		put_page(p);
 		res = -EOPNOTSUPP;
 		goto out;
 	}
 
+	if (TestSetPageHWPoison(head))
+		goto already_hwpoisoned;
+
+	num_poisoned_pages_inc();
+
 	/*
 	 * TODO: hwpoison for pud-sized hugetlb doesn't work right now, so
 	 * simply disable it. In order to make it work properly, we need
@@ -1576,6 +1566,13 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
 out:
 	unlock_page(head);
 	return res;
+already_hwpoisoned:
+	unlock_page(head);
+	pr_err("Memory failure: %#lx: already hardware poisoned\n", pfn);
+	res = -EHWPOISON;
+	if (flags & MF_ACTION_REQUIRED)
+		res = kill_accessing_process(current, page_to_pfn(head), flags);
+	return res;
 }
 
 static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
-- 
2.25.1



* Re: [PATCH v1] mm/hwpoison: set PageHWPoison after taking page lock in memory_failure_hugetlb()
  2022-03-09  9:14 [PATCH v1] mm/hwpoison: set PageHWPoison after taking page lock in memory_failure_hugetlb() Naoya Horiguchi
@ 2022-03-09 21:30 ` Andrew Morton
  2022-03-10  1:15   ` HORIGUCHI NAOYA(堀口 直也)
  2022-03-09 21:55 ` Yang Shi
  1 sibling, 1 reply; 10+ messages in thread
From: Andrew Morton @ 2022-03-09 21:30 UTC
  To: Naoya Horiguchi
  Cc: linux-mm, Mike Kravetz, Miaohe Lin, Yang Shi, Naoya Horiguchi,
	linux-kernel

On Wed,  9 Mar 2022 18:14:49 +0900 Naoya Horiguchi <naoya.horiguchi@linux.dev> wrote:

> There is a race condition between memory_failure_hugetlb() and hugetlb
> free/demotion, which causes setting PageHWPoison flag on the wrong page
> (which was a hugetlb when memory_failure() was called, but was removed
> or demoted when memory_failure_hugetlb() is called).  This results in
> killing wrong processes.  So set PageHWPoison flag with holding page lock,

What are the runtime effects of this?  Do we think a -stable backport
is needed?

Are we missing a reported-by here?  I'm too lazy to hunt down who it was.


* Re: [PATCH v1] mm/hwpoison: set PageHWPoison after taking page lock in memory_failure_hugetlb()
  2022-03-09  9:14 [PATCH v1] mm/hwpoison: set PageHWPoison after taking page lock in memory_failure_hugetlb() Naoya Horiguchi
  2022-03-09 21:30 ` Andrew Morton
@ 2022-03-09 21:55 ` Yang Shi
  2022-03-09 23:59   ` Mike Kravetz
  2022-03-10  0:00   ` HORIGUCHI NAOYA(堀口 直也)
  1 sibling, 2 replies; 10+ messages in thread
From: Yang Shi @ 2022-03-09 21:55 UTC
  To: Naoya Horiguchi
  Cc: Linux MM, Andrew Morton, Mike Kravetz, Miaohe Lin,
	Naoya Horiguchi, Linux Kernel Mailing List

On Wed, Mar 9, 2022 at 1:15 AM Naoya Horiguchi
<naoya.horiguchi@linux.dev> wrote:
>
> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
>
> There is a race condition between memory_failure_hugetlb() and hugetlb
> free/demotion, which causes setting PageHWPoison flag on the wrong page
> (which was a hugetlb when memory_failure() was called, but was removed
> or demoted when memory_failure_hugetlb() is called).  This results in
> killing wrong processes.  So set PageHWPoison flag with holding page lock,
>
> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> ---
>  mm/memory-failure.c | 27 ++++++++++++---------------
>  1 file changed, 12 insertions(+), 15 deletions(-)
>
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index ac6492e36978..fe25eee8f9d6 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1494,24 +1494,11 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
>         int res;
>         unsigned long page_flags;
>
> -       if (TestSetPageHWPoison(head)) {
> -               pr_err("Memory failure: %#lx: already hardware poisoned\n",
> -                      pfn);
> -               res = -EHWPOISON;
> -               if (flags & MF_ACTION_REQUIRED)
> -                       res = kill_accessing_process(current, page_to_pfn(head), flags);
> -               return res;
> -       }
> -
> -       num_poisoned_pages_inc();
> -
>         if (!(flags & MF_COUNT_INCREASED)) {
>                 res = get_hwpoison_page(p, flags);

I'm not a hugetlb expert, so I may be wrong, but I'm wondering how this
could solve the race.  Is the race below still possible?

__get_hwpoison_page()
  head = compound_head(page)

hugetlb demotion (1G --> 2M)
  get_hwpoison_huge_page(head, &hugetlb);


Then the head may point to a 2M page, but the hwpoisoned subpage is
not in that 2M range?
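
To make the window concrete, here is a minimal user-space sketch of the
sequence above.  It is a toy model and not kernel code: struct toy_page,
the pages[] array and every toy_*() helper are hypothetical names invented
for illustration, and the sizes are scaled down (8 pages stand in for a 1G
hugetlb page, 2 pages for a 2M one).

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: none of these names exist in the kernel. */
#define NR_PAGES     8	/* stands in for one 1G hugetlb page */
#define DEMOTED_SIZE 2	/* stands in for one 2M hugetlb page */

struct toy_page {
	size_t head;	/* index of the compound head this page belongs to */
	int hwpoison;	/* stands in for PG_hwpoison, kept on head pages */
};

static struct toy_page pages[NR_PAGES];

/* Build one large compound page: every page points at page 0. */
static void toy_make_huge(void)
{
	for (size_t i = 0; i < NR_PAGES; i++) {
		pages[i].head = 0;
		pages[i].hwpoison = 0;
	}
}

/* Demotion: split into smaller compound pages of DEMOTED_SIZE pages. */
static void toy_demote(void)
{
	for (size_t i = 0; i < NR_PAGES; i++)
		pages[i].head = i - (i % DEMOTED_SIZE);
}

static size_t toy_compound_head(size_t pfn)
{
	return pages[pfn].head;
}

/*
 * Look up the head, let demotion race in, then set the flag on the head
 * that was looked up earlier.  Returns 1 if the flagged head no longer
 * covers the page with the error, 0 otherwise.
 */
static int toy_racy_poison(size_t pfn)
{
	size_t stale_head = toy_compound_head(pfn);

	toy_demote();			/* the racing demotion */
	pages[stale_head].hwpoison = 1;	/* flag lands on the old head */

	return toy_compound_head(pfn) != stale_head;
}
```

With the error on page 5, the flag ends up on page 0 while page 5 now
belongs to the small page headed by page 4, which is the mismatch described
above.  With the error on page 1 the old and new heads coincide, so the race
is harmless in that case.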


>                 if (!res) {
>                         lock_page(head);
>                         if (hwpoison_filter(p)) {
> -                               if (TestClearPageHWPoison(head))
> -                                       num_poisoned_pages_dec();
>                                 unlock_page(head);
>                                 return -EOPNOTSUPP;
>                         }
> @@ -1544,13 +1531,16 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
>         page_flags = head->flags;
>
>         if (hwpoison_filter(p)) {
> -               if (TestClearPageHWPoison(head))
> -                       num_poisoned_pages_dec();
>                 put_page(p);
>                 res = -EOPNOTSUPP;
>                 goto out;
>         }
>
> +       if (TestSetPageHWPoison(head))

And I don't think "head" is still the head you expected if the race
happened.  I think we need to re-retrieve the head once the page
refcount is bumped and the page is locked.

> +               goto already_hwpoisoned;
> +
> +       num_poisoned_pages_inc();
> +
>         /*
>          * TODO: hwpoison for pud-sized hugetlb doesn't work right now, so
>          * simply disable it. In order to make it work properly, we need
> @@ -1576,6 +1566,13 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
>  out:
>         unlock_page(head);
>         return res;
> +already_hwpoisoned:
> +       unlock_page(head);
> +       pr_err("Memory failure: %#lx: already hardware poisoned\n", pfn);
> +       res = -EHWPOISON;
> +       if (flags & MF_ACTION_REQUIRED)
> +               res = kill_accessing_process(current, page_to_pfn(head), flags);
> +       return res;
>  }
>
>  static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
> --
> 2.25.1
>


* Re: [PATCH v1] mm/hwpoison: set PageHWPoison after taking page lock in memory_failure_hugetlb()
  2022-03-09 21:55 ` Yang Shi
@ 2022-03-09 23:59   ` Mike Kravetz
  2022-03-10  0:29     ` HORIGUCHI NAOYA(堀口 直也)
  2022-03-10  0:00   ` HORIGUCHI NAOYA(堀口 直也)
  1 sibling, 1 reply; 10+ messages in thread
From: Mike Kravetz @ 2022-03-09 23:59 UTC
  To: Yang Shi, Naoya Horiguchi
  Cc: Linux MM, Andrew Morton, Miaohe Lin, Naoya Horiguchi,
	Linux Kernel Mailing List

On 3/9/22 13:55, Yang Shi wrote:
> On Wed, Mar 9, 2022 at 1:15 AM Naoya Horiguchi
> <naoya.horiguchi@linux.dev> wrote:
>>
>> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
>>
>> There is a race condition between memory_failure_hugetlb() and hugetlb
>> free/demotion, which causes setting PageHWPoison flag on the wrong page
>> (which was a hugetlb when memory_failure() was called, but was removed
>> or demoted when memory_failure_hugetlb() is called).  This results in
>> killing wrong processes.  So set PageHWPoison flag with holding page lock,
>>
>> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
>> ---
>>  mm/memory-failure.c | 27 ++++++++++++---------------
>>  1 file changed, 12 insertions(+), 15 deletions(-)
>>
>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>> index ac6492e36978..fe25eee8f9d6 100644
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -1494,24 +1494,11 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
>>         int res;
>>         unsigned long page_flags;
>>
>> -       if (TestSetPageHWPoison(head)) {
>> -               pr_err("Memory failure: %#lx: already hardware poisoned\n",
>> -                      pfn);
>> -               res = -EHWPOISON;
>> -               if (flags & MF_ACTION_REQUIRED)
>> -                       res = kill_accessing_process(current, page_to_pfn(head), flags);
>> -               return res;
>> -       }
>> -
>> -       num_poisoned_pages_inc();
>> -
>>         if (!(flags & MF_COUNT_INCREASED)) {
>>                 res = get_hwpoison_page(p, flags);
> 
> I'm not an expert of hugetlb, I may be wrong. I'm wondering how this
> could solve the race? Is the below race still possible?
> 
> __get_hwpoison_page()
>   head = compound_head(page)
> 
> hugetlb demotion (1G --> 2M)
>   get_hwpoison_huge_page(head, &hugetlb);
> 
> 
> Then the head may point to a 2M page, but the hwpoisoned subpage is
> not in that 2M range?

That is correct.

It is also possible that __free_pages(page, huge_page_order(h)) could have
been called during this window.  So IIUC, head would have an increased ref
count and the pages would be freed to buddy when the memory error code drops
the ref.  At that time, head would be marked as poisoned, and it could be a
different page from the one that actually has the error.

An increased ref count or a page lock will not prevent hugetlb page demotion
or an attempt to free to buddy today.

There is already a patch in Andrew's tree to not demote hugetlb pages marked
with poison.  This at least makes the demote code perform the same check as
allocation code.  The race which started this discussion has been there for
a while.  Demotion opened another window, but that is now closed.

IMO, it would be better to take a step back and look at the overall design
and decide how to proceed.  There is also an effort underway to provide double
mapping of hugetlb pages, and one of the target use cases is memory error
handling.  This effort is in the very early stages, but it will certainly
require setting poison on the (sub-)page with the actual error rather than
on the head page.  Perhaps something like what is done for THP today.  Nothing to
address yet, but I just wanted to note there will be future changes in this
area.
-- 
Mike Kravetz


* Re: [PATCH v1] mm/hwpoison: set PageHWPoison after taking page lock in memory_failure_hugetlb()
  2022-03-09 21:55 ` Yang Shi
  2022-03-09 23:59   ` Mike Kravetz
@ 2022-03-10  0:00   ` HORIGUCHI NAOYA(堀口 直也)
  2022-03-10  0:30     ` Yang Shi
  1 sibling, 1 reply; 10+ messages in thread
From: HORIGUCHI NAOYA(堀口 直也) @ 2022-03-10  0:00 UTC
  To: Yang Shi
  Cc: Naoya Horiguchi, Linux MM, Andrew Morton, Mike Kravetz,
	Miaohe Lin, Linux Kernel Mailing List

On Wed, Mar 09, 2022 at 01:55:30PM -0800, Yang Shi wrote:
> On Wed, Mar 9, 2022 at 1:15 AM Naoya Horiguchi
> <naoya.horiguchi@linux.dev> wrote:
> >
> > From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> >
> > There is a race condition between memory_failure_hugetlb() and hugetlb
> > free/demotion, which causes setting PageHWPoison flag on the wrong page
> > (which was a hugetlb when memory_failure() was called, but was removed
> > or demoted when memory_failure_hugetlb() is called).  This results in
> > killing wrong processes.  So set PageHWPoison flag with holding page lock,
> >
> > Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > ---
> >  mm/memory-failure.c | 27 ++++++++++++---------------
> >  1 file changed, 12 insertions(+), 15 deletions(-)
> >
> > diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> > index ac6492e36978..fe25eee8f9d6 100644
> > --- a/mm/memory-failure.c
> > +++ b/mm/memory-failure.c
> > @@ -1494,24 +1494,11 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
> >         int res;
> >         unsigned long page_flags;
> >
> > -       if (TestSetPageHWPoison(head)) {
> > -               pr_err("Memory failure: %#lx: already hardware poisoned\n",
> > -                      pfn);
> > -               res = -EHWPOISON;
> > -               if (flags & MF_ACTION_REQUIRED)
> > -                       res = kill_accessing_process(current, page_to_pfn(head), flags);
> > -               return res;
> > -       }
> > -
> > -       num_poisoned_pages_inc();
> > -
> >         if (!(flags & MF_COUNT_INCREASED)) {
> >                 res = get_hwpoison_page(p, flags);
> 
> I'm not an expert of hugetlb, I may be wrong. I'm wondering how this
> could solve the race? Is the below race still possible?
> 
> __get_hwpoison_page()
>   head = compound_head(page)
> 
> hugetlb demotion (1G --> 2M)
>   get_hwpoison_huge_page(head, &hugetlb);

Thanks for the comment.
I assume Miaohe's patch below introduces an additional check to detect the
race.  The patch calls compound_head() on the raw error page again, so
the demotion case should be detected.  I'll make the dependency clear in
the commit log.

https://lore.kernel.org/linux-mm/20220228140245.24552-2-linmiaohe@huawei.com/

> 
> 
> Then the head may point to a 2M page, but the hwpoisoned subpage is
> not in that 2M range?
> 
> 
> >                 if (!res) {
> >                         lock_page(head);
> >                         if (hwpoison_filter(p)) {
> > -                               if (TestClearPageHWPoison(head))
> > -                                       num_poisoned_pages_dec();
> >                                 unlock_page(head);
> >                                 return -EOPNOTSUPP;
> >                         }
> > @@ -1544,13 +1531,16 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
> >         page_flags = head->flags;
> >
> >         if (hwpoison_filter(p)) {
> > -               if (TestClearPageHWPoison(head))
> > -                       num_poisoned_pages_dec();
> >                 put_page(p);
> >                 res = -EOPNOTSUPP;
> >                 goto out;
> >         }
> >
> > +       if (TestSetPageHWPoison(head))
> 
> And I don't think "head" is still the head you expected if the race
> happened. I think we need to re-retrieve the head once the page
> refcount is bumped and locked.

I think the above justification works for this.
When the kernel reaches this line, the hugepage is properly pinned without being
freed or demoted, so "head" is still pointing to the same head page as expected.

Thanks,
Naoya Horiguchi

> 
> > +               goto already_hwpoisoned;
> > +
> > +       num_poisoned_pages_inc();
> > +
> >         /*
> >          * TODO: hwpoison for pud-sized hugetlb doesn't work right now, so
> >          * simply disable it. In order to make it work properly, we need
> > @@ -1576,6 +1566,13 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
> >  out:
> >         unlock_page(head);
> >         return res;
> > +already_hwpoisoned:
> > +       unlock_page(head);
> > +       pr_err("Memory failure: %#lx: already hardware poisoned\n", pfn);
> > +       res = -EHWPOISON;
> > +       if (flags & MF_ACTION_REQUIRED)
> > +               res = kill_accessing_process(current, page_to_pfn(head), flags);
> > +       return res;
> >  }
> >
> >  static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
> > --
> > 2.25.1
> >


* Re: [PATCH v1] mm/hwpoison: set PageHWPoison after taking page lock in memory_failure_hugetlb()
  2022-03-09 23:59   ` Mike Kravetz
@ 2022-03-10  0:29     ` HORIGUCHI NAOYA(堀口 直也)
  0 siblings, 0 replies; 10+ messages in thread
From: HORIGUCHI NAOYA(堀口 直也) @ 2022-03-10  0:29 UTC
  To: Mike Kravetz
  Cc: Yang Shi, Naoya Horiguchi, Linux MM, Andrew Morton, Miaohe Lin,
	Linux Kernel Mailing List

On Wed, Mar 09, 2022 at 03:59:55PM -0800, Mike Kravetz wrote:
> On 3/9/22 13:55, Yang Shi wrote:
> > On Wed, Mar 9, 2022 at 1:15 AM Naoya Horiguchi
> > <naoya.horiguchi@linux.dev> wrote:
> >>
> >> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> >>
> >> There is a race condition between memory_failure_hugetlb() and hugetlb
> >> free/demotion, which causes setting PageHWPoison flag on the wrong page
> >> (which was a hugetlb when memory_failure() was called, but was removed
> >> or demoted when memory_failure_hugetlb() is called).  This results in
> >> killing wrong processes.  So set PageHWPoison flag with holding page lock,
> >>
> >> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> >> ---
> >>  mm/memory-failure.c | 27 ++++++++++++---------------
> >>  1 file changed, 12 insertions(+), 15 deletions(-)
> >>
> >> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> >> index ac6492e36978..fe25eee8f9d6 100644
> >> --- a/mm/memory-failure.c
> >> +++ b/mm/memory-failure.c
> >> @@ -1494,24 +1494,11 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
> >>         int res;
> >>         unsigned long page_flags;
> >>
> >> -       if (TestSetPageHWPoison(head)) {
> >> -               pr_err("Memory failure: %#lx: already hardware poisoned\n",
> >> -                      pfn);
> >> -               res = -EHWPOISON;
> >> -               if (flags & MF_ACTION_REQUIRED)
> >> -                       res = kill_accessing_process(current, page_to_pfn(head), flags);
> >> -               return res;
> >> -       }
> >> -
> >> -       num_poisoned_pages_inc();
> >> -
> >>         if (!(flags & MF_COUNT_INCREASED)) {
> >>                 res = get_hwpoison_page(p, flags);
> > 
> > I'm not an expert of hugetlb, I may be wrong. I'm wondering how this
> > could solve the race? Is the below race still possible?
> > 
> > __get_hwpoison_page()
> >   head = compound_head(page)
> > 
> > hugetlb demotion (1G --> 2M)
> >   get_hwpoison_huge_page(head, &hugetlb);
> > 
> > 
> > Then the head may point to a 2M page, but the hwpoisoned subpage is
> > not in that 2M range?
> 
> That is correct.
> 
> It is also possible that __free_pages(page, huge_page_order(h)) could have
> been called during this window.  So IIUC, head would have an increased ref
> count and pages would be freed to buddy when the memory error code drops the
> ref.  At that time, head would be marked as poisoned which could be different
> than actual page with poison.
> 
> An increased ref count, or page lock will not prevent hugetlb page demotion
> or (attempting) to free to buddy today.

Sorry, I misread the above race in my previous email.  I'll rethink this and
look for a better solution that covers it.

> 
> There is already a patch in Andrew's tree to not demote hugetlb pages marked
> with poison.  This at least makes the demote code perform the same check as
> allocation code.  The race which started this discussion has been there for
> a while.  demotion opened another window, but that is now closed.
> 
> IMO, it would be better to take a step back and look at the overall design
> and decide how to proceed.  There is also an effort underway to provide double
> mapping of hugetlb pages, and one of the target use cases is memory error
> handling.  This effort is in the very early stages, but it will certainly
> require setting poison on the (sub-)page with actual error rather than
> head page.

Someone raised a similar point when discussing the "freeing vmemmap pages
for hugetlb" patchset, and there was an idea that the actual error page be
recorded in the private field of the first tail page instead of using
PG_hwpoison on raw subpages.  That sounds good to me.
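
As a rough illustration of that idea only, the sketch below keeps the
poison flag on the head page and records which subpage has the error in
the first tail page.  Everything here is an assumption made for the
example: struct toy_page, TOY_NO_ERROR and the toy_*() helpers are
hypothetical user-space stand-ins, not the kernel's struct page or any
existing helper, and the model records only a single error page.

```c
#include <assert.h>

/* Hypothetical sentinel meaning "no error recorded". */
#define TOY_NO_ERROR ((unsigned long)-1)

/* Toy stand-in for struct page; not the kernel's layout. */
struct toy_page {
	int hwpoison;		/* stands in for PG_hwpoison on the head */
	unsigned long private;	/* stands in for the page's private field */
};

/* Mark the hugepage poisoned and remember which subpage has the error. */
static void toy_record_raw_error(struct toy_page *head, unsigned long raw_index)
{
	head[0].hwpoison = 1;
	head[1].private = raw_index;	/* stored in the first tail page */
}

/* Return the recorded subpage index, or TOY_NO_ERROR if not poisoned. */
static unsigned long toy_raw_error_index(const struct toy_page *head)
{
	return head[0].hwpoison ? head[1].private : TOY_NO_ERROR;
}
```

In this model no flag is ever set on the raw subpage itself, so a reader
of the subpage's flags learns nothing; the location of the error is
recoverable only through the head page.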

> Perhaps something like what is done for THP today.  Nothing to
> address yet, but I just wanted to note there will be future changes in this
> area.

Thanks for the comment.

- Naoya Horiguchi


* Re: [PATCH v1] mm/hwpoison: set PageHWPoison after taking page lock in memory_failure_hugetlb()
  2022-03-10  0:00   ` HORIGUCHI NAOYA(堀口 直也)
@ 2022-03-10  0:30     ` Yang Shi
  2022-03-10  6:23       ` Miaohe Lin
  0 siblings, 1 reply; 10+ messages in thread
From: Yang Shi @ 2022-03-10  0:30 UTC
  To: HORIGUCHI NAOYA(堀口 直也)
  Cc: Naoya Horiguchi, Linux MM, Andrew Morton, Mike Kravetz,
	Miaohe Lin, Linux Kernel Mailing List

On Wed, Mar 9, 2022 at 4:01 PM HORIGUCHI NAOYA(堀口 直也)
<naoya.horiguchi@nec.com> wrote:
>
> On Wed, Mar 09, 2022 at 01:55:30PM -0800, Yang Shi wrote:
> > On Wed, Mar 9, 2022 at 1:15 AM Naoya Horiguchi
> > <naoya.horiguchi@linux.dev> wrote:
> > >
> > > From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > >
> > > There is a race condition between memory_failure_hugetlb() and hugetlb
> > > free/demotion, which causes setting PageHWPoison flag on the wrong page
> > > (which was a hugetlb when memory_failure() was called, but was removed
> > > or demoted when memory_failure_hugetlb() is called).  This results in
> > > killing wrong processes.  So set PageHWPoison flag with holding page lock,
> > >
> > > Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > > ---
> > >  mm/memory-failure.c | 27 ++++++++++++---------------
> > >  1 file changed, 12 insertions(+), 15 deletions(-)
> > >
> > > diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> > > index ac6492e36978..fe25eee8f9d6 100644
> > > --- a/mm/memory-failure.c
> > > +++ b/mm/memory-failure.c
> > > @@ -1494,24 +1494,11 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
> > >         int res;
> > >         unsigned long page_flags;
> > >
> > > -       if (TestSetPageHWPoison(head)) {
> > > -               pr_err("Memory failure: %#lx: already hardware poisoned\n",
> > > -                      pfn);
> > > -               res = -EHWPOISON;
> > > -               if (flags & MF_ACTION_REQUIRED)
> > > -                       res = kill_accessing_process(current, page_to_pfn(head), flags);
> > > -               return res;
> > > -       }
> > > -
> > > -       num_poisoned_pages_inc();
> > > -
> > >         if (!(flags & MF_COUNT_INCREASED)) {
> > >                 res = get_hwpoison_page(p, flags);
> >
> > I'm not an expert of hugetlb, I may be wrong. I'm wondering how this
> > could solve the race? Is the below race still possible?
> >
> > __get_hwpoison_page()
> >   head = compound_head(page)
> >
> > hugetlb demotion (1G --> 2M)
> >   get_hwpoison_huge_page(head, &hugetlb);
>
> Thanks for the comment.
> I assume Miaohe's patch below introduces additional check to detect the
> race.  The patch calls compound_head() for the raw error page again, so
> the demotion case should be detected.  I'll make the dependency clear in
> the commit log.
>
> https://lore.kernel.org/linux-mm/20220228140245.24552-2-linmiaohe@huawei.com/
>
> >
> >
> > Then the head may point to a 2M page, but the hwpoisoned subpage is
> > not in that 2M range?
> >
> >
> > >                 if (!res) {
> > >                         lock_page(head);
> > >                         if (hwpoison_filter(p)) {
> > > -                               if (TestClearPageHWPoison(head))
> > > -                                       num_poisoned_pages_dec();
> > >                                 unlock_page(head);
> > >                                 return -EOPNOTSUPP;
> > >                         }
> > > @@ -1544,13 +1531,16 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
> > >         page_flags = head->flags;
> > >
> > >         if (hwpoison_filter(p)) {
> > > -               if (TestClearPageHWPoison(head))
> > > -                       num_poisoned_pages_dec();
> > >                 put_page(p);
> > >                 res = -EOPNOTSUPP;
> > >                 goto out;
> > >         }
> > >
> > > +       if (TestSetPageHWPoison(head))
> >
> > And I don't think "head" is still the head you expected if the race
> > happened. I think we need to re-retrieve the head once the page
> > refcount is bumped and locked.
>
> I think the above justification works for this.
> When the kernel reaches this line, the hugepage is properly pinned without being
> freed or demoted, so "head" is still pointing to the same head page as expected.

I think Mike's comment in the earlier email works for this too. The
huge page may get demoted before the page is pinned and locked, so the
actual hwpoisoned subpage may belong to another smaller huge page now.


>
> Thanks,
> Naoya Horiguchi
>
> >
> > > +               goto already_hwpoisoned;
> > > +
> > > +       num_poisoned_pages_inc();
> > > +
> > >         /*
> > >          * TODO: hwpoison for pud-sized hugetlb doesn't work right now, so
> > >          * simply disable it. In order to make it work properly, we need
> > > @@ -1576,6 +1566,13 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
> > >  out:
> > >         unlock_page(head);
> > >         return res;
> > > +already_hwpoisoned:
> > > +       unlock_page(head);
> > > +       pr_err("Memory failure: %#lx: already hardware poisoned\n", pfn);
> > > +       res = -EHWPOISON;
> > > +       if (flags & MF_ACTION_REQUIRED)
> > > +               res = kill_accessing_process(current, page_to_pfn(head), flags);
> > > +       return res;
> > >  }
> > >
> > >  static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
> > > --
> > > 2.25.1
> > >


* Re: [PATCH v1] mm/hwpoison: set PageHWPoison after taking page lock in memory_failure_hugetlb()
  2022-03-09 21:30 ` Andrew Morton
@ 2022-03-10  1:15   ` HORIGUCHI NAOYA(堀口 直也)
  0 siblings, 0 replies; 10+ messages in thread
From: HORIGUCHI NAOYA(堀口 直也) @ 2022-03-10  1:15 UTC
  To: Andrew Morton
  Cc: Naoya Horiguchi, linux-mm, Mike Kravetz, Miaohe Lin, Yang Shi,
	linux-kernel

On Wed, Mar 09, 2022 at 01:30:10PM -0800, Andrew Morton wrote:
> On Wed,  9 Mar 2022 18:14:49 +0900 Naoya Horiguchi <naoya.horiguchi@linux.dev> wrote:
> 
> > There is a race condition between memory_failure_hugetlb() and hugetlb
> > free/demotion, which causes setting PageHWPoison flag on the wrong page
> > (which was a hugetlb when memory_failure() was called, but was removed
> > or demoted when memory_failure_hugetlb() is called).  This results in
> > killing wrong processes.  So set PageHWPoison flag with holding page lock,
> 
> What are the runtime effects of this?  Do we think a -stable backport
> is needed?

The actual user-visible effect might be obscure because even if
memory_failure() works as expected, some random process could be killed.
The actual error is left unhandled, so no one prevents later access to it,
which might lead to more serious results like consuming corrupted data.
So I think this is worth a -stable backport.

But unfortunately this patch still needs an update.  Could you drop it from
mmotm for now?

> 
> Are we missing a reported-by here?  I'm too lazy to hunt down who it was.

I noticed this from Mike's comment, so I'll add his Reported-by.

Thanks,
Naoya Horiguchi


* Re: [PATCH v1] mm/hwpoison: set PageHWPoison after taking page lock in memory_failure_hugetlb()
  2022-03-10  0:30     ` Yang Shi
@ 2022-03-10  6:23       ` Miaohe Lin
  2022-03-10 17:50         ` Yang Shi
  0 siblings, 1 reply; 10+ messages in thread
From: Miaohe Lin @ 2022-03-10  6:23 UTC
  To: Yang Shi, HORIGUCHI NAOYA(堀口 直也)
  Cc: Naoya Horiguchi, Linux MM, Andrew Morton, Mike Kravetz,
	Linux Kernel Mailing List

On 2022/3/10 8:30, Yang Shi wrote:
> On Wed, Mar 9, 2022 at 4:01 PM HORIGUCHI NAOYA(堀口 直也)
> <naoya.horiguchi@nec.com> wrote:
>>
>> On Wed, Mar 09, 2022 at 01:55:30PM -0800, Yang Shi wrote:
>>> On Wed, Mar 9, 2022 at 1:15 AM Naoya Horiguchi
>>> <naoya.horiguchi@linux.dev> wrote:
>>>>
>>>> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
>>>>
>>>> There is a race condition between memory_failure_hugetlb() and hugetlb
>>>> free/demotion, which causes setting PageHWPoison flag on the wrong page
>>>> (which was a hugetlb when memory_failure() was called, but was removed
>>>> or demoted when memory_failure_hugetlb() is called).  This results in
>>>> killing wrong processes.  So set PageHWPoison flag with holding page lock,
>>>>
>>>> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
>>>> ---
>>>>  mm/memory-failure.c | 27 ++++++++++++---------------
>>>>  1 file changed, 12 insertions(+), 15 deletions(-)
>>>>
>>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>>>> index ac6492e36978..fe25eee8f9d6 100644
>>>> --- a/mm/memory-failure.c
>>>> +++ b/mm/memory-failure.c
>>>> @@ -1494,24 +1494,11 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
>>>>         int res;
>>>>         unsigned long page_flags;
>>>>
>>>> -       if (TestSetPageHWPoison(head)) {
>>>> -               pr_err("Memory failure: %#lx: already hardware poisoned\n",
>>>> -                      pfn);
>>>> -               res = -EHWPOISON;
>>>> -               if (flags & MF_ACTION_REQUIRED)
>>>> -                       res = kill_accessing_process(current, page_to_pfn(head), flags);
>>>> -               return res;
>>>> -       }
>>>> -
>>>> -       num_poisoned_pages_inc();
>>>> -
>>>>         if (!(flags & MF_COUNT_INCREASED)) {
>>>>                 res = get_hwpoison_page(p, flags);
>>>
>>> I'm not an expert of hugetlb, I may be wrong. I'm wondering how this
>>> could solve the race? Is the below race still possible?
>>>
>>> __get_hwpoison_page()
>>>   head = compound_head(page)
>>>
>>> hugetlb demotion (1G --> 2M)
>>>   get_hwpoison_huge_page(head, &hugetlb);
>>
>> Thanks for the comment.
>> I assume Miaohe's patch below introduces additional check to detect the
>> race.  The patch calls compound_head() for the raw error page again, so
>> the demotion case should be detected.  I'll make the dependency clear in
>> the commit log.
>>
>> https://lore.kernel.org/linux-mm/20220228140245.24552-2-linmiaohe@huawei.com/
>>
>>>
>>>
>>> Then the head may point to a 2M page, but the hwpoisoned subpage is
>>> not in that 2M range?
>>>
>>>
>>>>                 if (!res) {
>>>>                         lock_page(head);
>>>>                         if (hwpoison_filter(p)) {
>>>> -                               if (TestClearPageHWPoison(head))
>>>> -                                       num_poisoned_pages_dec();
>>>>                                 unlock_page(head);
>>>>                                 return -EOPNOTSUPP;
>>>>                         }
>>>> @@ -1544,13 +1531,16 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
>>>>         page_flags = head->flags;
>>>>
>>>>         if (hwpoison_filter(p)) {
>>>> -               if (TestClearPageHWPoison(head))
>>>> -                       num_poisoned_pages_dec();
>>>>                 put_page(p);
>>>>                 res = -EOPNOTSUPP;
>>>>                 goto out;
>>>>         }
>>>>
>>>> +       if (TestSetPageHWPoison(head))
>>>
>>> And I don't think "head" is still the head you expected if the race
>>> happened. I think we need to re-retrieve the head once the page
>>> refcount is bumped and locked.
>>
>> I think the above justification works for this.
>> When the kernel reaches this line, the hugepage is properly pinned without being
>> freed or demoted, so "head" is still pointing to the same head page as expected.
> 
> I think Mike's comment in the earlier email works for this too. The
> huge page may get demoted before the page is pinned and locked, so the
> actual hwpoisoned subpage may belong to another smaller huge page now.
> 

I think Naoya assumes that there is a check before we use "head":

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 5444a8ef4867..0d7c58340a98 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1534,6 +1534,17 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
 	}

 	lock_page(head);
+
+	/**
+	 * The page could have changed compound pages due to race window.
+	 * If this happens just bail out.
+	 */
+	if (!PageHuge(p) || compound_head(p) != head) {
+		action_result(pfn, MF_MSG_DIFFERENT_COMPOUND, MF_IGNORED);
+		res = -EBUSY;
+		goto out;
+	}
+
 	page_flags = head->flags;

 	if (hwpoison_filter(p)) {
-- 
from: https://lore.kernel.org/linux-mm/20220228140245.24552-2-linmiaohe@huawei.com/

Thanks.

> 
>>
>> Thanks,
>> Naoya Horiguchi
>>
>>>
>>>> +               goto already_hwpoisoned;
>>>> +
>>>> +       num_poisoned_pages_inc();
>>>> +
>>>>         /*
>>>>          * TODO: hwpoison for pud-sized hugetlb doesn't work right now, so
>>>>          * simply disable it. In order to make it work properly, we need
>>>> @@ -1576,6 +1566,13 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
>>>>  out:
>>>>         unlock_page(head);
>>>>         return res;
>>>> +already_hwpoisoned:
>>>> +       unlock_page(head);
>>>> +       pr_err("Memory failure: %#lx: already hardware poisoned\n", pfn);
>>>> +       res = -EHWPOISON;
>>>> +       if (flags & MF_ACTION_REQUIRED)
>>>> +               res = kill_accessing_process(current, page_to_pfn(head), flags);
>>>> +       return res;
>>>>  }
>>>>
>>>>  static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
>>>> --
>>>> 2.25.1
>>>>
> .
> 



* Re: [PATCH v1] mm/hwpoison: set PageHWPoison after taking page lock in memory_failure_hugetlb()
  2022-03-10  6:23       ` Miaohe Lin
@ 2022-03-10 17:50         ` Yang Shi
  0 siblings, 0 replies; 10+ messages in thread
From: Yang Shi @ 2022-03-10 17:50 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: HORIGUCHI NAOYA(堀口 直也),
	Naoya Horiguchi, Linux MM, Andrew Morton, Mike Kravetz,
	Linux Kernel Mailing List

On Wed, Mar 9, 2022 at 10:24 PM Miaohe Lin <linmiaohe@huawei.com> wrote:
>
> On 2022/3/10 8:30, Yang Shi wrote:
> > On Wed, Mar 9, 2022 at 4:01 PM HORIGUCHI NAOYA(堀口 直也)
> > <naoya.horiguchi@nec.com> wrote:
> >>
> >> On Wed, Mar 09, 2022 at 01:55:30PM -0800, Yang Shi wrote:
> >>> On Wed, Mar 9, 2022 at 1:15 AM Naoya Horiguchi
> >>> <naoya.horiguchi@linux.dev> wrote:
> >>>>
> >>>> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> >>>>
> >>>> There is a race condition between memory_failure_hugetlb() and hugetlb
> >>>> free/demotion, which causes the PageHWPoison flag to be set on the wrong page
> >>>> (which was a hugetlb page when memory_failure() was called, but was removed
> >>>> or demoted by the time memory_failure_hugetlb() is called).  This results in
> >>>> killing the wrong processes.  So set the PageHWPoison flag while holding the page lock.
> >>>>
> >>>> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> >>>> ---
> >>>>  mm/memory-failure.c | 27 ++++++++++++---------------
> >>>>  1 file changed, 12 insertions(+), 15 deletions(-)
> >>>>
> >>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> >>>> index ac6492e36978..fe25eee8f9d6 100644
> >>>> --- a/mm/memory-failure.c
> >>>> +++ b/mm/memory-failure.c
> >>>> @@ -1494,24 +1494,11 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
> >>>>         int res;
> >>>>         unsigned long page_flags;
> >>>>
> >>>> -       if (TestSetPageHWPoison(head)) {
> >>>> -               pr_err("Memory failure: %#lx: already hardware poisoned\n",
> >>>> -                      pfn);
> >>>> -               res = -EHWPOISON;
> >>>> -               if (flags & MF_ACTION_REQUIRED)
> >>>> -                       res = kill_accessing_process(current, page_to_pfn(head), flags);
> >>>> -               return res;
> >>>> -       }
> >>>> -
> >>>> -       num_poisoned_pages_inc();
> >>>> -
> >>>>         if (!(flags & MF_COUNT_INCREASED)) {
> >>>>                 res = get_hwpoison_page(p, flags);
> >>>
> >>> I'm not an expert on hugetlb, so I may be wrong, but I'm wondering how this
> >>> could solve the race. Is the race below still possible?
> >>>
> >>> __get_hwpoison_page()
> >>>   head = compound_head(page)
> >>>
> >>> hugetlb demotion (1G --> 2M)
> >>>   get_hwpoison_huge_page(head, &hugetlb);
> >>
> >> Thanks for the comment.
> >> I assume Miaohe's patch below introduces an additional check to detect the
> >> race.  The patch calls compound_head() for the raw error page again, so
> >> the demotion case should be detected.  I'll make the dependency clear in
> >> the commit log.
> >>
> >> https://lore.kernel.org/linux-mm/20220228140245.24552-2-linmiaohe@huawei.com/
> >>
> >>>
> >>>
> >>> Then the head may point to a 2M page, but the hwpoisoned subpage is
> >>> not in that 2M range?
> >>>
> >>>
> >>>>                 if (!res) {
> >>>>                         lock_page(head);
> >>>>                         if (hwpoison_filter(p)) {
> >>>> -                               if (TestClearPageHWPoison(head))
> >>>> -                                       num_poisoned_pages_dec();
> >>>>                                 unlock_page(head);
> >>>>                                 return -EOPNOTSUPP;
> >>>>                         }
> >>>> @@ -1544,13 +1531,16 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
> >>>>         page_flags = head->flags;
> >>>>
> >>>>         if (hwpoison_filter(p)) {
> >>>> -               if (TestClearPageHWPoison(head))
> >>>> -                       num_poisoned_pages_dec();
> >>>>                 put_page(p);
> >>>>                 res = -EOPNOTSUPP;
> >>>>                 goto out;
> >>>>         }
> >>>>
> >>>> +       if (TestSetPageHWPoison(head))
> >>>
> >>> And I don't think "head" is still the head you expected if the race
> >>> happened. I think we need to re-retrieve the head once the page
> >>> refcount is bumped and locked.
> >>
> >> I think the above justification works for this.
> >> When the kernel reaches this line, the hugepage is properly pinned without being
> >> freed or demoted, so "head" is still pointing to the same head page as expected.
> >
> > I think Mike's comment in the earlier email works for this too. The
> > huge page may get demoted before the page is pinned and locked, so the
> > actual hwpoisoned subpage may belong to another smaller huge page now.
> >
>
> I think Naoya assumes that there is a check before we use "head":
>
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 5444a8ef4867..0d7c58340a98 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1534,6 +1534,17 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
>         }
>
>         lock_page(head);
> +
> +       /**
> +        * The page could have changed compound pages due to race window.
> +        * If this happens just bail out.
> +        */
> +       if (!PageHuge(p) || compound_head(p) != head) {
> +               action_result(pfn, MF_MSG_DIFFERENT_COMPOUND, MF_IGNORED);
> +               res = -EBUSY;
> +               goto out;
> +       }
> +
>         page_flags = head->flags;
>
>         if (hwpoison_filter(p)) {
> --
> from: https://lore.kernel.org/linux-mm/20220228140245.24552-2-linmiaohe@huawei.com/

Aha, thanks, I missed that. Yeah, we definitely need to revalidate the page.

>
> Thanks.
>
> >
> >>
> >> Thanks,
> >> Naoya Horiguchi
> >>
> >>>
> >>>> +               goto already_hwpoisoned;
> >>>> +
> >>>> +       num_poisoned_pages_inc();
> >>>> +
> >>>>         /*
> >>>>          * TODO: hwpoison for pud-sized hugetlb doesn't work right now, so
> >>>>          * simply disable it. In order to make it work properly, we need
> >>>> @@ -1576,6 +1566,13 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
> >>>>  out:
> >>>>         unlock_page(head);
> >>>>         return res;
> >>>> +already_hwpoisoned:
> >>>> +       unlock_page(head);
> >>>> +       pr_err("Memory failure: %#lx: already hardware poisoned\n", pfn);
> >>>> +       res = -EHWPOISON;
> >>>> +       if (flags & MF_ACTION_REQUIRED)
> >>>> +               res = kill_accessing_process(current, page_to_pfn(head), flags);
> >>>> +       return res;
> >>>>  }
> >>>>
> >>>>  static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
> >>>> --
> >>>> 2.25.1
> >>>>
> > .
> >
>


Thread overview: 10+ messages
2022-03-09  9:14 [PATCH v1] mm/hwpoison: set PageHWPoison after taking page lock in memory_failure_hugetlb() Naoya Horiguchi
2022-03-09 21:30 ` Andrew Morton
2022-03-10  1:15   ` HORIGUCHI NAOYA(堀口 直也)
2022-03-09 21:55 ` Yang Shi
2022-03-09 23:59   ` Mike Kravetz
2022-03-10  0:29     ` HORIGUCHI NAOYA(堀口 直也)
2022-03-10  0:00   ` HORIGUCHI NAOYA(堀口 直也)
2022-03-10  0:30     ` Yang Shi
2022-03-10  6:23       ` Miaohe Lin
2022-03-10 17:50         ` Yang Shi
