linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2] mm/hwpoison: set PageHWPoison after taking page lock in memory_failure_hugetlb()
@ 2022-03-14  2:13 Naoya Horiguchi
  2022-03-14  7:10 ` Miaohe Lin
  0 siblings, 1 reply; 7+ messages in thread
From: Naoya Horiguchi @ 2022-03-14  2:13 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Mike Kravetz, Miaohe Lin, Yang Shi,
	Naoya Horiguchi, linux-kernel

From: Naoya Horiguchi <naoya.horiguchi@nec.com>

There is a race condition between memory_failure_hugetlb() and hugetlb
free/demotion, which causes setting PageHWPoison flag on the wrong page
(which was a hugetlb when memory_failure() was called, but was removed
or demoted when memory_failure_hugetlb() is called).  This results in
killing wrong processes.  So set PageHWPoison flag with holding page lock,

The actual user-visible effect might be obscure because even if
memory_failure() works as expected, some random process could be killed.
Even worse, the actual error is left unhandled, so no one prevents later
access to it, which might lead to more serious results like consuming
corrupted data.

This patch depends on Miaohe's patch titled "mm/memory-failure.c: fix
race with changing page compound again".

Reported-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: <stable@vger.kernel.org>
---
ChangeLog v1 -> v2:
- pass subpage to get_hwpoison_huge_page() instead of head page.
- call compound_head() in hugetlb_lock to avoid race with hugetlb
  demotion/free.
---
 mm/hugetlb.c        |  8 +++++---
 mm/memory-failure.c | 33 +++++++++++++++------------------
 2 files changed, 20 insertions(+), 21 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index f294db835f4b..345fed90842e 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6761,14 +6761,16 @@ bool isolate_huge_page(struct page *page, struct list_head *list)
 
 int get_hwpoison_huge_page(struct page *page, bool *hugetlb)
 {
+	struct page *head;
 	int ret = 0;
 
 	*hugetlb = false;
 	spin_lock_irq(&hugetlb_lock);
-	if (PageHeadHuge(page)) {
+	head = compound_head(page);
+	if (PageHeadHuge(head)) {
 		*hugetlb = true;
-		if (HPageFreed(page) || HPageMigratable(page))
-			ret = get_page_unless_zero(page);
+		if (HPageFreed(head) || HPageMigratable(head))
+			ret = get_page_unless_zero(head);
 		else
 			ret = -EBUSY;
 	}
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index a9bfd04d2a3c..c40c00c3a261 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1193,7 +1193,7 @@ static int __get_hwpoison_page(struct page *page, unsigned long flags)
 	int ret = 0;
 	bool hugetlb = false;
 
-	ret = get_hwpoison_huge_page(head, &hugetlb);
+	ret = get_hwpoison_huge_page(page, &hugetlb);
 	if (hugetlb)
 		return ret;
 
@@ -1280,11 +1280,10 @@ static int get_any_page(struct page *p, unsigned long flags)
 
 static int __get_unpoison_page(struct page *page)
 {
-	struct page *head = compound_head(page);
 	int ret = 0;
 	bool hugetlb = false;
 
-	ret = get_hwpoison_huge_page(head, &hugetlb);
+	ret = get_hwpoison_huge_page(page, &hugetlb);
 	if (hugetlb)
 		return ret;
 
@@ -1503,24 +1502,11 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
 	int res;
 	unsigned long page_flags;
 
-	if (TestSetPageHWPoison(head)) {
-		pr_err("Memory failure: %#lx: already hardware poisoned\n",
-		       pfn);
-		res = -EHWPOISON;
-		if (flags & MF_ACTION_REQUIRED)
-			res = kill_accessing_process(current, page_to_pfn(head), flags);
-		return res;
-	}
-
-	num_poisoned_pages_inc();
-
 	if (!(flags & MF_COUNT_INCREASED)) {
 		res = get_hwpoison_page(p, flags);
 		if (!res) {
 			lock_page(head);
 			if (hwpoison_filter(p)) {
-				if (TestClearPageHWPoison(head))
-					num_poisoned_pages_dec();
 				unlock_page(head);
 				return -EOPNOTSUPP;
 			}
@@ -1553,13 +1539,16 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
 	page_flags = head->flags;
 
 	if (hwpoison_filter(p)) {
-		if (TestClearPageHWPoison(head))
-			num_poisoned_pages_dec();
 		put_page(p);
 		res = -EOPNOTSUPP;
 		goto out;
 	}
 
+	if (TestSetPageHWPoison(head))
+		goto already_hwpoisoned;
+
+	num_poisoned_pages_inc();
+
 	/*
 	 * TODO: hwpoison for pud-sized hugetlb doesn't work right now, so
 	 * simply disable it. In order to make it work properly, we need
@@ -1585,6 +1574,14 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
 out:
 	unlock_page(head);
 	return res;
+already_hwpoisoned:
+	put_page(p);
+	unlock_page(head);
+	pr_err("Memory failure: %#lx: already hardware poisoned\n", pfn);
+	res = -EHWPOISON;
+	if (flags & MF_ACTION_REQUIRED)
+		res = kill_accessing_process(current, page_to_pfn(head), flags);
+	return res;
 }
 
 static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] mm/hwpoison: set PageHWPoison after taking page lock in memory_failure_hugetlb()
  2022-03-14  2:13 [PATCH v2] mm/hwpoison: set PageHWPoison after taking page lock in memory_failure_hugetlb() Naoya Horiguchi
@ 2022-03-14  7:10 ` Miaohe Lin
  2022-03-14 18:41   ` Mike Kravetz
  2022-03-15  5:49   ` HORIGUCHI NAOYA(堀口 直也)
  0 siblings, 2 replies; 7+ messages in thread
From: Miaohe Lin @ 2022-03-14  7:10 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Andrew Morton, Mike Kravetz, Yang Shi, Naoya Horiguchi,
	linux-kernel, Linux-MM

On 2022/3/14 10:13, Naoya Horiguchi wrote:
> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> 
> There is a race condition between memory_failure_hugetlb() and hugetlb
> free/demotion, which causes setting PageHWPoison flag on the wrong page
> (which was a hugetlb when memory_failure() was called, but was removed
> or demoted when memory_failure_hugetlb() is called).  This results in
> killing wrong processes.  So set PageHWPoison flag with holding page lock,

It seems hold page lock could not help solve this race condition as hugetlb
page demotion is not required to hold the page lock. Could you please explain
this a bit more?

BTW:Is there some words missing or here should be 'page lock.' instead of 'page lock,' ?

> 
> The actual user-visible effect might be obscure because even if
> memory_failure() works as expected, some random process could be killed.
> Even worse, the actual error is left unhandled, so no one prevents later
> access to it, which might lead to more serious results like consuming
> corrupted data.
> 
> This patch depends on Miaohe's patch titled "mm/memory-failure.c: fix
> race with changing page compound again".
> 
> Reported-by: Mike Kravetz <mike.kravetz@oracle.com>
> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> Cc: <stable@vger.kernel.org>
> ---
> ChangeLog v1 -> v2:
> - pass subpage to get_hwpoison_huge_page() instead of head page.
> - call compound_head() in hugetlb_lock to avoid race with hugetlb
>   demotion/free.
> ---
>  mm/hugetlb.c        |  8 +++++---
>  mm/memory-failure.c | 33 +++++++++++++++------------------
>  2 files changed, 20 insertions(+), 21 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index f294db835f4b..345fed90842e 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -6761,14 +6761,16 @@ bool isolate_huge_page(struct page *page, struct list_head *list)
>  
>  int get_hwpoison_huge_page(struct page *page, bool *hugetlb)
>  {
> +	struct page *head;
>  	int ret = 0;
>  
>  	*hugetlb = false;
>  	spin_lock_irq(&hugetlb_lock);
> -	if (PageHeadHuge(page)) {
> +	head = compound_head(page);
> +	if (PageHeadHuge(head)) {
>  		*hugetlb = true;
> -		if (HPageFreed(page) || HPageMigratable(page))
> -			ret = get_page_unless_zero(page);
> +		if (HPageFreed(head) || HPageMigratable(head))
> +			ret = get_page_unless_zero(head);
>  		else
>  			ret = -EBUSY;
>  	}
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index a9bfd04d2a3c..c40c00c3a261 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1193,7 +1193,7 @@ static int __get_hwpoison_page(struct page *page, unsigned long flags)
>  	int ret = 0;
>  	bool hugetlb = false;
>  
> -	ret = get_hwpoison_huge_page(head, &hugetlb);
> +	ret = get_hwpoison_huge_page(page, &hugetlb);
>  	if (hugetlb)
>  		return ret;
>  
> @@ -1280,11 +1280,10 @@ static int get_any_page(struct page *p, unsigned long flags)
>  
>  static int __get_unpoison_page(struct page *page)
>  {
> -	struct page *head = compound_head(page);
>  	int ret = 0;
>  	bool hugetlb = false;
>  
> -	ret = get_hwpoison_huge_page(head, &hugetlb);
> +	ret = get_hwpoison_huge_page(page, &hugetlb);
>  	if (hugetlb)
>  		return ret;
>  
> @@ -1503,24 +1502,11 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
>  	int res;
>  	unsigned long page_flags;
>  
> -	if (TestSetPageHWPoison(head)) {
> -		pr_err("Memory failure: %#lx: already hardware poisoned\n",
> -		       pfn);
> -		res = -EHWPOISON;
> -		if (flags & MF_ACTION_REQUIRED)
> -			res = kill_accessing_process(current, page_to_pfn(head), flags);
> -		return res;
> -	}
> -
> -	num_poisoned_pages_inc();
> -
>  	if (!(flags & MF_COUNT_INCREASED)) {
>  		res = get_hwpoison_page(p, flags);
>  		if (!res) {

In this (res == 0) case, hugetlb page could be dissolved via __page_handle_poison.
But since PageHWPoison is not set yet, we can't set the PageHWPoison to the correct
page. Think about the below code in dissolve_free_huge_page:
	/*
	 * Move PageHWPoison flag from head page to the raw
	 * error page, which makes any subpages rather than
	 * the error page reusable.
	 */
	if (PageHWPoison(head) && page != head) {
		SetPageHWPoison(page);
		ClearPageHWPoison(head);
	}

SetPageHWPoison won't be called for the error page. Or am I miss something?

>  			lock_page(head);
>  			if (hwpoison_filter(p)) {
> -				if (TestClearPageHWPoison(head))
> -					num_poisoned_pages_dec();
>  				unlock_page(head);
>  				return -EOPNOTSUPP;
>  			}
> @@ -1553,13 +1539,16 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
>  	page_flags = head->flags;
>  
>  	if (hwpoison_filter(p)) {
> -		if (TestClearPageHWPoison(head))
> -			num_poisoned_pages_dec();
>  		put_page(p);
>  		res = -EOPNOTSUPP;
>  		goto out;
>  	}
>  
> +	if (TestSetPageHWPoison(head))
> +		goto already_hwpoisoned;
> +
> +	num_poisoned_pages_inc();
> +
>  	/*
>  	 * TODO: hwpoison for pud-sized hugetlb doesn't work right now, so
>  	 * simply disable it. In order to make it work properly, we need
> @@ -1585,6 +1574,14 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
>  out:
>  	unlock_page(head);
>  	return res;
> +already_hwpoisoned:
> +	put_page(p);
> +	unlock_page(head);

Generally speaking, we should do unlock_page before put_page or page might be disappeared
before we unlock the page. This should be ok when memory_failure succeeds to handle the
page previously as it holds one extra page refcnt. But it might be problematic when
memory_failure failed to handle the page last time. We might be the last user here.

> +	pr_err("Memory failure: %#lx: already hardware poisoned\n", pfn);
> +	res = -EHWPOISON;
> +	if (flags & MF_ACTION_REQUIRED)
> +		res = kill_accessing_process(current, page_to_pfn(head), flags);
> +	return res;
>  }
>  
>  static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
> 

Many thanks for your path! :)

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] mm/hwpoison: set PageHWPoison after taking page lock in memory_failure_hugetlb()
  2022-03-14  7:10 ` Miaohe Lin
@ 2022-03-14 18:41   ` Mike Kravetz
  2022-03-15  5:49   ` HORIGUCHI NAOYA(堀口 直也)
  1 sibling, 0 replies; 7+ messages in thread
From: Mike Kravetz @ 2022-03-14 18:41 UTC (permalink / raw)
  To: Miaohe Lin, Naoya Horiguchi
  Cc: Andrew Morton, Yang Shi, Naoya Horiguchi, linux-kernel, Linux-MM

On 3/14/22 00:10, Miaohe Lin wrote:
> On 2022/3/14 10:13, Naoya Horiguchi wrote:
>> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
>>
>> There is a race condition between memory_failure_hugetlb() and hugetlb
>> free/demotion, which causes setting PageHWPoison flag on the wrong page
>> (which was a hugetlb when memory_failure() was called, but was removed
>> or demoted when memory_failure_hugetlb() is called).  This results in
>> killing wrong processes.  So set PageHWPoison flag with holding page lock,
> 
> It seems hold page lock could not help solve this race condition as hugetlb
> page demotion is not required to hold the page lock. Could you please explain
> this a bit more?

I think it would be 'more safe' to SetPageHWPoison on hugetlb head page when
holding hugetlb_lock.  I know that does not fit with the way memory error
handling works in general.  Just a thought.

-- 
Mike Kravetz

> 
> BTW:Is there some words missing or here should be 'page lock.' instead of 'page lock,' ?
> 
>>
>> The actual user-visible effect might be obscure because even if
>> memory_failure() works as expected, some random process could be killed.
>> Even worse, the actual error is left unhandled, so no one prevents later
>> access to it, which might lead to more serious results like consuming
>> corrupted data.
>>
>> This patch depends on Miaohe's patch titled "mm/memory-failure.c: fix
>> race with changing page compound again".
>>
>> Reported-by: Mike Kravetz <mike.kravetz@oracle.com>
>> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
>> Cc: <stable@vger.kernel.org>
>> ---
>> ChangeLog v1 -> v2:
>> - pass subpage to get_hwpoison_huge_page() instead of head page.
>> - call compound_head() in hugetlb_lock to avoid race with hugetlb
>>   demotion/free.
>> ---
>>  mm/hugetlb.c        |  8 +++++---
>>  mm/memory-failure.c | 33 +++++++++++++++------------------
>>  2 files changed, 20 insertions(+), 21 deletions(-)
>>
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index f294db835f4b..345fed90842e 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -6761,14 +6761,16 @@ bool isolate_huge_page(struct page *page, struct list_head *list)
>>  
>>  int get_hwpoison_huge_page(struct page *page, bool *hugetlb)
>>  {
>> +	struct page *head;
>>  	int ret = 0;
>>  
>>  	*hugetlb = false;
>>  	spin_lock_irq(&hugetlb_lock);
>> -	if (PageHeadHuge(page)) {
>> +	head = compound_head(page);
>> +	if (PageHeadHuge(head)) {
>>  		*hugetlb = true;
>> -		if (HPageFreed(page) || HPageMigratable(page))
>> -			ret = get_page_unless_zero(page);
>> +		if (HPageFreed(head) || HPageMigratable(head))
>> +			ret = get_page_unless_zero(head);
>>  		else
>>  			ret = -EBUSY;
>>  	}
>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>> index a9bfd04d2a3c..c40c00c3a261 100644
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -1193,7 +1193,7 @@ static int __get_hwpoison_page(struct page *page, unsigned long flags)
>>  	int ret = 0;
>>  	bool hugetlb = false;
>>  
>> -	ret = get_hwpoison_huge_page(head, &hugetlb);
>> +	ret = get_hwpoison_huge_page(page, &hugetlb);
>>  	if (hugetlb)
>>  		return ret;
>>  
>> @@ -1280,11 +1280,10 @@ static int get_any_page(struct page *p, unsigned long flags)
>>  
>>  static int __get_unpoison_page(struct page *page)
>>  {
>> -	struct page *head = compound_head(page);
>>  	int ret = 0;
>>  	bool hugetlb = false;
>>  
>> -	ret = get_hwpoison_huge_page(head, &hugetlb);
>> +	ret = get_hwpoison_huge_page(page, &hugetlb);
>>  	if (hugetlb)
>>  		return ret;
>>  
>> @@ -1503,24 +1502,11 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
>>  	int res;
>>  	unsigned long page_flags;
>>  
>> -	if (TestSetPageHWPoison(head)) {
>> -		pr_err("Memory failure: %#lx: already hardware poisoned\n",
>> -		       pfn);
>> -		res = -EHWPOISON;
>> -		if (flags & MF_ACTION_REQUIRED)
>> -			res = kill_accessing_process(current, page_to_pfn(head), flags);
>> -		return res;
>> -	}
>> -
>> -	num_poisoned_pages_inc();
>> -
>>  	if (!(flags & MF_COUNT_INCREASED)) {
>>  		res = get_hwpoison_page(p, flags);
>>  		if (!res) {
> 
> In this (res == 0) case, hugetlb page could be dissolved via __page_handle_poison.
> But since PageHWPoison is not set yet, we can't set the PageHWPoison to the correct
> page. Think about the below code in dissolve_free_huge_page:
> 	/*
> 	 * Move PageHWPoison flag from head page to the raw
> 	 * error page, which makes any subpages rather than
> 	 * the error page reusable.
> 	 */
> 	if (PageHWPoison(head) && page != head) {
> 		SetPageHWPoison(page);
> 		ClearPageHWPoison(head);
> 	}
> 
> SetPageHWPoison won't be called for the error page. Or am I miss something?
> 
>>  			lock_page(head);
>>  			if (hwpoison_filter(p)) {
>> -				if (TestClearPageHWPoison(head))
>> -					num_poisoned_pages_dec();
>>  				unlock_page(head);
>>  				return -EOPNOTSUPP;
>>  			}
>> @@ -1553,13 +1539,16 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
>>  	page_flags = head->flags;
>>  
>>  	if (hwpoison_filter(p)) {
>> -		if (TestClearPageHWPoison(head))
>> -			num_poisoned_pages_dec();
>>  		put_page(p);
>>  		res = -EOPNOTSUPP;
>>  		goto out;
>>  	}
>>  
>> +	if (TestSetPageHWPoison(head))
>> +		goto already_hwpoisoned;
>> +
>> +	num_poisoned_pages_inc();
>> +
>>  	/*
>>  	 * TODO: hwpoison for pud-sized hugetlb doesn't work right now, so
>>  	 * simply disable it. In order to make it work properly, we need
>> @@ -1585,6 +1574,14 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
>>  out:
>>  	unlock_page(head);
>>  	return res;
>> +already_hwpoisoned:
>> +	put_page(p);
>> +	unlock_page(head);
> 
> Generally speaking, we should do unlock_page before put_page or page might be disappeared
> before we unlock the page. This should be ok when memory_failure succeeds to handle the
> page previously as it holds one extra page refcnt. But it might be problematic when
> memory_failure failed to handle the page last time. We might be the last user here.
> 
>> +	pr_err("Memory failure: %#lx: already hardware poisoned\n", pfn);
>> +	res = -EHWPOISON;
>> +	if (flags & MF_ACTION_REQUIRED)
>> +		res = kill_accessing_process(current, page_to_pfn(head), flags);
>> +	return res;
>>  }
>>  
>>  static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
>>
> 
> Many thanks for your path! :)


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] mm/hwpoison: set PageHWPoison after taking page lock in memory_failure_hugetlb()
  2022-03-14  7:10 ` Miaohe Lin
  2022-03-14 18:41   ` Mike Kravetz
@ 2022-03-15  5:49   ` HORIGUCHI NAOYA(堀口 直也)
  2022-03-15 14:00     ` Miaohe Lin
  1 sibling, 1 reply; 7+ messages in thread
From: HORIGUCHI NAOYA(堀口 直也) @ 2022-03-15  5:49 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: Naoya Horiguchi, Andrew Morton, Mike Kravetz, Yang Shi,
	linux-kernel, Linux-MM

On Mon, Mar 14, 2022 at 03:10:25PM +0800, Miaohe Lin wrote:
> On 2022/3/14 10:13, Naoya Horiguchi wrote:
> > From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > 
> > There is a race condition between memory_failure_hugetlb() and hugetlb
> > free/demotion, which causes setting PageHWPoison flag on the wrong page
> > (which was a hugetlb when memory_failure() was called, but was removed
> > or demoted when memory_failure_hugetlb() is called).  This results in
> > killing wrong processes.  So set PageHWPoison flag with holding page lock,
> 
> It seems hold page lock could not help solve this race condition as hugetlb
> page demotion is not required to hold the page lock. Could you please explain
> this a bit more?

Sorry, the last line in the paragraph need change. What prevents the current
race is hugetlb_lock, not page lock.  The page lock is here to prevent the
race with hugepage allocation (not directly related to the current issue,
but it's still necessary).

> 
> BTW:Is there some words missing or here should be 'page lock.' instead of 'page lock,' ?

I should use a period here, I'll fix it.

[...]

> > @@ -1503,24 +1502,11 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
> >  	int res;
> >  	unsigned long page_flags;
> >  
> > -	if (TestSetPageHWPoison(head)) {
> > -		pr_err("Memory failure: %#lx: already hardware poisoned\n",
> > -		       pfn);
> > -		res = -EHWPOISON;
> > -		if (flags & MF_ACTION_REQUIRED)
> > -			res = kill_accessing_process(current, page_to_pfn(head), flags);
> > -		return res;
> > -	}
> > -
> > -	num_poisoned_pages_inc();
> > -
> >  	if (!(flags & MF_COUNT_INCREASED)) {
> >  		res = get_hwpoison_page(p, flags);
> >  		if (!res) {
> 
> In this (res == 0) case, hugetlb page could be dissolved via __page_handle_poison.
> But since PageHWPoison is not set yet, we can't set the PageHWPoison to the correct
> page. Think about the below code in dissolve_free_huge_page:
> 	/*
> 	 * Move PageHWPoison flag from head page to the raw
> 	 * error page, which makes any subpages rather than
> 	 * the error page reusable.
> 	 */
> 	if (PageHWPoison(head) && page != head) {
> 		SetPageHWPoison(page);
> 		ClearPageHWPoison(head);
> 	}
> 
> SetPageHWPoison won't be called for the error page. Or am I miss something?

No, you're right.  We need call page_handle_poison() instead of
__page_handle_poison().

@@ -1512,7 +1512,7 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
 			}
 			unlock_page(head);
 			res = MF_FAILED;
-			if (__page_handle_poison(p)) {
+			if (page_handle_poison(p, true, false)) {
 				page_ref_inc(p);
 				res = MF_RECOVERED;
 			}



> 
> >  			lock_page(head);
> >  			if (hwpoison_filter(p)) {
> > -				if (TestClearPageHWPoison(head))
> > -					num_poisoned_pages_dec();
> >  				unlock_page(head);
> >  				return -EOPNOTSUPP;
> >  			}
> > @@ -1553,13 +1539,16 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
> >  	page_flags = head->flags;
> >  
> >  	if (hwpoison_filter(p)) {
> > -		if (TestClearPageHWPoison(head))
> > -			num_poisoned_pages_dec();
> >  		put_page(p);
> >  		res = -EOPNOTSUPP;
> >  		goto out;
> >  	}
> >  
> > +	if (TestSetPageHWPoison(head))
> > +		goto already_hwpoisoned;
> > +
> > +	num_poisoned_pages_inc();
> > +
> >  	/*
> >  	 * TODO: hwpoison for pud-sized hugetlb doesn't work right now, so
> >  	 * simply disable it. In order to make it work properly, we need
> > @@ -1585,6 +1574,14 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
> >  out:
> >  	unlock_page(head);
> >  	return res;
> > +already_hwpoisoned:
> > +	put_page(p);
> > +	unlock_page(head);
> 
> Generally speaking, we should do unlock_page before put_page or page might be disappeared
> before we unlock the page. This should be ok when memory_failure succeeds to handle the
> page previously as it holds one extra page refcnt. But it might be problematic when
> memory_failure failed to handle the page last time. We might be the last user here.

OK, so another code path in "if (hwpoison_filter)@ block seems to need
the same change in the order.

Thanks,
Naoya Horiguchi

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] mm/hwpoison: set PageHWPoison after taking page lock in memory_failure_hugetlb()
  2022-03-15  5:49   ` HORIGUCHI NAOYA(堀口 直也)
@ 2022-03-15 14:00     ` Miaohe Lin
  2022-03-16  0:33       ` Mike Kravetz
  0 siblings, 1 reply; 7+ messages in thread
From: Miaohe Lin @ 2022-03-15 14:00 UTC (permalink / raw)
  To: HORIGUCHI NAOYA(堀口 直也)
  Cc: Naoya Horiguchi, Andrew Morton, Mike Kravetz, Yang Shi,
	linux-kernel, Linux-MM

On 2022/3/15 13:49, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Mon, Mar 14, 2022 at 03:10:25PM +0800, Miaohe Lin wrote:
>> On 2022/3/14 10:13, Naoya Horiguchi wrote:
>>> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
>>>
>>> There is a race condition between memory_failure_hugetlb() and hugetlb
>>> free/demotion, which causes setting PageHWPoison flag on the wrong page
>>> (which was a hugetlb when memory_failure() was called, but was removed
>>> or demoted when memory_failure_hugetlb() is called).  This results in
>>> killing wrong processes.  So set PageHWPoison flag with holding page lock,
>>
>> It seems hold page lock could not help solve this race condition as hugetlb
>> page demotion is not required to hold the page lock. Could you please explain
>> this a bit more?
> 
> Sorry, the last line in the paragraph need change. What prevents the current
> race is hugetlb_lock, not page lock.  The page lock is here to prevent the
> race with hugepage allocation (not directly related to the current issue,
> but it's still necessary).

Many thanks for clarifying this.

> 
>>
>> BTW:Is there some words missing or here should be 'page lock.' instead of 'page lock,' ?
> 
> I should use a period here, I'll fix it.
> 
> [...]
> 
>>> @@ -1503,24 +1502,11 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
>>>  	int res;
>>>  	unsigned long page_flags;
>>>  
>>> -	if (TestSetPageHWPoison(head)) {
>>> -		pr_err("Memory failure: %#lx: already hardware poisoned\n",
>>> -		       pfn);
>>> -		res = -EHWPOISON;
>>> -		if (flags & MF_ACTION_REQUIRED)
>>> -			res = kill_accessing_process(current, page_to_pfn(head), flags);
>>> -		return res;
>>> -	}
>>> -
>>> -	num_poisoned_pages_inc();
>>> -
>>>  	if (!(flags & MF_COUNT_INCREASED)) {
>>>  		res = get_hwpoison_page(p, flags);
>>>  		if (!res) {
>>
>> In this (res == 0) case, hugetlb page could be dissolved via __page_handle_poison.
>> But since PageHWPoison is not set yet, we can't set the PageHWPoison to the correct
>> page. Think about the below code in dissolve_free_huge_page:
>> 	/*
>> 	 * Move PageHWPoison flag from head page to the raw
>> 	 * error page, which makes any subpages rather than
>> 	 * the error page reusable.
>> 	 */
>> 	if (PageHWPoison(head) && page != head) {
>> 		SetPageHWPoison(page);
>> 		ClearPageHWPoison(head);
>> 	}
>>
>> SetPageHWPoison won't be called for the error page. Or am I miss something?
> 
> No, you're right.  We need call page_handle_poison() instead of
> __page_handle_poison().
> 
> @@ -1512,7 +1512,7 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
>  			}
>  			unlock_page(head);
>  			res = MF_FAILED;
> -			if (__page_handle_poison(p)) {
> +			if (page_handle_poison(p, true, false)) {
>  				page_ref_inc(p);
>  				res = MF_RECOVERED;
>  			}
> 

This one looks good to me.

> 
> 
>>
>>>  			lock_page(head);
>>>  			if (hwpoison_filter(p)) {
>>> -				if (TestClearPageHWPoison(head))
>>> -					num_poisoned_pages_dec();
>>>  				unlock_page(head);
>>>  				return -EOPNOTSUPP;
>>>  			}
>>> @@ -1553,13 +1539,16 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
>>>  	page_flags = head->flags;
>>>  
>>>  	if (hwpoison_filter(p)) {
>>> -		if (TestClearPageHWPoison(head))
>>> -			num_poisoned_pages_dec();
>>>  		put_page(p);
>>>  		res = -EOPNOTSUPP;
>>>  		goto out;
>>>  	}
>>>  
>>> +	if (TestSetPageHWPoison(head))
>>> +		goto already_hwpoisoned;
>>> +
>>> +	num_poisoned_pages_inc();
>>> +
>>>  	/*
>>>  	 * TODO: hwpoison for pud-sized hugetlb doesn't work right now, so
>>>  	 * simply disable it. In order to make it work properly, we need
>>> @@ -1585,6 +1574,14 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
>>>  out:
>>>  	unlock_page(head);
>>>  	return res;
>>> +already_hwpoisoned:
>>> +	put_page(p);
>>> +	unlock_page(head);
>>
>> Generally speaking, we should do unlock_page before put_page or page might be disappeared
>> before we unlock the page. This should be ok when memory_failure succeeds to handle the
>> page previously as it holds one extra page refcnt. But it might be problematic when
>> memory_failure failed to handle the page last time. We might be the last user here.
> 
> OK, so another code path in "if (hwpoison_filter)@ block seems to need
> the same change in the order.

You're right.

> 
> Thanks,
> Naoya Horiguchi
> 

Many thanks for your patch.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] mm/hwpoison: set PageHWPoison after taking page lock in memory_failure_hugetlb()
  2022-03-15 14:00     ` Miaohe Lin
@ 2022-03-16  0:33       ` Mike Kravetz
  2022-03-16  1:00         ` HORIGUCHI NAOYA(堀口 直也)
  0 siblings, 1 reply; 7+ messages in thread
From: Mike Kravetz @ 2022-03-16  0:33 UTC (permalink / raw)
  To: Miaohe Lin, HORIGUCHI NAOYA(堀口 直也)
  Cc: Naoya Horiguchi, Andrew Morton, Yang Shi, linux-kernel, Linux-MM

On 3/15/22 07:00, Miaohe Lin wrote:
> On 2022/3/15 13:49, HORIGUCHI NAOYA(堀口 直也) wrote:
>> On Mon, Mar 14, 2022 at 03:10:25PM +0800, Miaohe Lin wrote:
>>> On 2022/3/14 10:13, Naoya Horiguchi wrote:
>>>> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
>>>>
>>>> There is a race condition between memory_failure_hugetlb() and hugetlb
>>>> free/demotion, which causes setting PageHWPoison flag on the wrong page
>>>> (which was a hugetlb when memory_failure() was called, but was removed
>>>> or demoted when memory_failure_hugetlb() is called).  This results in
>>>> killing wrong processes.  So set PageHWPoison flag with holding page lock,
>>>
>>> It seems hold page lock could not help solve this race condition as hugetlb
>>> page demotion is not required to hold the page lock. Could you please explain
>>> this a bit more?
>>
>> Sorry, the last line in the paragraph need change. What prevents the current
>> race is hugetlb_lock, not page lock.  The page lock is here to prevent the
>> race with hugepage allocation (not directly related to the current issue,
>> but it's still necessary).
> 
> Many thanks for clarifying this.
> 
>>
>>>
>>> BTW:Is there some words missing or here should be 'page lock.' instead of 'page lock,' ?
>>
>> I should use a period here, I'll fix it.
>>
>> [...]
>>
>>>> @@ -1503,24 +1502,11 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
>>>>  	int res;
>>>>  	unsigned long page_flags;
>>>>  
>>>> -	if (TestSetPageHWPoison(head)) {
>>>> -		pr_err("Memory failure: %#lx: already hardware poisoned\n",
>>>> -		       pfn);
>>>> -		res = -EHWPOISON;
>>>> -		if (flags & MF_ACTION_REQUIRED)
>>>> -			res = kill_accessing_process(current, page_to_pfn(head), flags);
>>>> -		return res;
>>>> -	}
>>>> -
>>>> -	num_poisoned_pages_inc();
>>>> -
>>>>  	if (!(flags & MF_COUNT_INCREASED)) {
>>>>  		res = get_hwpoison_page(p, flags);
>>>>  		if (!res) {
>>>
>>> In this (res == 0) case, hugetlb page could be dissolved via __page_handle_poison.
>>> But since PageHWPoison is not set yet, we can't set the PageHWPoison to the correct
>>> page. Think about the below code in dissolve_free_huge_page:
>>> 	/*
>>> 	 * Move PageHWPoison flag from head page to the raw
>>> 	 * error page, which makes any subpages rather than
>>> 	 * the error page reusable.
>>> 	 */
>>> 	if (PageHWPoison(head) && page != head) {
>>> 		SetPageHWPoison(page);
>>> 		ClearPageHWPoison(head);
>>> 	}
>>>
>>> SetPageHWPoison won't be called for the error page. Or am I miss something?
>>
>> No, you're right.  We need call page_handle_poison() instead of
>> __page_handle_poison().
>>
>> @@ -1512,7 +1512,7 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
>>  			}
>>  			unlock_page(head);
>>  			res = MF_FAILED;
>> -			if (__page_handle_poison(p)) {
>> +			if (page_handle_poison(p, true, false)) {
>>  				page_ref_inc(p);
>>  				res = MF_RECOVERED;
>>  			}
>>
> 
> This one looks good to me.

I must be missing something.  It seems page_handle_poison() calls
__page_handle_poison and thus dissolve_free_huge_page before
SetPageHWPoison.

I could easily be missing some patches, but that is the order of calls
in the code I am looking at.
-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] mm/hwpoison: set PageHWPoison after taking page lock in memory_failure_hugetlb()
  2022-03-16  0:33       ` Mike Kravetz
@ 2022-03-16  1:00         ` HORIGUCHI NAOYA(堀口 直也)
  0 siblings, 0 replies; 7+ messages in thread
From: HORIGUCHI NAOYA(堀口 直也) @ 2022-03-16  1:00 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: Miaohe Lin, Naoya Horiguchi, Andrew Morton, Yang Shi,
	linux-kernel, Linux-MM

On Tue, Mar 15, 2022 at 05:33:43PM -0700, Mike Kravetz wrote:
> On 3/15/22 07:00, Miaohe Lin wrote:
> > On 2022/3/15 13:49, HORIGUCHI NAOYA(堀口 直也) wrote:
> >> On Mon, Mar 14, 2022 at 03:10:25PM +0800, Miaohe Lin wrote:
> >>> On 2022/3/14 10:13, Naoya Horiguchi wrote:
> >>>> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> >>>>
> >>>> There is a race condition between memory_failure_hugetlb() and hugetlb
> >>>> free/demotion, which causes setting PageHWPoison flag on the wrong page
> >>>> (which was a hugetlb when memory_failure() was called, but was removed
> >>>> or demoted when memory_failure_hugetlb() is called).  This results in
> >>>> killing wrong processes.  So set PageHWPoison flag with holding page lock,
> >>>
> >>> It seems hold page lock could not help solve this race condition as hugetlb
> >>> page demotion is not required to hold the page lock. Could you please explain
> >>> this a bit more?
> >>
> >> Sorry, the last line in the paragraph need change. What prevents the current
> >> race is hugetlb_lock, not page lock.  The page lock is here to prevent the
> >> race with hugepage allocation (not directly related to the current issue,
> >> but it's still necessary).
> > 
> > Many thanks for clarifying this.
> > 
> >>
> >>>
> >>> BTW:Is there some words missing or here should be 'page lock.' instead of 'page lock,' ?
> >>
> >> I should use a period here, I'll fix it.
> >>
> >> [...]
> >>
> >>>> @@ -1503,24 +1502,11 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
> >>>>  	int res;
> >>>>  	unsigned long page_flags;
> >>>>  
> >>>> -	if (TestSetPageHWPoison(head)) {
> >>>> -		pr_err("Memory failure: %#lx: already hardware poisoned\n",
> >>>> -		       pfn);
> >>>> -		res = -EHWPOISON;
> >>>> -		if (flags & MF_ACTION_REQUIRED)
> >>>> -			res = kill_accessing_process(current, page_to_pfn(head), flags);
> >>>> -		return res;
> >>>> -	}
> >>>> -
> >>>> -	num_poisoned_pages_inc();
> >>>> -
> >>>>  	if (!(flags & MF_COUNT_INCREASED)) {
> >>>>  		res = get_hwpoison_page(p, flags);
> >>>>  		if (!res) {
> >>>
> >>> In this (res == 0) case, hugetlb page could be dissolved via __page_handle_poison.
> >>> But since PageHWPoison is not set yet, we can't set the PageHWPoison to the correct
> >>> page. Think about the below code in dissolve_free_huge_page:
> >>> 	/*
> >>> 	 * Move PageHWPoison flag from head page to the raw
> >>> 	 * error page, which makes any subpages rather than
> >>> 	 * the error page reusable.
> >>> 	 */
> >>> 	if (PageHWPoison(head) && page != head) {
> >>> 		SetPageHWPoison(page);
> >>> 		ClearPageHWPoison(head);
> >>> 	}
> >>>
> >>> SetPageHWPoison won't be called for the error page. Or am I miss something?
> >>
> >> No, you're right.  We need call page_handle_poison() instead of
> >> __page_handle_poison().
> >>
> >> @@ -1512,7 +1512,7 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
> >>  			}
> >>  			unlock_page(head);
> >>  			res = MF_FAILED;
> >> -			if (__page_handle_poison(p)) {
> >> +			if (page_handle_poison(p, true, false)) {
> >>  				page_ref_inc(p);
> >>  				res = MF_RECOVERED;
> >>  			}
> >>
> > 
> > This one looks good to me.
> 
> I must be missing something.  It seems page_handle_poison() calls
> __page_handle_poison and thus dissolve_free_huge_page before
> SetPageHWPoison.
> 
> I could easily be missing some patches, but that is the order of calls
> in the code I am looking at.

Sorry for my lack of words, maybe the situation is a little complicated.
page_handle_poison() can be called both before and after SetPageHWPoison.
"before SetPageHWPoison" case is when called from memory_failure() for
in-use pages. "after SetPageHWPoison" case is when called from soft offline.
So the above change is intended to move "memory_failure() for free hugepage"
from "before SetPageHWPoison" case to "after SetPageHWPoison" case.

BTW, I found myself the issue in the above diff, I have to remove the
page_ref_inc() in it because page_handle_poison() calls it inside it.
I'll share the updated patch soon.

Thanks,
Naoya Horiguchi

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2022-03-16  1:01 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-14  2:13 [PATCH v2] mm/hwpoison: set PageHWPoison after taking page lock in memory_failure_hugetlb() Naoya Horiguchi
2022-03-14  7:10 ` Miaohe Lin
2022-03-14 18:41   ` Mike Kravetz
2022-03-15  5:49   ` HORIGUCHI NAOYA(堀口 直也)
2022-03-15 14:00     ` Miaohe Lin
2022-03-16  0:33       ` Mike Kravetz
2022-03-16  1:00         ` HORIGUCHI NAOYA(堀口 直也)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).