All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/3] A few fixup patches for memory failure
@ 2022-03-12  7:46 Miaohe Lin
  2022-03-12  7:46 ` [PATCH v2 1/3] mm/memory-failure.c: fix race with changing page compound again Miaohe Lin
                   ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Miaohe Lin @ 2022-03-12  7:46 UTC (permalink / raw)
  To: akpm, tony.luck, bp, naoya.horiguchi
  Cc: mike.kravetz, shy828301, linux-mm, linux-kernel, linux-edac, linmiaohe

Hi everyone,
This series contains a few patches to fix the race with changing page
compound page, make non-LRU movable pages unhandlable and so on. More
details can be found in the respective changelogs. Thanks!

---
v1->v2:
  drop "mm/memory-failure.c: fix wrong user reference report"
  make non-LRU movable pages unhandlable
  fix confusing commit log and introduce MF_MSG_DIFFERENT_PAGE_SIZE
  Many thanks Naoya, Mike and Yang Shi for review!
---

Miaohe Lin (3):
  mm/memory-failure.c: fix race with changing page compound again
  mm/memory-failure.c: avoid calling invalidate_inode_page() with
    unexpected pages
  mm/memory-failure.c: make non-LRU movable pages unhandlable

 include/linux/mm.h      |  1 +
 include/ras/ras_event.h |  1 +
 mm/memory-failure.c     | 34 ++++++++++++++++++++++++++--------
 3 files changed, 28 insertions(+), 8 deletions(-)

-- 
2.23.0


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v2 1/3] mm/memory-failure.c: fix race with changing page compound again
  2022-03-12  7:46 [PATCH v2 0/3] A few fixup patches for memory failure Miaohe Lin
@ 2022-03-12  7:46 ` Miaohe Lin
  2022-03-13 23:41   ` HORIGUCHI NAOYA(堀口 直也)
  2022-03-14 18:20   ` Mike Kravetz
  2022-03-12  7:46 ` [PATCH v2 2/3] mm/memory-failure.c: avoid calling invalidate_inode_page() with unexpected pages Miaohe Lin
  2022-03-12  7:46 ` [PATCH v2 3/3] mm/memory-failure.c: make non-LRU movable pages unhandlable Miaohe Lin
  2 siblings, 2 replies; 20+ messages in thread
From: Miaohe Lin @ 2022-03-12  7:46 UTC (permalink / raw)
  To: akpm, tony.luck, bp, naoya.horiguchi
  Cc: mike.kravetz, shy828301, linux-mm, linux-kernel, linux-edac, linmiaohe

There is a race window where we got the compound_head, the hugetlb page
could be freed to buddy, or even changed to another compound page just
before we try to get hwpoison page. Think about the below race window:
  CPU 1					  CPU 2
  memory_failure_hugetlb
  struct page *head = compound_head(p);
					  hugetlb page might be freed to
					  buddy, or even changed to another
					  compound page.

  get_hwpoison_page -- page is not what we want now...

If this race happens, just bail out. Also MF_MSG_DIFFERENT_PAGE_SIZE is
introduced to record this event.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
---
 include/linux/mm.h      |  1 +
 include/ras/ras_event.h |  1 +
 mm/memory-failure.c     | 12 ++++++++++++
 3 files changed, 14 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index c9bada4096ac..ef98cff2b253 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3253,6 +3253,7 @@ enum mf_action_page_type {
 	MF_MSG_BUDDY,
 	MF_MSG_DAX,
 	MF_MSG_UNSPLIT_THP,
+	MF_MSG_DIFFERENT_PAGE_SIZE,
 	MF_MSG_UNKNOWN,
 };
 
diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h
index d0337a41141c..1e694fd239b9 100644
--- a/include/ras/ras_event.h
+++ b/include/ras/ras_event.h
@@ -374,6 +374,7 @@ TRACE_EVENT(aer_event,
 	EM ( MF_MSG_BUDDY, "free buddy page" )				\
 	EM ( MF_MSG_DAX, "dax page" )					\
 	EM ( MF_MSG_UNSPLIT_THP, "unsplit thp" )			\
+	EM ( MF_MSG_DIFFERENT_PAGE_SIZE, "different page size" )	\
 	EMe ( MF_MSG_UNKNOWN, "unknown page" )
 
 /*
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 5444a8ef4867..dabecd87ad3f 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -733,6 +733,7 @@ static const char * const action_page_types[] = {
 	[MF_MSG_BUDDY]			= "free buddy page",
 	[MF_MSG_DAX]			= "dax page",
 	[MF_MSG_UNSPLIT_THP]		= "unsplit thp",
+	[MF_MSG_DIFFERENT_PAGE_SIZE]	= "different page size",
 	[MF_MSG_UNKNOWN]		= "unknown page",
 };
 
@@ -1534,6 +1535,17 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
 	}
 
 	lock_page(head);
+
+	/**
+	 * The page could have changed compound pages due to race window.
+	 * If this happens just bail out.
+	 */
+	if (!PageHuge(p) || compound_head(p) != head) {
+		action_result(pfn, MF_MSG_DIFFERENT_PAGE_SIZE, MF_IGNORED);
+		res = -EBUSY;
+		goto out;
+	}
+
 	page_flags = head->flags;
 
 	if (hwpoison_filter(p)) {
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 2/3] mm/memory-failure.c: avoid calling invalidate_inode_page() with unexpected pages
  2022-03-12  7:46 [PATCH v2 0/3] A few fixup patches for memory failure Miaohe Lin
  2022-03-12  7:46 ` [PATCH v2 1/3] mm/memory-failure.c: fix race with changing page compound again Miaohe Lin
@ 2022-03-12  7:46 ` Miaohe Lin
  2022-03-13 23:41   ` HORIGUCHI NAOYA(堀口 直也)
  2022-03-12  7:46 ` [PATCH v2 3/3] mm/memory-failure.c: make non-LRU movable pages unhandlable Miaohe Lin
  2 siblings, 1 reply; 20+ messages in thread
From: Miaohe Lin @ 2022-03-12  7:46 UTC (permalink / raw)
  To: akpm, tony.luck, bp, naoya.horiguchi
  Cc: mike.kravetz, shy828301, linux-mm, linux-kernel, linux-edac, linmiaohe

Since commit 042c4f32323b ("mm/truncate: Inline invalidate_complete_page()
into its one caller"), invalidate_inode_page() can invalidate the pages in
the swap cache because the check of page->mapping != mapping is removed.
But invalidate_inode_page() is not expected to deal with the pages in swap
cache. Also non-lru movable page can reach here too. They're not page cache
pages. Skip these pages by checking PageSwapCache and PageLRU.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
---
 mm/memory-failure.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index dabecd87ad3f..2ff7dd2078c4 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -2190,7 +2190,7 @@ static int __soft_offline_page(struct page *page)
 		return 0;
 	}
 
-	if (!PageHuge(page))
+	if (!PageHuge(page) && PageLRU(page) && !PageSwapCache(page))
 		/*
 		 * Try to invalidate first. This should work for
 		 * non dirty unmapped page cache pages.
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 3/3] mm/memory-failure.c: make non-LRU movable pages unhandlable
  2022-03-12  7:46 [PATCH v2 0/3] A few fixup patches for memory failure Miaohe Lin
  2022-03-12  7:46 ` [PATCH v2 1/3] mm/memory-failure.c: fix race with changing page compound again Miaohe Lin
  2022-03-12  7:46 ` [PATCH v2 2/3] mm/memory-failure.c: avoid calling invalidate_inode_page() with unexpected pages Miaohe Lin
@ 2022-03-12  7:46 ` Miaohe Lin
  2022-03-13 23:43   ` HORIGUCHI NAOYA(堀口 直也)
  2022-03-14 17:34   ` Yang Shi
  2 siblings, 2 replies; 20+ messages in thread
From: Miaohe Lin @ 2022-03-12  7:46 UTC (permalink / raw)
  To: akpm, tony.luck, bp, naoya.horiguchi
  Cc: mike.kravetz, shy828301, linux-mm, linux-kernel, linux-edac, linmiaohe

We can not really handle non-LRU movable pages in memory failure. Typically
they are balloon, zsmalloc, etc. Assuming we run into a base (4K) non-LRU
movable page, we could reach as far as identify_page_state(), it should not
fall into any category except me_unknown. For the non-LRU compound movable
pages, they could be taken for transhuge pages but it's unexpected to split
non-LRU  movable pages using split_huge_page_to_list in memory_failure. So
we could just simply make non-LRU  movable pages unhandlable to avoid these
possible nasty cases.

Suggested-by: Yang Shi <shy828301@gmail.com>
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
---
 mm/memory-failure.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 2ff7dd2078c4..ba621c6823ed 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1177,12 +1177,18 @@ void ClearPageHWPoisonTakenOff(struct page *page)
  * does not return true for hugetlb or device memory pages, so it's assumed
  * to be called only in the context where we never have such pages.
  */
-static inline bool HWPoisonHandlable(struct page *page)
+static inline bool HWPoisonHandlable(struct page *page, unsigned long flags)
 {
-	return PageLRU(page) || __PageMovable(page) || is_free_buddy_page(page);
+	bool movable = false;
+
+	/* Soft offline could mirgate non-LRU movable pages */
+	if ((flags & MF_SOFT_OFFLINE) && __PageMovable(page))
+		movable = true;
+
+	return movable || PageLRU(page) || is_free_buddy_page(page);
 }
 
-static int __get_hwpoison_page(struct page *page)
+static int __get_hwpoison_page(struct page *page, unsigned long flags)
 {
 	struct page *head = compound_head(page);
 	int ret = 0;
@@ -1197,7 +1203,7 @@ static int __get_hwpoison_page(struct page *page)
 	 * for any unsupported type of page in order to reduce the risk of
 	 * unexpected races caused by taking a page refcount.
 	 */
-	if (!HWPoisonHandlable(head))
+	if (!HWPoisonHandlable(head, flags))
 		return -EBUSY;
 
 	if (get_page_unless_zero(head)) {
@@ -1222,7 +1228,7 @@ static int get_any_page(struct page *p, unsigned long flags)
 
 try_again:
 	if (!count_increased) {
-		ret = __get_hwpoison_page(p);
+		ret = __get_hwpoison_page(p, flags);
 		if (!ret) {
 			if (page_count(p)) {
 				/* We raced with an allocation, retry. */
@@ -1250,7 +1256,7 @@ static int get_any_page(struct page *p, unsigned long flags)
 		}
 	}
 
-	if (PageHuge(p) || HWPoisonHandlable(p)) {
+	if (PageHuge(p) || HWPoisonHandlable(p, flags)) {
 		ret = 1;
 	} else {
 		/*
@@ -2308,7 +2314,7 @@ int soft_offline_page(unsigned long pfn, int flags)
 
 retry:
 	get_online_mems();
-	ret = get_hwpoison_page(page, flags);
+	ret = get_hwpoison_page(page, flags | MF_SOFT_OFFLINE);
 	put_online_mems();
 
 	if (ret > 0) {
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/3] mm/memory-failure.c: fix race with changing page compound again
  2022-03-12  7:46 ` [PATCH v2 1/3] mm/memory-failure.c: fix race with changing page compound again Miaohe Lin
@ 2022-03-13 23:41   ` HORIGUCHI NAOYA(堀口 直也)
  2022-03-14  1:51     ` Miaohe Lin
  2022-03-14 18:20   ` Mike Kravetz
  1 sibling, 1 reply; 20+ messages in thread
From: HORIGUCHI NAOYA(堀口 直也) @ 2022-03-13 23:41 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: akpm, tony.luck, bp, mike.kravetz, shy828301, linux-mm,
	linux-kernel, linux-edac

On Sat, Mar 12, 2022 at 03:46:11PM +0800, Miaohe Lin wrote:
> There is a race window where we got the compound_head, the hugetlb page
> could be freed to buddy, or even changed to another compound page just
> before we try to get hwpoison page. Think about the below race window:
>   CPU 1					  CPU 2
>   memory_failure_hugetlb
>   struct page *head = compound_head(p);
> 					  hugetlb page might be freed to
> 					  buddy, or even changed to another
> 					  compound page.
> 
>   get_hwpoison_page -- page is not what we want now...
> 
> If this race happens, just bail out. Also MF_MSG_DIFFERENT_PAGE_SIZE is
> introduced to record this event.
> 
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> ---
>  include/linux/mm.h      |  1 +
>  include/ras/ras_event.h |  1 +
>  mm/memory-failure.c     | 12 ++++++++++++
>  3 files changed, 14 insertions(+)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index c9bada4096ac..ef98cff2b253 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -3253,6 +3253,7 @@ enum mf_action_page_type {
>  	MF_MSG_BUDDY,
>  	MF_MSG_DAX,
>  	MF_MSG_UNSPLIT_THP,
> +	MF_MSG_DIFFERENT_PAGE_SIZE,
>  	MF_MSG_UNKNOWN,
>  };
>  
> diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h
> index d0337a41141c..1e694fd239b9 100644
> --- a/include/ras/ras_event.h
> +++ b/include/ras/ras_event.h
> @@ -374,6 +374,7 @@ TRACE_EVENT(aer_event,
>  	EM ( MF_MSG_BUDDY, "free buddy page" )				\
>  	EM ( MF_MSG_DAX, "dax page" )					\
>  	EM ( MF_MSG_UNSPLIT_THP, "unsplit thp" )			\
> +	EM ( MF_MSG_DIFFERENT_PAGE_SIZE, "different page size" )	\
>  	EMe ( MF_MSG_UNKNOWN, "unknown page" )
>  
>  /*
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 5444a8ef4867..dabecd87ad3f 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -733,6 +733,7 @@ static const char * const action_page_types[] = {
>  	[MF_MSG_BUDDY]			= "free buddy page",
>  	[MF_MSG_DAX]			= "dax page",
>  	[MF_MSG_UNSPLIT_THP]		= "unsplit thp",
> +	[MF_MSG_DIFFERENT_PAGE_SIZE]	= "different page size",
>  	[MF_MSG_UNKNOWN]		= "unknown page",
>  };
>  
> @@ -1534,6 +1535,17 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
>  	}
>  
>  	lock_page(head);
> +
> +	/**

This comment section starting with '/**' is considered as kernel-doc comment,
maybe that's not expected because it just describes an implementation detail.
So normal comment section with '/*' would be better.

Otherwise, looks good to me.

Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>

> +	 * The page could have changed compound pages due to race window.
> +	 * If this happens just bail out.
> +	 */
> +	if (!PageHuge(p) || compound_head(p) != head) {
> +		action_result(pfn, MF_MSG_DIFFERENT_PAGE_SIZE, MF_IGNORED);
> +		res = -EBUSY;
> +		goto out;
> +	}
> +
>  	page_flags = head->flags;
>  
>  	if (hwpoison_filter(p)) {
> -- 
> 2.23.0

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 2/3] mm/memory-failure.c: avoid calling invalidate_inode_page() with unexpected pages
  2022-03-12  7:46 ` [PATCH v2 2/3] mm/memory-failure.c: avoid calling invalidate_inode_page() with unexpected pages Miaohe Lin
@ 2022-03-13 23:41   ` HORIGUCHI NAOYA(堀口 直也)
  2022-03-14  1:58     ` Miaohe Lin
  0 siblings, 1 reply; 20+ messages in thread
From: HORIGUCHI NAOYA(堀口 直也) @ 2022-03-13 23:41 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: akpm, tony.luck, bp, mike.kravetz, shy828301, linux-mm,
	linux-kernel, linux-edac

On Sat, Mar 12, 2022 at 03:46:12PM +0800, Miaohe Lin wrote:
> Since commit 042c4f32323b ("mm/truncate: Inline invalidate_complete_page()

This commit ID does not exist in mainline (or in the latest mmotm?),
so you can't use it in patch description.  Could you update this part?

Thanks,
Naoya Horiguchi

> into its one caller"), invalidate_inode_page() can invalidate the pages in
> the swap cache because the check of page->mapping != mapping is removed.
> But invalidate_inode_page() is not expected to deal with the pages in swap
> cache. Also non-lru movable page can reach here too. They're not page cache
> pages. Skip these pages by checking PageSwapCache and PageLRU.
> 
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> ---
>  mm/memory-failure.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index dabecd87ad3f..2ff7dd2078c4 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -2190,7 +2190,7 @@ static int __soft_offline_page(struct page *page)
>  		return 0;
>  	}
>  
> -	if (!PageHuge(page))
> +	if (!PageHuge(page) && PageLRU(page) && !PageSwapCache(page))
>  		/*
>  		 * Try to invalidate first. This should work for
>  		 * non dirty unmapped page cache pages.
> -- 
> 2.23.0

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 3/3] mm/memory-failure.c: make non-LRU movable pages unhandlable
  2022-03-12  7:46 ` [PATCH v2 3/3] mm/memory-failure.c: make non-LRU movable pages unhandlable Miaohe Lin
@ 2022-03-13 23:43   ` HORIGUCHI NAOYA(堀口 直也)
  2022-03-14 17:34   ` Yang Shi
  1 sibling, 0 replies; 20+ messages in thread
From: HORIGUCHI NAOYA(堀口 直也) @ 2022-03-13 23:43 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: akpm, tony.luck, bp, mike.kravetz, shy828301, linux-mm,
	linux-kernel, linux-edac

On Sat, Mar 12, 2022 at 03:46:13PM +0800, Miaohe Lin wrote:
> We can not really handle non-LRU movable pages in memory failure. Typically
> they are balloon, zsmalloc, etc. Assuming we run into a base (4K) non-LRU
> movable page, we could reach as far as identify_page_state(), it should not
> fall into any category except me_unknown. For the non-LRU compound movable
> pages, they could be taken for transhuge pages but it's unexpected to split
> non-LRU  movable pages using split_huge_page_to_list in memory_failure. So
> we could just simply make non-LRU  movable pages unhandlable to avoid these
> possible nasty cases.
> 
> Suggested-by: Yang Shi <shy828301@gmail.com>
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>

Looks good to me.

Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/3] mm/memory-failure.c: fix race with changing page compound again
  2022-03-13 23:41   ` HORIGUCHI NAOYA(堀口 直也)
@ 2022-03-14  1:51     ` Miaohe Lin
  0 siblings, 0 replies; 20+ messages in thread
From: Miaohe Lin @ 2022-03-14  1:51 UTC (permalink / raw)
  To: HORIGUCHI NAOYA(堀口 直也)
  Cc: akpm, tony.luck, bp, mike.kravetz, shy828301, linux-mm,
	linux-kernel, linux-edac

On 2022/3/14 7:41, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Sat, Mar 12, 2022 at 03:46:11PM +0800, Miaohe Lin wrote:
>> There is a race window where we got the compound_head, the hugetlb page
>> could be freed to buddy, or even changed to another compound page just
>> before we try to get hwpoison page. Think about the below race window:
>>   CPU 1					  CPU 2
>>   memory_failure_hugetlb
>>   struct page *head = compound_head(p);
>> 					  hugetlb page might be freed to
>> 					  buddy, or even changed to another
>> 					  compound page.
>>
>>   get_hwpoison_page -- page is not what we want now...
>>
>> If this race happens, just bail out. Also MF_MSG_DIFFERENT_PAGE_SIZE is
>> introduced to record this event.
>>
>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>> ---
>>  include/linux/mm.h      |  1 +
>>  include/ras/ras_event.h |  1 +
>>  mm/memory-failure.c     | 12 ++++++++++++
>>  3 files changed, 14 insertions(+)
>>
>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index c9bada4096ac..ef98cff2b253 100644
>> --- a/include/linux/mm.h
>> +++ b/include/linux/mm.h
>> @@ -3253,6 +3253,7 @@ enum mf_action_page_type {
>>  	MF_MSG_BUDDY,
>>  	MF_MSG_DAX,
>>  	MF_MSG_UNSPLIT_THP,
>> +	MF_MSG_DIFFERENT_PAGE_SIZE,
>>  	MF_MSG_UNKNOWN,
>>  };
>>  
>> diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h
>> index d0337a41141c..1e694fd239b9 100644
>> --- a/include/ras/ras_event.h
>> +++ b/include/ras/ras_event.h
>> @@ -374,6 +374,7 @@ TRACE_EVENT(aer_event,
>>  	EM ( MF_MSG_BUDDY, "free buddy page" )				\
>>  	EM ( MF_MSG_DAX, "dax page" )					\
>>  	EM ( MF_MSG_UNSPLIT_THP, "unsplit thp" )			\
>> +	EM ( MF_MSG_DIFFERENT_PAGE_SIZE, "different page size" )	\
>>  	EMe ( MF_MSG_UNKNOWN, "unknown page" )
>>  
>>  /*
>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>> index 5444a8ef4867..dabecd87ad3f 100644
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -733,6 +733,7 @@ static const char * const action_page_types[] = {
>>  	[MF_MSG_BUDDY]			= "free buddy page",
>>  	[MF_MSG_DAX]			= "dax page",
>>  	[MF_MSG_UNSPLIT_THP]		= "unsplit thp",
>> +	[MF_MSG_DIFFERENT_PAGE_SIZE]	= "different page size",
>>  	[MF_MSG_UNKNOWN]		= "unknown page",
>>  };
>>  
>> @@ -1534,6 +1535,17 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
>>  	}
>>  
>>  	lock_page(head);
>> +
>> +	/**
> 
> This comment section starting with '/**' is considered as kernel-doc comment,
> maybe that's not expected because it just describes an implementation detail.
> So normal comment section with '/*' would be better.
> 

I see. Will change to use "/*".

> Otherwise, looks good to me.
> 
> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>

Many thanks for review and comment.

> 
>> +	 * The page could have changed compound pages due to race window.
>> +	 * If this happens just bail out.
>> +	 */
>> +	if (!PageHuge(p) || compound_head(p) != head) {
>> +		action_result(pfn, MF_MSG_DIFFERENT_PAGE_SIZE, MF_IGNORED);
>> +		res = -EBUSY;
>> +		goto out;
>> +	}
>> +
>>  	page_flags = head->flags;
>>  
>>  	if (hwpoison_filter(p)) {
>> -- 
>> 2.23.0


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 2/3] mm/memory-failure.c: avoid calling invalidate_inode_page() with unexpected pages
  2022-03-13 23:41   ` HORIGUCHI NAOYA(堀口 直也)
@ 2022-03-14  1:58     ` Miaohe Lin
  2022-03-14  2:50       ` HORIGUCHI NAOYA(堀口 直也)
  0 siblings, 1 reply; 20+ messages in thread
From: Miaohe Lin @ 2022-03-14  1:58 UTC (permalink / raw)
  To: HORIGUCHI NAOYA(堀口 直也)
  Cc: akpm, tony.luck, bp, mike.kravetz, shy828301, linux-mm,
	linux-kernel, linux-edac

On 2022/3/14 7:41, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Sat, Mar 12, 2022 at 03:46:12PM +0800, Miaohe Lin wrote:
>> Since commit 042c4f32323b ("mm/truncate: Inline invalidate_complete_page()
> 
> This commit ID does not exist in mainline (or in the latest mmotm?),
> so you can't use it in patch description.  Could you update this part?
> 

This commit is in the mmotm but not in mainline yet:

commit 042c4f32323beb28146c658202d3e69899e4f245
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Sat Feb 12 15:27:42 2022 -0500

    mm/truncate: Inline invalidate_complete_page() into its one caller

    invalidate_inode_page() is the only caller of invalidate_complete_page()
    and inlining it reveals that the first check is unnecessary (because we
    hold the page locked, and we just retrieved the mapping from the page).
    Actually, it does make a difference, in that tail pages no longer fail
    at this check, so it's now possible to remove a tail page from a mapping.

    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Reviewed-by: John Hubbard <jhubbard@nvidia.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>


Am I "not" supposed to use this commit id as it's not "stable" now?

Will update this part in next version. Many thanks.

> Thanks,
> Naoya Horiguchi
> 
>> into its one caller"), invalidate_inode_page() can invalidate the pages in
>> the swap cache because the check of page->mapping != mapping is removed.
>> But invalidate_inode_page() is not expected to deal with the pages in swap
>> cache. Also non-lru movable page can reach here too. They're not page cache
>> pages. Skip these pages by checking PageSwapCache and PageLRU.
>>
>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>> ---
>>  mm/memory-failure.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>> index dabecd87ad3f..2ff7dd2078c4 100644
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -2190,7 +2190,7 @@ static int __soft_offline_page(struct page *page)
>>  		return 0;
>>  	}
>>  
>> -	if (!PageHuge(page))
>> +	if (!PageHuge(page) && PageLRU(page) && !PageSwapCache(page))
>>  		/*
>>  		 * Try to invalidate first. This should work for
>>  		 * non dirty unmapped page cache pages.
>> -- 
>> 2.23.0


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 2/3] mm/memory-failure.c: avoid calling invalidate_inode_page() with unexpected pages
  2022-03-14  1:58     ` Miaohe Lin
@ 2022-03-14  2:50       ` HORIGUCHI NAOYA(堀口 直也)
  2022-03-14  2:59         ` Miaohe Lin
  0 siblings, 1 reply; 20+ messages in thread
From: HORIGUCHI NAOYA(堀口 直也) @ 2022-03-14  2:50 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: akpm, tony.luck, bp, mike.kravetz, shy828301, linux-mm,
	linux-kernel, linux-edac

On Mon, Mar 14, 2022 at 09:58:49AM +0800, Miaohe Lin wrote:
> On 2022/3/14 7:41, HORIGUCHI NAOYA(堀口 直也) wrote:
> > On Sat, Mar 12, 2022 at 03:46:12PM +0800, Miaohe Lin wrote:
> >> Since commit 042c4f32323b ("mm/truncate: Inline invalidate_complete_page()
> > 
> > This commit ID does not exist in mainline (or in the latest mmotm?),
> > so you can't use it in patch description.  Could you update this part?
> > 
> 
> This commit is in the mmotm but not in mainline yet:
> 
> commit 042c4f32323beb28146c658202d3e69899e4f245
> Author: Matthew Wilcox (Oracle) <willy@infradead.org>
> Date:   Sat Feb 12 15:27:42 2022 -0500
> 
>     mm/truncate: Inline invalidate_complete_page() into its one caller
> 
>     invalidate_inode_page() is the only caller of invalidate_complete_page()
>     and inlining it reveals that the first check is unnecessary (because we
>     hold the page locked, and we just retrieved the mapping from the page).
>     Actually, it does make a difference, in that tail pages no longer fail
>     at this check, so it's now possible to remove a tail page from a mapping.
> 
>     Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
>     Reviewed-by: John Hubbard <jhubbard@nvidia.com>
>     Reviewed-by: Christoph Hellwig <hch@lst.de>
> 
> Am I "not" supposed to use this commit id as it's not "stable" now?

No, it's not stable yet. In whatever way you get the above commit (I guess
you get it from https://github.com/hnaz/linux-mm), all acked mm-related
patches are sent to Linus by Andrew *by email*, so the eventual commit IDs
should be determined when they are applied to mainline.

Thanks,
Naoya Horiguchi

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 2/3] mm/memory-failure.c: avoid calling invalidate_inode_page() with unexpected pages
  2022-03-14  2:50       ` HORIGUCHI NAOYA(堀口 直也)
@ 2022-03-14  2:59         ` Miaohe Lin
  2022-03-14 23:45           ` Andrew Morton
  0 siblings, 1 reply; 20+ messages in thread
From: Miaohe Lin @ 2022-03-14  2:59 UTC (permalink / raw)
  To: HORIGUCHI NAOYA(堀口 直也)
  Cc: akpm, tony.luck, bp, mike.kravetz, shy828301, linux-mm,
	linux-kernel, linux-edac

On 2022/3/14 10:50, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Mon, Mar 14, 2022 at 09:58:49AM +0800, Miaohe Lin wrote:
>> On 2022/3/14 7:41, HORIGUCHI NAOYA(堀口 直也) wrote:
>>> On Sat, Mar 12, 2022 at 03:46:12PM +0800, Miaohe Lin wrote:
>>>> Since commit 042c4f32323b ("mm/truncate: Inline invalidate_complete_page()
>>>
>>> This commit ID does not exist in mainline (or in the latest mmotm?),
>>> so you can't use it in patch description.  Could you update this part?
>>>
>>
>> This commit is in the mmotm but not in mainline yet:
>>
>> commit 042c4f32323beb28146c658202d3e69899e4f245
>> Author: Matthew Wilcox (Oracle) <willy@infradead.org>
>> Date:   Sat Feb 12 15:27:42 2022 -0500
>>
>>     mm/truncate: Inline invalidate_complete_page() into its one caller
>>
>>     invalidate_inode_page() is the only caller of invalidate_complete_page()
>>     and inlining it reveals that the first check is unnecessary (because we
>>     hold the page locked, and we just retrieved the mapping from the page).
>>     Actually, it does make a difference, in that tail pages no longer fail
>>     at this check, so it's now possible to remove a tail page from a mapping.
>>
>>     Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
>>     Reviewed-by: John Hubbard <jhubbard@nvidia.com>
>>     Reviewed-by: Christoph Hellwig <hch@lst.de>
>>
>> Am I "not" supposed to use this commit id as it's not "stable" now?
> 
> No, it's not stable yet. In whatever way you get the above commit (I guess
> you get it from https://github.com/hnaz/linux-mm), all acked mm-related
> patches are sent to Linus by Andrew *by email*, so the eventual commit IDs
> should be determined when they are applied to mainline.
> 

Many thanks for your explanation. (I get this commit id from linux-next tree:
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git)
So I should remember always to get the commit id from mainline.

Thanks again. :)

> Thanks,
> Naoya Horiguchi
> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 3/3] mm/memory-failure.c: make non-LRU movable pages unhandlable
  2022-03-12  7:46 ` [PATCH v2 3/3] mm/memory-failure.c: make non-LRU movable pages unhandlable Miaohe Lin
  2022-03-13 23:43   ` HORIGUCHI NAOYA(堀口 直也)
@ 2022-03-14 17:34   ` Yang Shi
  1 sibling, 0 replies; 20+ messages in thread
From: Yang Shi @ 2022-03-14 17:34 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: akpm, tony.luck, bp, naoya.horiguchi, mike.kravetz, linux-mm,
	linux-kernel, linux-edac

On Fri, Mar 11, 2022 at 11:47 PM Miaohe Lin <linmiaohe@huawei.com> wrote:
>
> We can not really handle non-LRU movable pages in memory failure. Typically
> they are balloon, zsmalloc, etc. Assuming we run into a base (4K) non-LRU
> movable page, we could reach as far as identify_page_state(), it should not
> fall into any category except me_unknown. For the non-LRU compound movable
> pages, they could be taken for transhuge pages but it's unexpected to split
> non-LRU  movable pages using split_huge_page_to_list in memory_failure. So
> we could just simply make non-LRU  movable pages unhandlable to avoid these
> possible nasty cases.
>
> Suggested-by: Yang Shi <shy828301@gmail.com>
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>

Reviewed-by: Yang Shi <shy828301@gmail.com>

> ---
>  mm/memory-failure.c | 20 +++++++++++++-------
>  1 file changed, 13 insertions(+), 7 deletions(-)
>
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 2ff7dd2078c4..ba621c6823ed 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1177,12 +1177,18 @@ void ClearPageHWPoisonTakenOff(struct page *page)
>   * does not return true for hugetlb or device memory pages, so it's assumed
>   * to be called only in the context where we never have such pages.
>   */
> -static inline bool HWPoisonHandlable(struct page *page)
> +static inline bool HWPoisonHandlable(struct page *page, unsigned long flags)
>  {
> -       return PageLRU(page) || __PageMovable(page) || is_free_buddy_page(page);
> +       bool movable = false;
> +
> +       /* Soft offline could mirgate non-LRU movable pages */
> +       if ((flags & MF_SOFT_OFFLINE) && __PageMovable(page))
> +               movable = true;
> +
> +       return movable || PageLRU(page) || is_free_buddy_page(page);
>  }
>
> -static int __get_hwpoison_page(struct page *page)
> +static int __get_hwpoison_page(struct page *page, unsigned long flags)
>  {
>         struct page *head = compound_head(page);
>         int ret = 0;
> @@ -1197,7 +1203,7 @@ static int __get_hwpoison_page(struct page *page)
>          * for any unsupported type of page in order to reduce the risk of
>          * unexpected races caused by taking a page refcount.
>          */
> -       if (!HWPoisonHandlable(head))
> +       if (!HWPoisonHandlable(head, flags))
>                 return -EBUSY;
>
>         if (get_page_unless_zero(head)) {
> @@ -1222,7 +1228,7 @@ static int get_any_page(struct page *p, unsigned long flags)
>
>  try_again:
>         if (!count_increased) {
> -               ret = __get_hwpoison_page(p);
> +               ret = __get_hwpoison_page(p, flags);
>                 if (!ret) {
>                         if (page_count(p)) {
>                                 /* We raced with an allocation, retry. */
> @@ -1250,7 +1256,7 @@ static int get_any_page(struct page *p, unsigned long flags)
>                 }
>         }
>
> -       if (PageHuge(p) || HWPoisonHandlable(p)) {
> +       if (PageHuge(p) || HWPoisonHandlable(p, flags)) {
>                 ret = 1;
>         } else {
>                 /*
> @@ -2308,7 +2314,7 @@ int soft_offline_page(unsigned long pfn, int flags)
>
>  retry:
>         get_online_mems();
> -       ret = get_hwpoison_page(page, flags);
> +       ret = get_hwpoison_page(page, flags | MF_SOFT_OFFLINE);
>         put_online_mems();
>
>         if (ret > 0) {
> --
> 2.23.0
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/3] mm/memory-failure.c: fix race with changing page compound again
  2022-03-12  7:46 ` [PATCH v2 1/3] mm/memory-failure.c: fix race with changing page compound again Miaohe Lin
  2022-03-13 23:41   ` HORIGUCHI NAOYA(堀口 直也)
@ 2022-03-14 18:20   ` Mike Kravetz
  2022-03-15 14:19     ` Miaohe Lin
  1 sibling, 1 reply; 20+ messages in thread
From: Mike Kravetz @ 2022-03-14 18:20 UTC (permalink / raw)
  To: Miaohe Lin, akpm, tony.luck, bp, naoya.horiguchi
  Cc: shy828301, linux-mm, linux-kernel, linux-edac

On 3/11/22 23:46, Miaohe Lin wrote:
> There is a race window where we got the compound_head, the hugetlb page
> could be freed to buddy, or even changed to another compound page just
> before we try to get hwpoison page. Think about the below race window:
>   CPU 1					  CPU 2
>   memory_failure_hugetlb
>   struct page *head = compound_head(p);
> 					  hugetlb page might be freed to
> 					  buddy, or even changed to another
> 					  compound page.
> 
>   get_hwpoison_page -- page is not what we want now...
> 
> If this race happens, just bail out. Also MF_MSG_DIFFERENT_PAGE_SIZE is
> introduced to record this event.
> 
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> ---
>  include/linux/mm.h      |  1 +
>  include/ras/ras_event.h |  1 +
>  mm/memory-failure.c     | 12 ++++++++++++
>  3 files changed, 14 insertions(+)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index c9bada4096ac..ef98cff2b253 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -3253,6 +3253,7 @@ enum mf_action_page_type {
>  	MF_MSG_BUDDY,
>  	MF_MSG_DAX,
>  	MF_MSG_UNSPLIT_THP,
> +	MF_MSG_DIFFERENT_PAGE_SIZE,
>  	MF_MSG_UNKNOWN,
>  };
>  
> diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h
> index d0337a41141c..1e694fd239b9 100644
> --- a/include/ras/ras_event.h
> +++ b/include/ras/ras_event.h
> @@ -374,6 +374,7 @@ TRACE_EVENT(aer_event,
>  	EM ( MF_MSG_BUDDY, "free buddy page" )				\
>  	EM ( MF_MSG_DAX, "dax page" )					\
>  	EM ( MF_MSG_UNSPLIT_THP, "unsplit thp" )			\
> +	EM ( MF_MSG_DIFFERENT_PAGE_SIZE, "different page size" )	\
>  	EMe ( MF_MSG_UNKNOWN, "unknown page" )
>  
>  /*
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 5444a8ef4867..dabecd87ad3f 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -733,6 +733,7 @@ static const char * const action_page_types[] = {
>  	[MF_MSG_BUDDY]			= "free buddy page",
>  	[MF_MSG_DAX]			= "dax page",
>  	[MF_MSG_UNSPLIT_THP]		= "unsplit thp",
> +	[MF_MSG_DIFFERENT_PAGE_SIZE]	= "different page size",
>  	[MF_MSG_UNKNOWN]		= "unknown page",
>  };
>  
> @@ -1534,6 +1535,17 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
>  	}
>  
>  	lock_page(head);
> +
> +	/**
> +	 * The page could have changed compound pages due to race window.
> +	 * If this happens just bail out.
> +	 */
> +	if (!PageHuge(p) || compound_head(p) != head) {
> +		action_result(pfn, MF_MSG_DIFFERENT_PAGE_SIZE, MF_IGNORED);
> +		res = -EBUSY;

We have discussed this race in other versions of the patch.  When we encounter
the race, we have likely marked poison on the wrong page.  Correct?

Instead of printing a "different page size", would it be better to perhaps:
- Print a message that wrong page may be marked for poison?
- Clear the poison flag in the "head page" previously set?

-- 
Mike Kravetz

> +		goto out;
> +	}
> +
>  	page_flags = head->flags;
>  
>  	if (hwpoison_filter(p)) {



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 2/3] mm/memory-failure.c: avoid calling invalidate_inode_page() with unexpected pages
  2022-03-14  2:59         ` Miaohe Lin
@ 2022-03-14 23:45           ` Andrew Morton
  2022-03-15 13:55             ` Miaohe Lin
  0 siblings, 1 reply; 20+ messages in thread
From: Andrew Morton @ 2022-03-14 23:45 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: HORIGUCHI NAOYA, tony.luck, bp, mike.kravetz, shy828301,
	linux-mm, linux-kernel, linux-edac

On Mon, 14 Mar 2022 10:59:40 +0800 Miaohe Lin <linmiaohe@huawei.com> wrote:

> On 2022/3/14 10:50, HORIGUCHI NAOYA(堀口 直也) wrote:
> > On Mon, Mar 14, 2022 at 09:58:49AM +0800, Miaohe Lin wrote:
> >> On 2022/3/14 7:41, HORIGUCHI NAOYA(堀口 直也) wrote:
> >>> On Sat, Mar 12, 2022 at 03:46:12PM +0800, Miaohe Lin wrote:
> >>>> Since commit 042c4f32323b ("mm/truncate: Inline invalidate_complete_page()
> >>>
> >>> This commit ID does not exist in mainline (or in the latest mmotm?),
> >>> so you can't use it in patch description.  Could you update this part?
> >>>
> >>
> >> This commit is in the mmotm but not in mainline yet:
> >>
> >> commit 042c4f32323beb28146c658202d3e69899e4f245
> >> Author: Matthew Wilcox (Oracle) <willy@infradead.org>
> >> Date:   Sat Feb 12 15:27:42 2022 -0500
> >>
> >>     mm/truncate: Inline invalidate_complete_page() into its one caller
> >>
> >>     invalidate_inode_page() is the only caller of invalidate_complete_page()
> >>     and inlining it reveals that the first check is unnecessary (because we
> >>     hold the page locked, and we just retrieved the mapping from the page).
> >>     Actually, it does make a difference, in that tail pages no longer fail
> >>     at this check, so it's now possible to remove a tail page from a mapping.
> >>
> >>     Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> >>     Reviewed-by: John Hubbard <jhubbard@nvidia.com>
> >>     Reviewed-by: Christoph Hellwig <hch@lst.de>
> >>
> >> Am I "not" supposed to use this commit id as it's not "stable" now?
> > 
> > No, it's not stable yet. In whatever way you get the above commit (I guess
> > you get it from https://github.com/hnaz/linux-mm), all acked mm-related
> > patches are sent to Linus by Andrew *by email*, so the eventual commit IDs
> > should be determined when they are applied to mainline.
> > 
> 
> Many thanks for your explanation. (I get this commit id from linux-next tree:
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git)
> So I should remember always to get the commit id from mainline.

It's likely that this commit ID will be the same once Matthew's patch
goes into mainline.

But this is why we include the patch title ("mm/truncate: Inline ...")
when identifying commits.  Sometimes stuff happens...

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 2/3] mm/memory-failure.c: avoid calling invalidate_inode_page() with unexpected pages
  2022-03-14 23:45           ` Andrew Morton
@ 2022-03-15 13:55             ` Miaohe Lin
  0 siblings, 0 replies; 20+ messages in thread
From: Miaohe Lin @ 2022-03-15 13:55 UTC (permalink / raw)
  To: Andrew Morton
  Cc: HORIGUCHI NAOYA (堀口 直也),
	tony.luck, bp, mike.kravetz, shy828301, linux-mm, linux-kernel,
	linux-edac

On 2022/3/15 7:45, Andrew Morton wrote:
> On Mon, 14 Mar 2022 10:59:40 +0800 Miaohe Lin <linmiaohe@huawei.com> wrote:
> 
>> On 2022/3/14 10:50, HORIGUCHI NAOYA(堀口 直也) wrote:
>>> On Mon, Mar 14, 2022 at 09:58:49AM +0800, Miaohe Lin wrote:
>>>> On 2022/3/14 7:41, HORIGUCHI NAOYA(堀口 直也) wrote:
>>>>> On Sat, Mar 12, 2022 at 03:46:12PM +0800, Miaohe Lin wrote:
>>>>>> Since commit 042c4f32323b ("mm/truncate: Inline invalidate_complete_page()
>>>>>
>>>>> This commit ID does not exist in mainline (or in the latest mmotm?),
>>>>> so you can't use it in patch description.  Could you update this part?
>>>>>
>>>>
>>>> This commit is in the mmotm but not in mainline yet:
>>>>
>>>> commit 042c4f32323beb28146c658202d3e69899e4f245
>>>> Author: Matthew Wilcox (Oracle) <willy@infradead.org>
>>>> Date:   Sat Feb 12 15:27:42 2022 -0500
>>>>
>>>>     mm/truncate: Inline invalidate_complete_page() into its one caller
>>>>
>>>>     invalidate_inode_page() is the only caller of invalidate_complete_page()
>>>>     and inlining it reveals that the first check is unnecessary (because we
>>>>     hold the page locked, and we just retrieved the mapping from the page).
>>>>     Actually, it does make a difference, in that tail pages no longer fail
>>>>     at this check, so it's now possible to remove a tail page from a mapping.
>>>>
>>>>     Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
>>>>     Reviewed-by: John Hubbard <jhubbard@nvidia.com>
>>>>     Reviewed-by: Christoph Hellwig <hch@lst.de>
>>>>
>>>> Am I "not" supposed to use this commit id as it's not "stable" now?
>>>
>>> No, it's not stable yet. In whatever way you get the above commit (I guess
>>> you get it from https://github.com/hnaz/linux-mm), all acked mm-related
>>> patches are sent to Linus by Andrew *by email*, so the eventual commit IDs
>>> should be determined when they are applied to mainline.
>>>
>>
>> Many thanks for your explanation. (I get this commit id from linux-next tree:
>> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git)
>> So I should remember always to get the commit id from mainline.
> 
> It's likely that this commit ID will be the same once Matthew's patch
> goes into mainline.
> 
> But this is why we include the patch title ("mm/truncate: Inline ...")
> when identifying commits.  Sometimes stuff happens...

I remember I used the stale commit id once in my past patch. I made
this mistake again. Sorry about it. :(

> .
> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/3] mm/memory-failure.c: fix race with changing page compound again
  2022-03-14 18:20   ` Mike Kravetz
@ 2022-03-15 14:19     ` Miaohe Lin
  2022-03-15 18:19       ` Yang Shi
  0 siblings, 1 reply; 20+ messages in thread
From: Miaohe Lin @ 2022-03-15 14:19 UTC (permalink / raw)
  To: Mike Kravetz, naoya.horiguchi
  Cc: shy828301, linux-mm, linux-kernel, linux-edac, akpm, tony.luck, bp

On 2022/3/15 2:20, Mike Kravetz wrote:
> On 3/11/22 23:46, Miaohe Lin wrote:
>> There is a race window where we got the compound_head, the hugetlb page
>> could be freed to buddy, or even changed to another compound page just
>> before we try to get hwpoison page. Think about the below race window:
>>   CPU 1					  CPU 2
>>   memory_failure_hugetlb
>>   struct page *head = compound_head(p);
>> 					  hugetlb page might be freed to
>> 					  buddy, or even changed to another
>> 					  compound page.
>>
>>   get_hwpoison_page -- page is not what we want now...
>>
>> If this race happens, just bail out. Also MF_MSG_DIFFERENT_PAGE_SIZE is
>> introduced to record this event.
>>
>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>> ---
>>  include/linux/mm.h      |  1 +
>>  include/ras/ras_event.h |  1 +
>>  mm/memory-failure.c     | 12 ++++++++++++
>>  3 files changed, 14 insertions(+)
>>
>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index c9bada4096ac..ef98cff2b253 100644
>> --- a/include/linux/mm.h
>> +++ b/include/linux/mm.h
>> @@ -3253,6 +3253,7 @@ enum mf_action_page_type {
>>  	MF_MSG_BUDDY,
>>  	MF_MSG_DAX,
>>  	MF_MSG_UNSPLIT_THP,
>> +	MF_MSG_DIFFERENT_PAGE_SIZE,
>>  	MF_MSG_UNKNOWN,
>>  };
>>  
>> diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h
>> index d0337a41141c..1e694fd239b9 100644
>> --- a/include/ras/ras_event.h
>> +++ b/include/ras/ras_event.h
>> @@ -374,6 +374,7 @@ TRACE_EVENT(aer_event,
>>  	EM ( MF_MSG_BUDDY, "free buddy page" )				\
>>  	EM ( MF_MSG_DAX, "dax page" )					\
>>  	EM ( MF_MSG_UNSPLIT_THP, "unsplit thp" )			\
>> +	EM ( MF_MSG_DIFFERENT_PAGE_SIZE, "different page size" )	\
>>  	EMe ( MF_MSG_UNKNOWN, "unknown page" )
>>  
>>  /*
>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>> index 5444a8ef4867..dabecd87ad3f 100644
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -733,6 +733,7 @@ static const char * const action_page_types[] = {
>>  	[MF_MSG_BUDDY]			= "free buddy page",
>>  	[MF_MSG_DAX]			= "dax page",
>>  	[MF_MSG_UNSPLIT_THP]		= "unsplit thp",
>> +	[MF_MSG_DIFFERENT_PAGE_SIZE]	= "different page size",
>>  	[MF_MSG_UNKNOWN]		= "unknown page",
>>  };
>>  
>> @@ -1534,6 +1535,17 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
>>  	}
>>  
>>  	lock_page(head);
>> +
>> +	/**
>> +	 * The page could have changed compound pages due to race window.
>> +	 * If this happens just bail out.
>> +	 */
>> +	if (!PageHuge(p) || compound_head(p) != head) {
>> +		action_result(pfn, MF_MSG_DIFFERENT_PAGE_SIZE, MF_IGNORED);
>> +		res = -EBUSY;
> 
> We have discussed this race in other versions of the patch.  When we encounter
> the race, we have likely marked poison on the wrong page.  Correct?
> 

Many thanks for comment.
I assume that Naoya's patch "mm/hwpoison: set PageHWPoison after taking page lock
in memory_failure_hugetlb()" would set the PageHWPoison after the above check.
So I think the below operation is not needed as PageHWPoison is not set yet.
Does this makes sense for you?

Thanks.

> Instead of printing a "different page size", would it be better to perhaps:
> - Print a message that wrong page may be marked for poison?
> - Clear the poison flag in the "head page" previously set?
> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/3] mm/memory-failure.c: fix race with changing page compound again
  2022-03-15 14:19     ` Miaohe Lin
@ 2022-03-15 18:19       ` Yang Shi
  2022-03-16  8:18         ` Miaohe Lin
  0 siblings, 1 reply; 20+ messages in thread
From: Yang Shi @ 2022-03-15 18:19 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: Mike Kravetz, naoya.horiguchi, linux-mm, linux-kernel,
	linux-edac, akpm, tony.luck, bp

On Tue, Mar 15, 2022 at 7:19 AM Miaohe Lin <linmiaohe@huawei.com> wrote:
>
> On 2022/3/15 2:20, Mike Kravetz wrote:
> > On 3/11/22 23:46, Miaohe Lin wrote:
> >> There is a race window where we got the compound_head, the hugetlb page
> >> could be freed to buddy, or even changed to another compound page just
> >> before we try to get hwpoison page. Think about the below race window:
> >>   CPU 1                                        CPU 2
> >>   memory_failure_hugetlb
> >>   struct page *head = compound_head(p);
> >>                                        hugetlb page might be freed to
> >>                                        buddy, or even changed to another
> >>                                        compound page.
> >>
> >>   get_hwpoison_page -- page is not what we want now...
> >>
> >> If this race happens, just bail out. Also MF_MSG_DIFFERENT_PAGE_SIZE is
> >> introduced to record this event.
> >>
> >> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> >> ---
> >>  include/linux/mm.h      |  1 +
> >>  include/ras/ras_event.h |  1 +
> >>  mm/memory-failure.c     | 12 ++++++++++++
> >>  3 files changed, 14 insertions(+)
> >>
> >> diff --git a/include/linux/mm.h b/include/linux/mm.h
> >> index c9bada4096ac..ef98cff2b253 100644
> >> --- a/include/linux/mm.h
> >> +++ b/include/linux/mm.h
> >> @@ -3253,6 +3253,7 @@ enum mf_action_page_type {
> >>      MF_MSG_BUDDY,
> >>      MF_MSG_DAX,
> >>      MF_MSG_UNSPLIT_THP,
> >> +    MF_MSG_DIFFERENT_PAGE_SIZE,
> >>      MF_MSG_UNKNOWN,
> >>  };
> >>
> >> diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h
> >> index d0337a41141c..1e694fd239b9 100644
> >> --- a/include/ras/ras_event.h
> >> +++ b/include/ras/ras_event.h
> >> @@ -374,6 +374,7 @@ TRACE_EVENT(aer_event,
> >>      EM ( MF_MSG_BUDDY, "free buddy page" )                          \
> >>      EM ( MF_MSG_DAX, "dax page" )                                   \
> >>      EM ( MF_MSG_UNSPLIT_THP, "unsplit thp" )                        \
> >> +    EM ( MF_MSG_DIFFERENT_PAGE_SIZE, "different page size" )        \
> >>      EMe ( MF_MSG_UNKNOWN, "unknown page" )
> >>
> >>  /*
> >> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> >> index 5444a8ef4867..dabecd87ad3f 100644
> >> --- a/mm/memory-failure.c
> >> +++ b/mm/memory-failure.c
> >> @@ -733,6 +733,7 @@ static const char * const action_page_types[] = {
> >>      [MF_MSG_BUDDY]                  = "free buddy page",
> >>      [MF_MSG_DAX]                    = "dax page",
> >>      [MF_MSG_UNSPLIT_THP]            = "unsplit thp",
> >> +    [MF_MSG_DIFFERENT_PAGE_SIZE]    = "different page size",
> >>      [MF_MSG_UNKNOWN]                = "unknown page",
> >>  };
> >>
> >> @@ -1534,6 +1535,17 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
> >>      }
> >>
> >>      lock_page(head);
> >> +
> >> +    /**
> >> +     * The page could have changed compound pages due to race window.
> >> +     * If this happens just bail out.
> >> +     */
> >> +    if (!PageHuge(p) || compound_head(p) != head) {
> >> +            action_result(pfn, MF_MSG_DIFFERENT_PAGE_SIZE, MF_IGNORED);
> >> +            res = -EBUSY;
> >
> > We have discussed this race in other versions of the patch.  When we encounter
> > the race, we have likely marked poison on the wrong page.  Correct?
> >
>
> Many thanks for comment.
> I assume that Naoya's patch "mm/hwpoison: set PageHWPoison after taking page lock
> in memory_failure_hugetlb()" would set the PageHWPoison after the above check.
> So I think the below operation is not needed as PageHWPoison is not set yet.
> Does this makes sense for you?

I'm wondering if it might be better and helpful for review to squash
this patch with Naoya's patch together? It seems we always missed the
other part when reviewing the patches.

>
> Thanks.
>
> > Instead of printing a "different page size", would it be better to perhaps:
> > - Print a message that wrong page may be marked for poison?
> > - Clear the poison flag in the "head page" previously set?
> >
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/3] mm/memory-failure.c: fix race with changing page compound again
  2022-03-15 18:19       ` Yang Shi
@ 2022-03-16  8:18         ` Miaohe Lin
  2022-03-16  8:30           ` HORIGUCHI NAOYA(堀口 直也)
  0 siblings, 1 reply; 20+ messages in thread
From: Miaohe Lin @ 2022-03-16  8:18 UTC (permalink / raw)
  To: Yang Shi, HORIGUCHI NAOYA
  Cc: Mike Kravetz, linux-mm, linux-kernel, linux-edac, akpm, tony.luck, bp

On 2022/3/16 2:19, Yang Shi wrote:
> On Tue, Mar 15, 2022 at 7:19 AM Miaohe Lin <linmiaohe@huawei.com> wrote:
>>
>> On 2022/3/15 2:20, Mike Kravetz wrote:
>>> On 3/11/22 23:46, Miaohe Lin wrote:
>>>> There is a race window where we got the compound_head, the hugetlb page
>>>> could be freed to buddy, or even changed to another compound page just
>>>> before we try to get hwpoison page. Think about the below race window:
>>>>   CPU 1                                        CPU 2
>>>>   memory_failure_hugetlb
>>>>   struct page *head = compound_head(p);
>>>>                                        hugetlb page might be freed to
>>>>                                        buddy, or even changed to another
>>>>                                        compound page.
>>>>
>>>>   get_hwpoison_page -- page is not what we want now...
>>>>
>>>> If this race happens, just bail out. Also MF_MSG_DIFFERENT_PAGE_SIZE is
>>>> introduced to record this event.
>>>>
>>>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>>>> ---
>>>>  include/linux/mm.h      |  1 +
>>>>  include/ras/ras_event.h |  1 +
>>>>  mm/memory-failure.c     | 12 ++++++++++++
>>>>  3 files changed, 14 insertions(+)
>>>>
>>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>>>> index c9bada4096ac..ef98cff2b253 100644
>>>> --- a/include/linux/mm.h
>>>> +++ b/include/linux/mm.h
>>>> @@ -3253,6 +3253,7 @@ enum mf_action_page_type {
>>>>      MF_MSG_BUDDY,
>>>>      MF_MSG_DAX,
>>>>      MF_MSG_UNSPLIT_THP,
>>>> +    MF_MSG_DIFFERENT_PAGE_SIZE,
>>>>      MF_MSG_UNKNOWN,
>>>>  };
>>>>
>>>> diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h
>>>> index d0337a41141c..1e694fd239b9 100644
>>>> --- a/include/ras/ras_event.h
>>>> +++ b/include/ras/ras_event.h
>>>> @@ -374,6 +374,7 @@ TRACE_EVENT(aer_event,
>>>>      EM ( MF_MSG_BUDDY, "free buddy page" )                          \
>>>>      EM ( MF_MSG_DAX, "dax page" )                                   \
>>>>      EM ( MF_MSG_UNSPLIT_THP, "unsplit thp" )                        \
>>>> +    EM ( MF_MSG_DIFFERENT_PAGE_SIZE, "different page size" )        \
>>>>      EMe ( MF_MSG_UNKNOWN, "unknown page" )
>>>>
>>>>  /*
>>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>>>> index 5444a8ef4867..dabecd87ad3f 100644
>>>> --- a/mm/memory-failure.c
>>>> +++ b/mm/memory-failure.c
>>>> @@ -733,6 +733,7 @@ static const char * const action_page_types[] = {
>>>>      [MF_MSG_BUDDY]                  = "free buddy page",
>>>>      [MF_MSG_DAX]                    = "dax page",
>>>>      [MF_MSG_UNSPLIT_THP]            = "unsplit thp",
>>>> +    [MF_MSG_DIFFERENT_PAGE_SIZE]    = "different page size",
>>>>      [MF_MSG_UNKNOWN]                = "unknown page",
>>>>  };
>>>>
>>>> @@ -1534,6 +1535,17 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
>>>>      }
>>>>
>>>>      lock_page(head);
>>>> +
>>>> +    /**
>>>> +     * The page could have changed compound pages due to race window.
>>>> +     * If this happens just bail out.
>>>> +     */
>>>> +    if (!PageHuge(p) || compound_head(p) != head) {
>>>> +            action_result(pfn, MF_MSG_DIFFERENT_PAGE_SIZE, MF_IGNORED);
>>>> +            res = -EBUSY;
>>>
>>> We have discussed this race in other versions of the patch.  When we encounter
>>> the race, we have likely marked poison on the wrong page.  Correct?
>>>
>>
>> Many thanks for comment.
>> I assume that Naoya's patch "mm/hwpoison: set PageHWPoison after taking page lock
>> in memory_failure_hugetlb()" would set the PageHWPoison after the above check.
>> So I think the below operation is not needed as PageHWPoison is not set yet.
>> Does this makes sense for you?
> 
> I'm wondering if it might be better and helpful for review to squash
> this patch with Naoya's patch together? It seems we always missed the
> other part when reviewing the patches.
> 

Sounds like a good idea. This would make the reviewer's life easier. I'm fine if
this patch is squashed into Naoya's patch altogether. But we might have to consult
the opinion from Naoya.

Thanks.

>>
>> Thanks.
>>
>>> Instead of printing a "different page size", would it be better to perhaps:
>>> - Print a message that wrong page may be marked for poison?
>>> - Clear the poison flag in the "head page" previously set?
>>>
>>
> .
> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/3] mm/memory-failure.c: fix race with changing page compound again
  2022-03-16  8:18         ` Miaohe Lin
@ 2022-03-16  8:30           ` HORIGUCHI NAOYA(堀口 直也)
  2022-03-16  8:41             ` Miaohe Lin
  0 siblings, 1 reply; 20+ messages in thread
From: HORIGUCHI NAOYA(堀口 直也) @ 2022-03-16  8:30 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: Yang Shi, Mike Kravetz, linux-mm, linux-kernel, linux-edac, akpm,
	tony.luck, bp

On Wed, Mar 16, 2022 at 04:18:30PM +0800, Miaohe Lin wrote:
> On 2022/3/16 2:19, Yang Shi wrote:
> > On Tue, Mar 15, 2022 at 7:19 AM Miaohe Lin <linmiaohe@huawei.com> wrote:
...
> >>
> >>
> >> Many thanks for comment.
> >> I assume that Naoya's patch "mm/hwpoison: set PageHWPoison after taking page lock
> >> in memory_failure_hugetlb()" would set the PageHWPoison after the above check.
> >> So I think the below operation is not needed as PageHWPoison is not set yet.
> >> Does this makes sense for you?
> > 
> > I'm wondering if it might be better and helpful for review to squash
> > this patch with Naoya's patch together? It seems we always missed the
> > other part when reviewing the patches.
> > 
> 
> Sounds like a good idea. This would make the reviewer's life easier. I'm fine if
> this patch is squashed into Naoya's patch altogether. But we might have to consult
> the opinion from Naoya.

I'm fine with the squashing, so I'll send v4.

Thanks,
Naoya Horiguchi

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/3] mm/memory-failure.c: fix race with changing page compound again
  2022-03-16  8:30           ` HORIGUCHI NAOYA(堀口 直也)
@ 2022-03-16  8:41             ` Miaohe Lin
  0 siblings, 0 replies; 20+ messages in thread
From: Miaohe Lin @ 2022-03-16  8:41 UTC (permalink / raw)
  To: HORIGUCHI NAOYA(堀口 直也), akpm
  Cc: Yang Shi, Mike Kravetz, linux-mm, linux-kernel, linux-edac,
	tony.luck, bp

On 2022/3/16 16:30, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Wed, Mar 16, 2022 at 04:18:30PM +0800, Miaohe Lin wrote:
>> On 2022/3/16 2:19, Yang Shi wrote:
>>> On Tue, Mar 15, 2022 at 7:19 AM Miaohe Lin <linmiaohe@huawei.com> wrote:
> ...
>>>>
>>>>
>>>> Many thanks for comment.
>>>> I assume that Naoya's patch "mm/hwpoison: set PageHWPoison after taking page lock
>>>> in memory_failure_hugetlb()" would set the PageHWPoison after the above check.
>>>> So I think the below operation is not needed as PageHWPoison is not set yet.
>>>> Does this makes sense for you?
>>>
>>> I'm wondering if it might be better and helpful for review to squash
>>> this patch with Naoya's patch together? It seems we always missed the
>>> other part when reviewing the patches.
>>>
>>
>> Sounds like a good idea. This would make the reviewer's life easier. I'm fine if
>> this patch is squashed into Naoya's patch altogether. But we might have to consult
>> the opinion from Naoya.
> 
> I'm fine with the squashing, so I'll send v4.

Many thanks for doing this.

So I'll send v3 later to fix the "stale commit id" in the commit log of the [PATCH v2 2/3]
mm/memory-failure.c: avoid calling invalidate_inode_page() with unexpected pages.

Thanks.

> 
> Thanks,
> Naoya Horiguchi
> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2022-03-16  8:41 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-12  7:46 [PATCH v2 0/3] A few fixup patches for memory failure Miaohe Lin
2022-03-12  7:46 ` [PATCH v2 1/3] mm/memory-failure.c: fix race with changing page compound again Miaohe Lin
2022-03-13 23:41   ` HORIGUCHI NAOYA(堀口 直也)
2022-03-14  1:51     ` Miaohe Lin
2022-03-14 18:20   ` Mike Kravetz
2022-03-15 14:19     ` Miaohe Lin
2022-03-15 18:19       ` Yang Shi
2022-03-16  8:18         ` Miaohe Lin
2022-03-16  8:30           ` HORIGUCHI NAOYA(堀口 直也)
2022-03-16  8:41             ` Miaohe Lin
2022-03-12  7:46 ` [PATCH v2 2/3] mm/memory-failure.c: avoid calling invalidate_inode_page() with unexpected pages Miaohe Lin
2022-03-13 23:41   ` HORIGUCHI NAOYA(堀口 直也)
2022-03-14  1:58     ` Miaohe Lin
2022-03-14  2:50       ` HORIGUCHI NAOYA(堀口 直也)
2022-03-14  2:59         ` Miaohe Lin
2022-03-14 23:45           ` Andrew Morton
2022-03-15 13:55             ` Miaohe Lin
2022-03-12  7:46 ` [PATCH v2 3/3] mm/memory-failure.c: make non-LRU movable pages unhandlable Miaohe Lin
2022-03-13 23:43   ` HORIGUCHI NAOYA(堀口 直也)
2022-03-14 17:34   ` Yang Shi

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.