* [PATCH v2 0/3] A few fixup patches for memory failure @ 2022-03-12 7:46 Miaohe Lin 2022-03-12 7:46 ` [PATCH v2 1/3] mm/memory-failure.c: fix race with changing page compound again Miaohe Lin ` (2 more replies) 0 siblings, 3 replies; 20+ messages in thread From: Miaohe Lin @ 2022-03-12 7:46 UTC (permalink / raw) To: akpm, tony.luck, bp, naoya.horiguchi Cc: mike.kravetz, shy828301, linux-mm, linux-kernel, linux-edac, linmiaohe Hi everyone, This series contains a few patches to fix the race with changing page compound page, make non-LRU movable pages unhandlable and so on. More details can be found in the respective changelogs. Thanks! --- v1->v2: drop "mm/memory-failure.c: fix wrong user reference report" make non-LRU movable pages unhandlable fix confusing commit log and introduce MF_MSG_DIFFERENT_PAGE_SIZE Many thanks Naoya, Mike and Yang Shi for review! --- Miaohe Lin (3): mm/memory-failure.c: fix race with changing page compound again mm/memory-failure.c: avoid calling invalidate_inode_page() with unexpected pages mm/memory-failure.c: make non-LRU movable pages unhandlable include/linux/mm.h | 1 + include/ras/ras_event.h | 1 + mm/memory-failure.c | 34 ++++++++++++++++++++++++++-------- 3 files changed, 28 insertions(+), 8 deletions(-) -- 2.23.0 ^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH v2 1/3] mm/memory-failure.c: fix race with changing page compound again 2022-03-12 7:46 [PATCH v2 0/3] A few fixup patches for memory failure Miaohe Lin @ 2022-03-12 7:46 ` Miaohe Lin 2022-03-13 23:41 ` HORIGUCHI NAOYA(堀口 直也) 2022-03-14 18:20 ` Mike Kravetz 2022-03-12 7:46 ` [PATCH v2 2/3] mm/memory-failure.c: avoid calling invalidate_inode_page() with unexpected pages Miaohe Lin 2022-03-12 7:46 ` [PATCH v2 3/3] mm/memory-failure.c: make non-LRU movable pages unhandlable Miaohe Lin 2 siblings, 2 replies; 20+ messages in thread From: Miaohe Lin @ 2022-03-12 7:46 UTC (permalink / raw) To: akpm, tony.luck, bp, naoya.horiguchi Cc: mike.kravetz, shy828301, linux-mm, linux-kernel, linux-edac, linmiaohe There is a race window where we got the compound_head, the hugetlb page could be freed to buddy, or even changed to another compound page just before we try to get hwpoison page. Think about the below race window: CPU 1 CPU 2 memory_failure_hugetlb struct page *head = compound_head(p); hugetlb page might be freed to buddy, or even changed to another compound page. get_hwpoison_page -- page is not what we want now... If this race happens, just bail out. Also MF_MSG_DIFFERENT_PAGE_SIZE is introduced to record this event. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> --- include/linux/mm.h | 1 + include/ras/ras_event.h | 1 + mm/memory-failure.c | 12 ++++++++++++ 3 files changed, 14 insertions(+) diff --git a/include/linux/mm.h b/include/linux/mm.h index c9bada4096ac..ef98cff2b253 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3253,6 +3253,7 @@ enum mf_action_page_type { MF_MSG_BUDDY, MF_MSG_DAX, MF_MSG_UNSPLIT_THP, + MF_MSG_DIFFERENT_PAGE_SIZE, MF_MSG_UNKNOWN, }; diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h index d0337a41141c..1e694fd239b9 100644 --- a/include/ras/ras_event.h +++ b/include/ras/ras_event.h @@ -374,6 +374,7 @@ TRACE_EVENT(aer_event, EM ( MF_MSG_BUDDY, "free buddy page" ) \ EM ( MF_MSG_DAX, "dax page" ) \ EM ( MF_MSG_UNSPLIT_THP, "unsplit thp" ) \ + EM ( MF_MSG_DIFFERENT_PAGE_SIZE, "different page size" ) \ EMe ( MF_MSG_UNKNOWN, "unknown page" ) /* diff --git a/mm/memory-failure.c b/mm/memory-failure.c index 5444a8ef4867..dabecd87ad3f 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -733,6 +733,7 @@ static const char * const action_page_types[] = { [MF_MSG_BUDDY] = "free buddy page", [MF_MSG_DAX] = "dax page", [MF_MSG_UNSPLIT_THP] = "unsplit thp", + [MF_MSG_DIFFERENT_PAGE_SIZE] = "different page size", [MF_MSG_UNKNOWN] = "unknown page", }; @@ -1534,6 +1535,17 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags) } lock_page(head); + + /** + * The page could have changed compound pages due to race window. + * If this happens just bail out. + */ + if (!PageHuge(p) || compound_head(p) != head) { + action_result(pfn, MF_MSG_DIFFERENT_PAGE_SIZE, MF_IGNORED); + res = -EBUSY; + goto out; + } + page_flags = head->flags; if (hwpoison_filter(p)) { -- 2.23.0 ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH v2 1/3] mm/memory-failure.c: fix race with changing page compound again 2022-03-12 7:46 ` [PATCH v2 1/3] mm/memory-failure.c: fix race with changing page compound again Miaohe Lin @ 2022-03-13 23:41 ` HORIGUCHI NAOYA(堀口 直也) 2022-03-14 1:51 ` Miaohe Lin 2022-03-14 18:20 ` Mike Kravetz 1 sibling, 1 reply; 20+ messages in thread From: HORIGUCHI NAOYA(堀口 直也) @ 2022-03-13 23:41 UTC (permalink / raw) To: Miaohe Lin Cc: akpm, tony.luck, bp, mike.kravetz, shy828301, linux-mm, linux-kernel, linux-edac On Sat, Mar 12, 2022 at 03:46:11PM +0800, Miaohe Lin wrote: > There is a race window where we got the compound_head, the hugetlb page > could be freed to buddy, or even changed to another compound page just > before we try to get hwpoison page. Think about the below race window: > CPU 1 CPU 2 > memory_failure_hugetlb > struct page *head = compound_head(p); > hugetlb page might be freed to > buddy, or even changed to another > compound page. > > get_hwpoison_page -- page is not what we want now... > > If this race happens, just bail out. Also MF_MSG_DIFFERENT_PAGE_SIZE is > introduced to record this event. > > Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> > --- > include/linux/mm.h | 1 + > include/ras/ras_event.h | 1 + > mm/memory-failure.c | 12 ++++++++++++ > 3 files changed, 14 insertions(+) > > diff --git a/include/linux/mm.h b/include/linux/mm.h > index c9bada4096ac..ef98cff2b253 100644 > --- a/include/linux/mm.h > +++ b/include/linux/mm.h > @@ -3253,6 +3253,7 @@ enum mf_action_page_type { > MF_MSG_BUDDY, > MF_MSG_DAX, > MF_MSG_UNSPLIT_THP, > + MF_MSG_DIFFERENT_PAGE_SIZE, > MF_MSG_UNKNOWN, > }; > > diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h > index d0337a41141c..1e694fd239b9 100644 > --- a/include/ras/ras_event.h > +++ b/include/ras/ras_event.h > @@ -374,6 +374,7 @@ TRACE_EVENT(aer_event, > EM ( MF_MSG_BUDDY, "free buddy page" ) \ > EM ( MF_MSG_DAX, "dax page" ) \ > EM ( MF_MSG_UNSPLIT_THP, "unsplit thp" ) \ > + EM ( MF_MSG_DIFFERENT_PAGE_SIZE, "different page size" ) \ > EMe ( MF_MSG_UNKNOWN, "unknown page" ) > > /* > diff --git a/mm/memory-failure.c b/mm/memory-failure.c > index 5444a8ef4867..dabecd87ad3f 100644 > --- a/mm/memory-failure.c > +++ b/mm/memory-failure.c > @@ -733,6 +733,7 @@ static const char * const action_page_types[] = { > [MF_MSG_BUDDY] = "free buddy page", > [MF_MSG_DAX] = "dax page", > [MF_MSG_UNSPLIT_THP] = "unsplit thp", > + [MF_MSG_DIFFERENT_PAGE_SIZE] = "different page size", > [MF_MSG_UNKNOWN] = "unknown page", > }; > > @@ -1534,6 +1535,17 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags) > } > > lock_page(head); > + > + /** This comment section starting with '/**' is considered as kernel-doc comment, maybe that's not expected because it just describes an implementation detail. So normal comment section with '/*' would be better. Otherwise, looks good to me. Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> > + * The page could have changed compound pages due to race window. > + * If this happens just bail out. > + */ > + if (!PageHuge(p) || compound_head(p) != head) { > + action_result(pfn, MF_MSG_DIFFERENT_PAGE_SIZE, MF_IGNORED); > + res = -EBUSY; > + goto out; > + } > + > page_flags = head->flags; > > if (hwpoison_filter(p)) { > -- > 2.23.0 ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 1/3] mm/memory-failure.c: fix race with changing page compound again 2022-03-13 23:41 ` HORIGUCHI NAOYA(堀口 直也) @ 2022-03-14 1:51 ` Miaohe Lin 0 siblings, 0 replies; 20+ messages in thread From: Miaohe Lin @ 2022-03-14 1:51 UTC (permalink / raw) To: HORIGUCHI NAOYA(堀口 直也) Cc: akpm, tony.luck, bp, mike.kravetz, shy828301, linux-mm, linux-kernel, linux-edac On 2022/3/14 7:41, HORIGUCHI NAOYA(堀口 直也) wrote: > On Sat, Mar 12, 2022 at 03:46:11PM +0800, Miaohe Lin wrote: >> There is a race window where we got the compound_head, the hugetlb page >> could be freed to buddy, or even changed to another compound page just >> before we try to get hwpoison page. Think about the below race window: >> CPU 1 CPU 2 >> memory_failure_hugetlb >> struct page *head = compound_head(p); >> hugetlb page might be freed to >> buddy, or even changed to another >> compound page. >> >> get_hwpoison_page -- page is not what we want now... >> >> If this race happens, just bail out. Also MF_MSG_DIFFERENT_PAGE_SIZE is >> introduced to record this event. >> >> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> >> --- >> include/linux/mm.h | 1 + >> include/ras/ras_event.h | 1 + >> mm/memory-failure.c | 12 ++++++++++++ >> 3 files changed, 14 insertions(+) >> >> diff --git a/include/linux/mm.h b/include/linux/mm.h >> index c9bada4096ac..ef98cff2b253 100644 >> --- a/include/linux/mm.h >> +++ b/include/linux/mm.h >> @@ -3253,6 +3253,7 @@ enum mf_action_page_type { >> MF_MSG_BUDDY, >> MF_MSG_DAX, >> MF_MSG_UNSPLIT_THP, >> + MF_MSG_DIFFERENT_PAGE_SIZE, >> MF_MSG_UNKNOWN, >> }; >> >> diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h >> index d0337a41141c..1e694fd239b9 100644 >> --- a/include/ras/ras_event.h >> +++ b/include/ras/ras_event.h >> @@ -374,6 +374,7 @@ TRACE_EVENT(aer_event, >> EM ( MF_MSG_BUDDY, "free buddy page" ) \ >> EM ( MF_MSG_DAX, "dax page" ) \ >> EM ( MF_MSG_UNSPLIT_THP, "unsplit thp" ) \ >> + EM ( MF_MSG_DIFFERENT_PAGE_SIZE, "different page size" ) \ >> EMe ( MF_MSG_UNKNOWN, "unknown page" ) >> >> /* >> diff --git a/mm/memory-failure.c b/mm/memory-failure.c >> index 5444a8ef4867..dabecd87ad3f 100644 >> --- a/mm/memory-failure.c >> +++ b/mm/memory-failure.c >> @@ -733,6 +733,7 @@ static const char * const action_page_types[] = { >> [MF_MSG_BUDDY] = "free buddy page", >> [MF_MSG_DAX] = "dax page", >> [MF_MSG_UNSPLIT_THP] = "unsplit thp", >> + [MF_MSG_DIFFERENT_PAGE_SIZE] = "different page size", >> [MF_MSG_UNKNOWN] = "unknown page", >> }; >> >> @@ -1534,6 +1535,17 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags) >> } >> >> lock_page(head); >> + >> + /** > > This comment section starting with '/**' is considered as kernel-doc comment, > maybe that's not expected because it just describes an implementation detail. > So normal comment section with '/*' would be better. > I see. Will change to use "/*". > Otherwise, looks good to me. > > Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Many thanks for review and comment. > >> + * The page could have changed compound pages due to race window. >> + * If this happens just bail out. >> + */ >> + if (!PageHuge(p) || compound_head(p) != head) { >> + action_result(pfn, MF_MSG_DIFFERENT_PAGE_SIZE, MF_IGNORED); >> + res = -EBUSY; >> + goto out; >> + } >> + >> page_flags = head->flags; >> >> if (hwpoison_filter(p)) { >> -- >> 2.23.0 ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 1/3] mm/memory-failure.c: fix race with changing page compound again 2022-03-12 7:46 ` [PATCH v2 1/3] mm/memory-failure.c: fix race with changing page compound again Miaohe Lin 2022-03-13 23:41 ` HORIGUCHI NAOYA(堀口 直也) @ 2022-03-14 18:20 ` Mike Kravetz 2022-03-15 14:19 ` Miaohe Lin 1 sibling, 1 reply; 20+ messages in thread From: Mike Kravetz @ 2022-03-14 18:20 UTC (permalink / raw) To: Miaohe Lin, akpm, tony.luck, bp, naoya.horiguchi Cc: shy828301, linux-mm, linux-kernel, linux-edac On 3/11/22 23:46, Miaohe Lin wrote: > There is a race window where we got the compound_head, the hugetlb page > could be freed to buddy, or even changed to another compound page just > before we try to get hwpoison page. Think about the below race window: > CPU 1 CPU 2 > memory_failure_hugetlb > struct page *head = compound_head(p); > hugetlb page might be freed to > buddy, or even changed to another > compound page. > > get_hwpoison_page -- page is not what we want now... > > If this race happens, just bail out. Also MF_MSG_DIFFERENT_PAGE_SIZE is > introduced to record this event. > > Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> > --- > include/linux/mm.h | 1 + > include/ras/ras_event.h | 1 + > mm/memory-failure.c | 12 ++++++++++++ > 3 files changed, 14 insertions(+) > > diff --git a/include/linux/mm.h b/include/linux/mm.h > index c9bada4096ac..ef98cff2b253 100644 > --- a/include/linux/mm.h > +++ b/include/linux/mm.h > @@ -3253,6 +3253,7 @@ enum mf_action_page_type { > MF_MSG_BUDDY, > MF_MSG_DAX, > MF_MSG_UNSPLIT_THP, > + MF_MSG_DIFFERENT_PAGE_SIZE, > MF_MSG_UNKNOWN, > }; > > diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h > index d0337a41141c..1e694fd239b9 100644 > --- a/include/ras/ras_event.h > +++ b/include/ras/ras_event.h > @@ -374,6 +374,7 @@ TRACE_EVENT(aer_event, > EM ( MF_MSG_BUDDY, "free buddy page" ) \ > EM ( MF_MSG_DAX, "dax page" ) \ > EM ( MF_MSG_UNSPLIT_THP, "unsplit thp" ) \ > + EM ( MF_MSG_DIFFERENT_PAGE_SIZE, "different page size" ) \ > EMe ( MF_MSG_UNKNOWN, "unknown page" ) > > /* > diff --git a/mm/memory-failure.c b/mm/memory-failure.c > index 5444a8ef4867..dabecd87ad3f 100644 > --- a/mm/memory-failure.c > +++ b/mm/memory-failure.c > @@ -733,6 +733,7 @@ static const char * const action_page_types[] = { > [MF_MSG_BUDDY] = "free buddy page", > [MF_MSG_DAX] = "dax page", > [MF_MSG_UNSPLIT_THP] = "unsplit thp", > + [MF_MSG_DIFFERENT_PAGE_SIZE] = "different page size", > [MF_MSG_UNKNOWN] = "unknown page", > }; > > @@ -1534,6 +1535,17 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags) > } > > lock_page(head); > + > + /** > + * The page could have changed compound pages due to race window. > + * If this happens just bail out. > + */ > + if (!PageHuge(p) || compound_head(p) != head) { > + action_result(pfn, MF_MSG_DIFFERENT_PAGE_SIZE, MF_IGNORED); > + res = -EBUSY; We have discussed this race in other versions of the patch. When we encounter the race, we have likely marked poison on the wrong page. Correct? Instead of printing a "different page size", would it be better to perhaps: - Print a message that wrong page may be marked for poison? - Clear the poison flag in the "head page" previously set? -- Mike Kravetz > + goto out; > + } > + > page_flags = head->flags; > > if (hwpoison_filter(p)) { ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 1/3] mm/memory-failure.c: fix race with changing page compound again 2022-03-14 18:20 ` Mike Kravetz @ 2022-03-15 14:19 ` Miaohe Lin 2022-03-15 18:19 ` Yang Shi 0 siblings, 1 reply; 20+ messages in thread From: Miaohe Lin @ 2022-03-15 14:19 UTC (permalink / raw) To: Mike Kravetz, naoya.horiguchi Cc: shy828301, linux-mm, linux-kernel, linux-edac, akpm, tony.luck, bp On 2022/3/15 2:20, Mike Kravetz wrote: > On 3/11/22 23:46, Miaohe Lin wrote: >> There is a race window where we got the compound_head, the hugetlb page >> could be freed to buddy, or even changed to another compound page just >> before we try to get hwpoison page. Think about the below race window: >> CPU 1 CPU 2 >> memory_failure_hugetlb >> struct page *head = compound_head(p); >> hugetlb page might be freed to >> buddy, or even changed to another >> compound page. >> >> get_hwpoison_page -- page is not what we want now... >> >> If this race happens, just bail out. Also MF_MSG_DIFFERENT_PAGE_SIZE is >> introduced to record this event. >> >> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> >> --- >> include/linux/mm.h | 1 + >> include/ras/ras_event.h | 1 + >> mm/memory-failure.c | 12 ++++++++++++ >> 3 files changed, 14 insertions(+) >> >> diff --git a/include/linux/mm.h b/include/linux/mm.h >> index c9bada4096ac..ef98cff2b253 100644 >> --- a/include/linux/mm.h >> +++ b/include/linux/mm.h >> @@ -3253,6 +3253,7 @@ enum mf_action_page_type { >> MF_MSG_BUDDY, >> MF_MSG_DAX, >> MF_MSG_UNSPLIT_THP, >> + MF_MSG_DIFFERENT_PAGE_SIZE, >> MF_MSG_UNKNOWN, >> }; >> >> diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h >> index d0337a41141c..1e694fd239b9 100644 >> --- a/include/ras/ras_event.h >> +++ b/include/ras/ras_event.h >> @@ -374,6 +374,7 @@ TRACE_EVENT(aer_event, >> EM ( MF_MSG_BUDDY, "free buddy page" ) \ >> EM ( MF_MSG_DAX, "dax page" ) \ >> EM ( MF_MSG_UNSPLIT_THP, "unsplit thp" ) \ >> + EM ( MF_MSG_DIFFERENT_PAGE_SIZE, "different page size" ) \ >> EMe ( MF_MSG_UNKNOWN, "unknown page" ) >> >> /* >> diff --git a/mm/memory-failure.c b/mm/memory-failure.c >> index 5444a8ef4867..dabecd87ad3f 100644 >> --- a/mm/memory-failure.c >> +++ b/mm/memory-failure.c >> @@ -733,6 +733,7 @@ static const char * const action_page_types[] = { >> [MF_MSG_BUDDY] = "free buddy page", >> [MF_MSG_DAX] = "dax page", >> [MF_MSG_UNSPLIT_THP] = "unsplit thp", >> + [MF_MSG_DIFFERENT_PAGE_SIZE] = "different page size", >> [MF_MSG_UNKNOWN] = "unknown page", >> }; >> >> @@ -1534,6 +1535,17 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags) >> } >> >> lock_page(head); >> + >> + /** >> + * The page could have changed compound pages due to race window. >> + * If this happens just bail out. >> + */ >> + if (!PageHuge(p) || compound_head(p) != head) { >> + action_result(pfn, MF_MSG_DIFFERENT_PAGE_SIZE, MF_IGNORED); >> + res = -EBUSY; > > We have discussed this race in other versions of the patch. When we encounter > the race, we have likely marked poison on the wrong page. Correct? > Many thanks for comment. I assume that Naoya's patch "mm/hwpoison: set PageHWPoison after taking page lock in memory_failure_hugetlb()" would set the PageHWPoison after the above check. So I think the below operation is not needed as PageHWPoison is not set yet. Does this makes sense for you? Thanks. > Instead of printing a "different page size", would it be better to perhaps: > - Print a message that wrong page may be marked for poison? > - Clear the poison flag in the "head page" previously set? > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 1/3] mm/memory-failure.c: fix race with changing page compound again 2022-03-15 14:19 ` Miaohe Lin @ 2022-03-15 18:19 ` Yang Shi 2022-03-16 8:18 ` Miaohe Lin 0 siblings, 1 reply; 20+ messages in thread From: Yang Shi @ 2022-03-15 18:19 UTC (permalink / raw) To: Miaohe Lin Cc: Mike Kravetz, naoya.horiguchi, linux-mm, linux-kernel, linux-edac, akpm, tony.luck, bp On Tue, Mar 15, 2022 at 7:19 AM Miaohe Lin <linmiaohe@huawei.com> wrote: > > On 2022/3/15 2:20, Mike Kravetz wrote: > > On 3/11/22 23:46, Miaohe Lin wrote: > >> There is a race window where we got the compound_head, the hugetlb page > >> could be freed to buddy, or even changed to another compound page just > >> before we try to get hwpoison page. Think about the below race window: > >> CPU 1 CPU 2 > >> memory_failure_hugetlb > >> struct page *head = compound_head(p); > >> hugetlb page might be freed to > >> buddy, or even changed to another > >> compound page. > >> > >> get_hwpoison_page -- page is not what we want now... > >> > >> If this race happens, just bail out. Also MF_MSG_DIFFERENT_PAGE_SIZE is > >> introduced to record this event. > >> > >> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> > >> --- > >> include/linux/mm.h | 1 + > >> include/ras/ras_event.h | 1 + > >> mm/memory-failure.c | 12 ++++++++++++ > >> 3 files changed, 14 insertions(+) > >> > >> diff --git a/include/linux/mm.h b/include/linux/mm.h > >> index c9bada4096ac..ef98cff2b253 100644 > >> --- a/include/linux/mm.h > >> +++ b/include/linux/mm.h > >> @@ -3253,6 +3253,7 @@ enum mf_action_page_type { > >> MF_MSG_BUDDY, > >> MF_MSG_DAX, > >> MF_MSG_UNSPLIT_THP, > >> + MF_MSG_DIFFERENT_PAGE_SIZE, > >> MF_MSG_UNKNOWN, > >> }; > >> > >> diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h > >> index d0337a41141c..1e694fd239b9 100644 > >> --- a/include/ras/ras_event.h > >> +++ b/include/ras/ras_event.h > >> @@ -374,6 +374,7 @@ TRACE_EVENT(aer_event, > >> EM ( MF_MSG_BUDDY, "free buddy page" ) \ > >> EM ( MF_MSG_DAX, "dax page" ) \ > >> EM ( MF_MSG_UNSPLIT_THP, "unsplit thp" ) \ > >> + EM ( MF_MSG_DIFFERENT_PAGE_SIZE, "different page size" ) \ > >> EMe ( MF_MSG_UNKNOWN, "unknown page" ) > >> > >> /* > >> diff --git a/mm/memory-failure.c b/mm/memory-failure.c > >> index 5444a8ef4867..dabecd87ad3f 100644 > >> --- a/mm/memory-failure.c > >> +++ b/mm/memory-failure.c > >> @@ -733,6 +733,7 @@ static const char * const action_page_types[] = { > >> [MF_MSG_BUDDY] = "free buddy page", > >> [MF_MSG_DAX] = "dax page", > >> [MF_MSG_UNSPLIT_THP] = "unsplit thp", > >> + [MF_MSG_DIFFERENT_PAGE_SIZE] = "different page size", > >> [MF_MSG_UNKNOWN] = "unknown page", > >> }; > >> > >> @@ -1534,6 +1535,17 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags) > >> } > >> > >> lock_page(head); > >> + > >> + /** > >> + * The page could have changed compound pages due to race window. > >> + * If this happens just bail out. > >> + */ > >> + if (!PageHuge(p) || compound_head(p) != head) { > >> + action_result(pfn, MF_MSG_DIFFERENT_PAGE_SIZE, MF_IGNORED); > >> + res = -EBUSY; > > > > We have discussed this race in other versions of the patch. When we encounter > > the race, we have likely marked poison on the wrong page. Correct? > > > > Many thanks for comment. > I assume that Naoya's patch "mm/hwpoison: set PageHWPoison after taking page lock > in memory_failure_hugetlb()" would set the PageHWPoison after the above check. > So I think the below operation is not needed as PageHWPoison is not set yet. > Does this makes sense for you? I'm wondering if it might be better and helpful for review to squash this patch with Naoya's patch together? It seems we always missed the other part when reviewing the patches. > > Thanks. > > > Instead of printing a "different page size", would it be better to perhaps: > > - Print a message that wrong page may be marked for poison? > > - Clear the poison flag in the "head page" previously set? > > > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 1/3] mm/memory-failure.c: fix race with changing page compound again 2022-03-15 18:19 ` Yang Shi @ 2022-03-16 8:18 ` Miaohe Lin 2022-03-16 8:30 ` HORIGUCHI NAOYA(堀口 直也) 0 siblings, 1 reply; 20+ messages in thread From: Miaohe Lin @ 2022-03-16 8:18 UTC (permalink / raw) To: Yang Shi, HORIGUCHI NAOYA Cc: Mike Kravetz, linux-mm, linux-kernel, linux-edac, akpm, tony.luck, bp On 2022/3/16 2:19, Yang Shi wrote: > On Tue, Mar 15, 2022 at 7:19 AM Miaohe Lin <linmiaohe@huawei.com> wrote: >> >> On 2022/3/15 2:20, Mike Kravetz wrote: >>> On 3/11/22 23:46, Miaohe Lin wrote: >>>> There is a race window where we got the compound_head, the hugetlb page >>>> could be freed to buddy, or even changed to another compound page just >>>> before we try to get hwpoison page. Think about the below race window: >>>> CPU 1 CPU 2 >>>> memory_failure_hugetlb >>>> struct page *head = compound_head(p); >>>> hugetlb page might be freed to >>>> buddy, or even changed to another >>>> compound page. >>>> >>>> get_hwpoison_page -- page is not what we want now... >>>> >>>> If this race happens, just bail out. Also MF_MSG_DIFFERENT_PAGE_SIZE is >>>> introduced to record this event. >>>> >>>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> >>>> --- >>>> include/linux/mm.h | 1 + >>>> include/ras/ras_event.h | 1 + >>>> mm/memory-failure.c | 12 ++++++++++++ >>>> 3 files changed, 14 insertions(+) >>>> >>>> diff --git a/include/linux/mm.h b/include/linux/mm.h >>>> index c9bada4096ac..ef98cff2b253 100644 >>>> --- a/include/linux/mm.h >>>> +++ b/include/linux/mm.h >>>> @@ -3253,6 +3253,7 @@ enum mf_action_page_type { >>>> MF_MSG_BUDDY, >>>> MF_MSG_DAX, >>>> MF_MSG_UNSPLIT_THP, >>>> + MF_MSG_DIFFERENT_PAGE_SIZE, >>>> MF_MSG_UNKNOWN, >>>> }; >>>> >>>> diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h >>>> index d0337a41141c..1e694fd239b9 100644 >>>> --- a/include/ras/ras_event.h >>>> +++ b/include/ras/ras_event.h >>>> @@ -374,6 +374,7 @@ TRACE_EVENT(aer_event, >>>> EM ( MF_MSG_BUDDY, "free buddy page" ) \ >>>> EM ( MF_MSG_DAX, "dax page" ) \ >>>> EM ( MF_MSG_UNSPLIT_THP, "unsplit thp" ) \ >>>> + EM ( MF_MSG_DIFFERENT_PAGE_SIZE, "different page size" ) \ >>>> EMe ( MF_MSG_UNKNOWN, "unknown page" ) >>>> >>>> /* >>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c >>>> index 5444a8ef4867..dabecd87ad3f 100644 >>>> --- a/mm/memory-failure.c >>>> +++ b/mm/memory-failure.c >>>> @@ -733,6 +733,7 @@ static const char * const action_page_types[] = { >>>> [MF_MSG_BUDDY] = "free buddy page", >>>> [MF_MSG_DAX] = "dax page", >>>> [MF_MSG_UNSPLIT_THP] = "unsplit thp", >>>> + [MF_MSG_DIFFERENT_PAGE_SIZE] = "different page size", >>>> [MF_MSG_UNKNOWN] = "unknown page", >>>> }; >>>> >>>> @@ -1534,6 +1535,17 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags) >>>> } >>>> >>>> lock_page(head); >>>> + >>>> + /** >>>> + * The page could have changed compound pages due to race window. >>>> + * If this happens just bail out. >>>> + */ >>>> + if (!PageHuge(p) || compound_head(p) != head) { >>>> + action_result(pfn, MF_MSG_DIFFERENT_PAGE_SIZE, MF_IGNORED); >>>> + res = -EBUSY; >>> >>> We have discussed this race in other versions of the patch. When we encounter >>> the race, we have likely marked poison on the wrong page. Correct? >>> >> >> Many thanks for comment. >> I assume that Naoya's patch "mm/hwpoison: set PageHWPoison after taking page lock >> in memory_failure_hugetlb()" would set the PageHWPoison after the above check. >> So I think the below operation is not needed as PageHWPoison is not set yet. >> Does this makes sense for you? > > I'm wondering if it might be better and helpful for review to squash > this patch with Naoya's patch together? It seems we always missed the > other part when reviewing the patches. > Sounds like a good idea. This would make the reviewer's life easier. I'm fine if this patch is squashed into Naoya's patch altogether. But we might have to consult the opinion from Naoya. Thanks. >> >> Thanks. >> >>> Instead of printing a "different page size", would it be better to perhaps: >>> - Print a message that wrong page may be marked for poison? >>> - Clear the poison flag in the "head page" previously set? >>> >> > . > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 1/3] mm/memory-failure.c: fix race with changing page compound again 2022-03-16 8:18 ` Miaohe Lin @ 2022-03-16 8:30 ` HORIGUCHI NAOYA(堀口 直也) 2022-03-16 8:41 ` Miaohe Lin 0 siblings, 1 reply; 20+ messages in thread From: HORIGUCHI NAOYA(堀口 直也) @ 2022-03-16 8:30 UTC (permalink / raw) To: Miaohe Lin Cc: Yang Shi, Mike Kravetz, linux-mm, linux-kernel, linux-edac, akpm, tony.luck, bp On Wed, Mar 16, 2022 at 04:18:30PM +0800, Miaohe Lin wrote: > On 2022/3/16 2:19, Yang Shi wrote: > > On Tue, Mar 15, 2022 at 7:19 AM Miaohe Lin <linmiaohe@huawei.com> wrote: ... > >> > >> > >> Many thanks for comment. > >> I assume that Naoya's patch "mm/hwpoison: set PageHWPoison after taking page lock > >> in memory_failure_hugetlb()" would set the PageHWPoison after the above check. > >> So I think the below operation is not needed as PageHWPoison is not set yet. > >> Does this makes sense for you? > > > > I'm wondering if it might be better and helpful for review to squash > > this patch with Naoya's patch together? It seems we always missed the > > other part when reviewing the patches. > > > > Sounds like a good idea. This would make the reviewer's life easier. I'm fine if > this patch is squashed into Naoya's patch altogether. But we might have to consult > the opinion from Naoya. I'm fine with the squashing, so I'll send v4. Thanks, Naoya Horiguchi ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 1/3] mm/memory-failure.c: fix race with changing page compound again 2022-03-16 8:30 ` HORIGUCHI NAOYA(堀口 直也) @ 2022-03-16 8:41 ` Miaohe Lin 0 siblings, 0 replies; 20+ messages in thread From: Miaohe Lin @ 2022-03-16 8:41 UTC (permalink / raw) To: HORIGUCHI NAOYA(堀口 直也), akpm Cc: Yang Shi, Mike Kravetz, linux-mm, linux-kernel, linux-edac, tony.luck, bp On 2022/3/16 16:30, HORIGUCHI NAOYA(堀口 直也) wrote: > On Wed, Mar 16, 2022 at 04:18:30PM +0800, Miaohe Lin wrote: >> On 2022/3/16 2:19, Yang Shi wrote: >>> On Tue, Mar 15, 2022 at 7:19 AM Miaohe Lin <linmiaohe@huawei.com> wrote: > ... >>>> >>>> >>>> Many thanks for comment. >>>> I assume that Naoya's patch "mm/hwpoison: set PageHWPoison after taking page lock >>>> in memory_failure_hugetlb()" would set the PageHWPoison after the above check. >>>> So I think the below operation is not needed as PageHWPoison is not set yet. >>>> Does this makes sense for you? >>> >>> I'm wondering if it might be better and helpful for review to squash >>> this patch with Naoya's patch together? It seems we always missed the >>> other part when reviewing the patches. >>> >> >> Sounds like a good idea. This would make the reviewer's life easier. I'm fine if >> this patch is squashed into Naoya's patch altogether. But we might have to consult >> the opinion from Naoya. > > I'm fine with the squashing, so I'll send v4. Many thanks for doing this. So I'll send v3 later to fix the "stale commit id" in the commit log of the [PATCH v2 2/3] mm/memory-failure.c: avoid calling invalidate_inode_page() with unexpected pages. Thanks. > > Thanks, > Naoya Horiguchi > ^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH v2 2/3] mm/memory-failure.c: avoid calling invalidate_inode_page() with unexpected pages 2022-03-12 7:46 [PATCH v2 0/3] A few fixup patches for memory failure Miaohe Lin 2022-03-12 7:46 ` [PATCH v2 1/3] mm/memory-failure.c: fix race with changing page compound again Miaohe Lin @ 2022-03-12 7:46 ` Miaohe Lin 2022-03-13 23:41 ` HORIGUCHI NAOYA(堀口 直也) 2022-03-12 7:46 ` [PATCH v2 3/3] mm/memory-failure.c: make non-LRU movable pages unhandlable Miaohe Lin 2 siblings, 1 reply; 20+ messages in thread From: Miaohe Lin @ 2022-03-12 7:46 UTC (permalink / raw) To: akpm, tony.luck, bp, naoya.horiguchi Cc: mike.kravetz, shy828301, linux-mm, linux-kernel, linux-edac, linmiaohe Since commit 042c4f32323b ("mm/truncate: Inline invalidate_complete_page() into its one caller"), invalidate_inode_page() can invalidate the pages in the swap cache because the check of page->mapping != mapping is removed. But invalidate_inode_page() is not expected to deal with the pages in swap cache. Also non-lru movable page can reach here too. They're not page cache pages. Skip these pages by checking PageSwapCache and PageLRU. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> --- mm/memory-failure.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/memory-failure.c b/mm/memory-failure.c index dabecd87ad3f..2ff7dd2078c4 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -2190,7 +2190,7 @@ static int __soft_offline_page(struct page *page) return 0; } - if (!PageHuge(page)) + if (!PageHuge(page) && PageLRU(page) && !PageSwapCache(page)) /* * Try to invalidate first. This should work for * non dirty unmapped page cache pages. -- 2.23.0 ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH v2 2/3] mm/memory-failure.c: avoid calling invalidate_inode_page() with unexpected pages 2022-03-12 7:46 ` [PATCH v2 2/3] mm/memory-failure.c: avoid calling invalidate_inode_page() with unexpected pages Miaohe Lin @ 2022-03-13 23:41 ` HORIGUCHI NAOYA(堀口 直也) 2022-03-14 1:58 ` Miaohe Lin 0 siblings, 1 reply; 20+ messages in thread From: HORIGUCHI NAOYA(堀口 直也) @ 2022-03-13 23:41 UTC (permalink / raw) To: Miaohe Lin Cc: akpm, tony.luck, bp, mike.kravetz, shy828301, linux-mm, linux-kernel, linux-edac On Sat, Mar 12, 2022 at 03:46:12PM +0800, Miaohe Lin wrote: > Since commit 042c4f32323b ("mm/truncate: Inline invalidate_complete_page() This commit ID does not exist in mainline (or in the latest mmotm?), so you can't use it in patch description. Could you update this part? Thanks, Naoya Horiguchi > into its one caller"), invalidate_inode_page() can invalidate the pages in > the swap cache because the check of page->mapping != mapping is removed. > But invalidate_inode_page() is not expected to deal with the pages in swap > cache. Also non-lru movable page can reach here too. They're not page cache > pages. Skip these pages by checking PageSwapCache and PageLRU. > > Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> > --- > mm/memory-failure.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/mm/memory-failure.c b/mm/memory-failure.c > index dabecd87ad3f..2ff7dd2078c4 100644 > --- a/mm/memory-failure.c > +++ b/mm/memory-failure.c > @@ -2190,7 +2190,7 @@ static int __soft_offline_page(struct page *page) > return 0; > } > > - if (!PageHuge(page)) > + if (!PageHuge(page) && PageLRU(page) && !PageSwapCache(page)) > /* > * Try to invalidate first. This should work for > * non dirty unmapped page cache pages. > -- > 2.23.0 ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 2/3] mm/memory-failure.c: avoid calling invalidate_inode_page() with unexpected pages 2022-03-13 23:41 ` HORIGUCHI NAOYA(堀口 直也) @ 2022-03-14 1:58 ` Miaohe Lin 2022-03-14 2:50 ` HORIGUCHI NAOYA(堀口 直也) 0 siblings, 1 reply; 20+ messages in thread From: Miaohe Lin @ 2022-03-14 1:58 UTC (permalink / raw) To: HORIGUCHI NAOYA(堀口 直也) Cc: akpm, tony.luck, bp, mike.kravetz, shy828301, linux-mm, linux-kernel, linux-edac On 2022/3/14 7:41, HORIGUCHI NAOYA(堀口 直也) wrote: > On Sat, Mar 12, 2022 at 03:46:12PM +0800, Miaohe Lin wrote: >> Since commit 042c4f32323b ("mm/truncate: Inline invalidate_complete_page() > > This commit ID does not exist in mainline (or in the latest mmotm?), > so you can't use it in patch description. Could you update this part? > This commit is in the mmotm but not in mainline yet: commit 042c4f32323beb28146c658202d3e69899e4f245 Author: Matthew Wilcox (Oracle) <willy@infradead.org> Date: Sat Feb 12 15:27:42 2022 -0500 mm/truncate: Inline invalidate_complete_page() into its one caller invalidate_inode_page() is the only caller of invalidate_complete_page() and inlining it reveals that the first check is unnecessary (because we hold the page locked, and we just retrieved the mapping from the page). Actually, it does make a difference, in that tail pages no longer fail at this check, so it's now possible to remove a tail page from a mapping. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Am I "not" supposed to use this commit id as it's not "stable" now? Will update this part in next version. Many thanks. > Thanks, > Naoya Horiguchi > >> into its one caller"), invalidate_inode_page() can invalidate the pages in >> the swap cache because the check of page->mapping != mapping is removed. >> But invalidate_inode_page() is not expected to deal with the pages in swap >> cache. Also non-lru movable page can reach here too. They're not page cache >> pages. Skip these pages by checking PageSwapCache and PageLRU. >> >> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> >> --- >> mm/memory-failure.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/mm/memory-failure.c b/mm/memory-failure.c >> index dabecd87ad3f..2ff7dd2078c4 100644 >> --- a/mm/memory-failure.c >> +++ b/mm/memory-failure.c >> @@ -2190,7 +2190,7 @@ static int __soft_offline_page(struct page *page) >> return 0; >> } >> >> - if (!PageHuge(page)) >> + if (!PageHuge(page) && PageLRU(page) && !PageSwapCache(page)) >> /* >> * Try to invalidate first. This should work for >> * non dirty unmapped page cache pages. >> -- >> 2.23.0 ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 2/3] mm/memory-failure.c: avoid calling invalidate_inode_page() with unexpected pages 2022-03-14 1:58 ` Miaohe Lin @ 2022-03-14 2:50 ` HORIGUCHI NAOYA(堀口 直也) 2022-03-14 2:59 ` Miaohe Lin 0 siblings, 1 reply; 20+ messages in thread From: HORIGUCHI NAOYA(堀口 直也) @ 2022-03-14 2:50 UTC (permalink / raw) To: Miaohe Lin Cc: akpm, tony.luck, bp, mike.kravetz, shy828301, linux-mm, linux-kernel, linux-edac On Mon, Mar 14, 2022 at 09:58:49AM +0800, Miaohe Lin wrote: > On 2022/3/14 7:41, HORIGUCHI NAOYA(堀口 直也) wrote: > > On Sat, Mar 12, 2022 at 03:46:12PM +0800, Miaohe Lin wrote: > >> Since commit 042c4f32323b ("mm/truncate: Inline invalidate_complete_page() > > > > This commit ID does not exist in mainline (or in the latest mmotm?), > > so you can't use it in patch description. Could you update this part? > > > > This commit is in the mmotm but not in mainline yet: > > commit 042c4f32323beb28146c658202d3e69899e4f245 > Author: Matthew Wilcox (Oracle) <willy@infradead.org> > Date: Sat Feb 12 15:27:42 2022 -0500 > > mm/truncate: Inline invalidate_complete_page() into its one caller > > invalidate_inode_page() is the only caller of invalidate_complete_page() > and inlining it reveals that the first check is unnecessary (because we > hold the page locked, and we just retrieved the mapping from the page). > Actually, it does make a difference, in that tail pages no longer fail > at this check, so it's now possible to remove a tail page from a mapping. > > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> > Reviewed-by: John Hubbard <jhubbard@nvidia.com> > Reviewed-by: Christoph Hellwig <hch@lst.de> > > Am I "not" supposed to use this commit id as it's not "stable" now? No, it's not stable yet. In whatever way you get the above commit (I guess you get it from https://github.com/hnaz/linux-mm), all acked mm-related patches are sent to Linus by Andrew *by email*, so the eventual commit IDs should be determined when they are applied to mainline. Thanks, Naoya Horiguchi ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 2/3] mm/memory-failure.c: avoid calling invalidate_inode_page() with unexpected pages 2022-03-14 2:50 ` HORIGUCHI NAOYA(堀口 直也) @ 2022-03-14 2:59 ` Miaohe Lin 2022-03-14 23:45 ` Andrew Morton 0 siblings, 1 reply; 20+ messages in thread From: Miaohe Lin @ 2022-03-14 2:59 UTC (permalink / raw) To: HORIGUCHI NAOYA(堀口 直也) Cc: akpm, tony.luck, bp, mike.kravetz, shy828301, linux-mm, linux-kernel, linux-edac On 2022/3/14 10:50, HORIGUCHI NAOYA(堀口 直也) wrote: > On Mon, Mar 14, 2022 at 09:58:49AM +0800, Miaohe Lin wrote: >> On 2022/3/14 7:41, HORIGUCHI NAOYA(堀口 直也) wrote: >>> On Sat, Mar 12, 2022 at 03:46:12PM +0800, Miaohe Lin wrote: >>>> Since commit 042c4f32323b ("mm/truncate: Inline invalidate_complete_page() >>> >>> This commit ID does not exist in mainline (or in the latest mmotm?), >>> so you can't use it in patch description. Could you update this part? >>> >> >> This commit is in the mmotm but not in mainline yet: >> >> commit 042c4f32323beb28146c658202d3e69899e4f245 >> Author: Matthew Wilcox (Oracle) <willy@infradead.org> >> Date: Sat Feb 12 15:27:42 2022 -0500 >> >> mm/truncate: Inline invalidate_complete_page() into its one caller >> >> invalidate_inode_page() is the only caller of invalidate_complete_page() >> and inlining it reveals that the first check is unnecessary (because we >> hold the page locked, and we just retrieved the mapping from the page). >> Actually, it does make a difference, in that tail pages no longer fail >> at this check, so it's now possible to remove a tail page from a mapping. >> >> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> >> Reviewed-by: John Hubbard <jhubbard@nvidia.com> >> Reviewed-by: Christoph Hellwig <hch@lst.de> >> >> Am I "not" supposed to use this commit id as it's not "stable" now? > > No, it's not stable yet. In whatever way you get the above commit (I guess > you get it from https://github.com/hnaz/linux-mm), all acked mm-related > patches are sent to Linus by Andrew *by email*, so the eventual commit IDs > should be determined when they are applied to mainline. > Many thanks for your explanation. (I get this commit id from linux-next tree: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git) So I should remember always to get the commit id from mainline. Thanks again. :) > Thanks, > Naoya Horiguchi > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 2/3] mm/memory-failure.c: avoid calling invalidate_inode_page() with unexpected pages 2022-03-14 2:59 ` Miaohe Lin @ 2022-03-14 23:45 ` Andrew Morton 2022-03-15 13:55 ` Miaohe Lin 0 siblings, 1 reply; 20+ messages in thread From: Andrew Morton @ 2022-03-14 23:45 UTC (permalink / raw) To: Miaohe Lin Cc: HORIGUCHI NAOYA, tony.luck, bp, mike.kravetz, shy828301, linux-mm, linux-kernel, linux-edac On Mon, 14 Mar 2022 10:59:40 +0800 Miaohe Lin <linmiaohe@huawei.com> wrote: > On 2022/3/14 10:50, HORIGUCHI NAOYA(堀口 直也) wrote: > > On Mon, Mar 14, 2022 at 09:58:49AM +0800, Miaohe Lin wrote: > >> On 2022/3/14 7:41, HORIGUCHI NAOYA(堀口 直也) wrote: > >>> On Sat, Mar 12, 2022 at 03:46:12PM +0800, Miaohe Lin wrote: > >>>> Since commit 042c4f32323b ("mm/truncate: Inline invalidate_complete_page() > >>> > >>> This commit ID does not exist in mainline (or in the latest mmotm?), > >>> so you can't use it in patch description. Could you update this part? > >>> > >> > >> This commit is in the mmotm but not in mainline yet: > >> > >> commit 042c4f32323beb28146c658202d3e69899e4f245 > >> Author: Matthew Wilcox (Oracle) <willy@infradead.org> > >> Date: Sat Feb 12 15:27:42 2022 -0500 > >> > >> mm/truncate: Inline invalidate_complete_page() into its one caller > >> > >> invalidate_inode_page() is the only caller of invalidate_complete_page() > >> and inlining it reveals that the first check is unnecessary (because we > >> hold the page locked, and we just retrieved the mapping from the page). > >> Actually, it does make a difference, in that tail pages no longer fail > >> at this check, so it's now possible to remove a tail page from a mapping. > >> > >> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> > >> Reviewed-by: John Hubbard <jhubbard@nvidia.com> > >> Reviewed-by: Christoph Hellwig <hch@lst.de> > >> > >> Am I "not" supposed to use this commit id as it's not "stable" now? > > > > No, it's not stable yet. In whatever way you get the above commit (I guess > > you get it from https://github.com/hnaz/linux-mm), all acked mm-related > > patches are sent to Linus by Andrew *by email*, so the eventual commit IDs > > should be determined when they are applied to mainline. > > > > Many thanks for your explanation. (I get this commit id from linux-next tree: > https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git) > So I should remember always to get the commit id from mainline. It's likely that this commit ID will be the same once Matthew's patch goes into mainline. But this is why we include the patch title ("mm/truncate: Inline ...") when identifying commits. Sometimes stuff happens... ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 2/3] mm/memory-failure.c: avoid calling invalidate_inode_page() with unexpected pages 2022-03-14 23:45 ` Andrew Morton @ 2022-03-15 13:55 ` Miaohe Lin 0 siblings, 0 replies; 20+ messages in thread From: Miaohe Lin @ 2022-03-15 13:55 UTC (permalink / raw) To: Andrew Morton Cc: HORIGUCHI NAOYA (堀口 直也), tony.luck, bp, mike.kravetz, shy828301, linux-mm, linux-kernel, linux-edac On 2022/3/15 7:45, Andrew Morton wrote: > On Mon, 14 Mar 2022 10:59:40 +0800 Miaohe Lin <linmiaohe@huawei.com> wrote: > >> On 2022/3/14 10:50, HORIGUCHI NAOYA(堀口 直也) wrote: >>> On Mon, Mar 14, 2022 at 09:58:49AM +0800, Miaohe Lin wrote: >>>> On 2022/3/14 7:41, HORIGUCHI NAOYA(堀口 直也) wrote: >>>>> On Sat, Mar 12, 2022 at 03:46:12PM +0800, Miaohe Lin wrote: >>>>>> Since commit 042c4f32323b ("mm/truncate: Inline invalidate_complete_page() >>>>> >>>>> This commit ID does not exist in mainline (or in the latest mmotm?), >>>>> so you can't use it in patch description. Could you update this part? >>>>> >>>> >>>> This commit is in the mmotm but not in mainline yet: >>>> >>>> commit 042c4f32323beb28146c658202d3e69899e4f245 >>>> Author: Matthew Wilcox (Oracle) <willy@infradead.org> >>>> Date: Sat Feb 12 15:27:42 2022 -0500 >>>> >>>> mm/truncate: Inline invalidate_complete_page() into its one caller >>>> >>>> invalidate_inode_page() is the only caller of invalidate_complete_page() >>>> and inlining it reveals that the first check is unnecessary (because we >>>> hold the page locked, and we just retrieved the mapping from the page). >>>> Actually, it does make a difference, in that tail pages no longer fail >>>> at this check, so it's now possible to remove a tail page from a mapping. >>>> >>>> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> >>>> Reviewed-by: John Hubbard <jhubbard@nvidia.com> >>>> Reviewed-by: Christoph Hellwig <hch@lst.de> >>>> >>>> Am I "not" supposed to use this commit id as it's not "stable" now? >>> >>> No, it's not stable yet. In whatever way you get the above commit (I guess >>> you get it from https://github.com/hnaz/linux-mm), all acked mm-related >>> patches are sent to Linus by Andrew *by email*, so the eventual commit IDs >>> should be determined when they are applied to mainline. >>> >> >> Many thanks for your explanation. (I get this commit id from linux-next tree: >> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git) >> So I should remember always to get the commit id from mainline. > > It's likely that this commit ID will be the same once Matthew's patch > goes into mainline. > > But this is why we include the patch title ("mm/truncate: Inline ...") > when identifying commits. Sometimes stuff happens... I remember I used the stale commit id once in my past patch. I made this mistake again. Sorry about it. :( > . > ^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH v2 3/3] mm/memory-failure.c: make non-LRU movable pages unhandlable 2022-03-12 7:46 [PATCH v2 0/3] A few fixup patches for memory failure Miaohe Lin 2022-03-12 7:46 ` [PATCH v2 1/3] mm/memory-failure.c: fix race with changing page compound again Miaohe Lin 2022-03-12 7:46 ` [PATCH v2 2/3] mm/memory-failure.c: avoid calling invalidate_inode_page() with unexpected pages Miaohe Lin @ 2022-03-12 7:46 ` Miaohe Lin 2022-03-13 23:43 ` HORIGUCHI NAOYA(堀口 直也) 2022-03-14 17:34 ` Yang Shi 2 siblings, 2 replies; 20+ messages in thread From: Miaohe Lin @ 2022-03-12 7:46 UTC (permalink / raw) To: akpm, tony.luck, bp, naoya.horiguchi Cc: mike.kravetz, shy828301, linux-mm, linux-kernel, linux-edac, linmiaohe We can not really handle non-LRU movable pages in memory failure. Typically they are balloon, zsmalloc, etc. Assuming we run into a base (4K) non-LRU movable page, we could reach as far as identify_page_state(), it should not fall into any category except me_unknown. For the non-LRU compound movable pages, they could be taken for transhuge pages but it's unexpected to split non-LRU movable pages using split_huge_page_to_list in memory_failure. So we could just simply make non-LRU movable pages unhandlable to avoid these possible nasty cases. Suggested-by: Yang Shi <shy828301@gmail.com> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> --- mm/memory-failure.c | 20 +++++++++++++------- 1 file changed, 13 insertions(+), 7 deletions(-) diff --git a/mm/memory-failure.c b/mm/memory-failure.c index 2ff7dd2078c4..ba621c6823ed 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -1177,12 +1177,18 @@ void ClearPageHWPoisonTakenOff(struct page *page) * does not return true for hugetlb or device memory pages, so it's assumed * to be called only in the context where we never have such pages. */ -static inline bool HWPoisonHandlable(struct page *page) +static inline bool HWPoisonHandlable(struct page *page, unsigned long flags) { - return PageLRU(page) || __PageMovable(page) || is_free_buddy_page(page); + bool movable = false; + + /* Soft offline could mirgate non-LRU movable pages */ + if ((flags & MF_SOFT_OFFLINE) && __PageMovable(page)) + movable = true; + + return movable || PageLRU(page) || is_free_buddy_page(page); } -static int __get_hwpoison_page(struct page *page) +static int __get_hwpoison_page(struct page *page, unsigned long flags) { struct page *head = compound_head(page); int ret = 0; @@ -1197,7 +1203,7 @@ static int __get_hwpoison_page(struct page *page) * for any unsupported type of page in order to reduce the risk of * unexpected races caused by taking a page refcount. */ - if (!HWPoisonHandlable(head)) + if (!HWPoisonHandlable(head, flags)) return -EBUSY; if (get_page_unless_zero(head)) { @@ -1222,7 +1228,7 @@ static int get_any_page(struct page *p, unsigned long flags) try_again: if (!count_increased) { - ret = __get_hwpoison_page(p); + ret = __get_hwpoison_page(p, flags); if (!ret) { if (page_count(p)) { /* We raced with an allocation, retry. */ @@ -1250,7 +1256,7 @@ static int get_any_page(struct page *p, unsigned long flags) } } - if (PageHuge(p) || HWPoisonHandlable(p)) { + if (PageHuge(p) || HWPoisonHandlable(p, flags)) { ret = 1; } else { /* @@ -2308,7 +2314,7 @@ int soft_offline_page(unsigned long pfn, int flags) retry: get_online_mems(); - ret = get_hwpoison_page(page, flags); + ret = get_hwpoison_page(page, flags | MF_SOFT_OFFLINE); put_online_mems(); if (ret > 0) { -- 2.23.0 ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH v2 3/3] mm/memory-failure.c: make non-LRU movable pages unhandlable 2022-03-12 7:46 ` [PATCH v2 3/3] mm/memory-failure.c: make non-LRU movable pages unhandlable Miaohe Lin @ 2022-03-13 23:43 ` HORIGUCHI NAOYA(堀口 直也) 2022-03-14 17:34 ` Yang Shi 1 sibling, 0 replies; 20+ messages in thread From: HORIGUCHI NAOYA(堀口 直也) @ 2022-03-13 23:43 UTC (permalink / raw) To: Miaohe Lin Cc: akpm, tony.luck, bp, mike.kravetz, shy828301, linux-mm, linux-kernel, linux-edac On Sat, Mar 12, 2022 at 03:46:13PM +0800, Miaohe Lin wrote: > We can not really handle non-LRU movable pages in memory failure. Typically > they are balloon, zsmalloc, etc. Assuming we run into a base (4K) non-LRU > movable page, we could reach as far as identify_page_state(), it should not > fall into any category except me_unknown. For the non-LRU compound movable > pages, they could be taken for transhuge pages but it's unexpected to split > non-LRU movable pages using split_huge_page_to_list in memory_failure. So > we could just simply make non-LRU movable pages unhandlable to avoid these > possible nasty cases. > > Suggested-by: Yang Shi <shy828301@gmail.com> > Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Looks good to me. Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 3/3] mm/memory-failure.c: make non-LRU movable pages unhandlable 2022-03-12 7:46 ` [PATCH v2 3/3] mm/memory-failure.c: make non-LRU movable pages unhandlable Miaohe Lin 2022-03-13 23:43 ` HORIGUCHI NAOYA(堀口 直也) @ 2022-03-14 17:34 ` Yang Shi 1 sibling, 0 replies; 20+ messages in thread From: Yang Shi @ 2022-03-14 17:34 UTC (permalink / raw) To: Miaohe Lin Cc: akpm, tony.luck, bp, naoya.horiguchi, mike.kravetz, linux-mm, linux-kernel, linux-edac On Fri, Mar 11, 2022 at 11:47 PM Miaohe Lin <linmiaohe@huawei.com> wrote: > > We can not really handle non-LRU movable pages in memory failure. Typically > they are balloon, zsmalloc, etc. Assuming we run into a base (4K) non-LRU > movable page, we could reach as far as identify_page_state(), it should not > fall into any category except me_unknown. For the non-LRU compound movable > pages, they could be taken for transhuge pages but it's unexpected to split > non-LRU movable pages using split_huge_page_to_list in memory_failure. So > we could just simply make non-LRU movable pages unhandlable to avoid these > possible nasty cases. > > Suggested-by: Yang Shi <shy828301@gmail.com> > Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Yang Shi <shy828301@gmail.com> > --- > mm/memory-failure.c | 20 +++++++++++++------- > 1 file changed, 13 insertions(+), 7 deletions(-) > > diff --git a/mm/memory-failure.c b/mm/memory-failure.c > index 2ff7dd2078c4..ba621c6823ed 100644 > --- a/mm/memory-failure.c > +++ b/mm/memory-failure.c > @@ -1177,12 +1177,18 @@ void ClearPageHWPoisonTakenOff(struct page *page) > * does not return true for hugetlb or device memory pages, so it's assumed > * to be called only in the context where we never have such pages. > */ > -static inline bool HWPoisonHandlable(struct page *page) > +static inline bool HWPoisonHandlable(struct page *page, unsigned long flags) > { > - return PageLRU(page) || __PageMovable(page) || is_free_buddy_page(page); > + bool movable = false; > + > + /* Soft offline could mirgate non-LRU movable pages */ > + if ((flags & MF_SOFT_OFFLINE) && __PageMovable(page)) > + movable = true; > + > + return movable || PageLRU(page) || is_free_buddy_page(page); > } > > -static int __get_hwpoison_page(struct page *page) > +static int __get_hwpoison_page(struct page *page, unsigned long flags) > { > struct page *head = compound_head(page); > int ret = 0; > @@ -1197,7 +1203,7 @@ static int __get_hwpoison_page(struct page *page) > * for any unsupported type of page in order to reduce the risk of > * unexpected races caused by taking a page refcount. > */ > - if (!HWPoisonHandlable(head)) > + if (!HWPoisonHandlable(head, flags)) > return -EBUSY; > > if (get_page_unless_zero(head)) { > @@ -1222,7 +1228,7 @@ static int get_any_page(struct page *p, unsigned long flags) > > try_again: > if (!count_increased) { > - ret = __get_hwpoison_page(p); > + ret = __get_hwpoison_page(p, flags); > if (!ret) { > if (page_count(p)) { > /* We raced with an allocation, retry. */ > @@ -1250,7 +1256,7 @@ static int get_any_page(struct page *p, unsigned long flags) > } > } > > - if (PageHuge(p) || HWPoisonHandlable(p)) { > + if (PageHuge(p) || HWPoisonHandlable(p, flags)) { > ret = 1; > } else { > /* > @@ -2308,7 +2314,7 @@ int soft_offline_page(unsigned long pfn, int flags) > > retry: > get_online_mems(); > - ret = get_hwpoison_page(page, flags); > + ret = get_hwpoison_page(page, flags | MF_SOFT_OFFLINE); > put_online_mems(); > > if (ret > 0) { > -- > 2.23.0 > ^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2022-03-16 8:41 UTC | newest] Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-03-12 7:46 [PATCH v2 0/3] A few fixup patches for memory failure Miaohe Lin 2022-03-12 7:46 ` [PATCH v2 1/3] mm/memory-failure.c: fix race with changing page compound again Miaohe Lin 2022-03-13 23:41 ` HORIGUCHI NAOYA(堀口 直也) 2022-03-14 1:51 ` Miaohe Lin 2022-03-14 18:20 ` Mike Kravetz 2022-03-15 14:19 ` Miaohe Lin 2022-03-15 18:19 ` Yang Shi 2022-03-16 8:18 ` Miaohe Lin 2022-03-16 8:30 ` HORIGUCHI NAOYA(堀口 直也) 2022-03-16 8:41 ` Miaohe Lin 2022-03-12 7:46 ` [PATCH v2 2/3] mm/memory-failure.c: avoid calling invalidate_inode_page() with unexpected pages Miaohe Lin 2022-03-13 23:41 ` HORIGUCHI NAOYA(堀口 直也) 2022-03-14 1:58 ` Miaohe Lin 2022-03-14 2:50 ` HORIGUCHI NAOYA(堀口 直也) 2022-03-14 2:59 ` Miaohe Lin 2022-03-14 23:45 ` Andrew Morton 2022-03-15 13:55 ` Miaohe Lin 2022-03-12 7:46 ` [PATCH v2 3/3] mm/memory-failure.c: make non-LRU movable pages unhandlable Miaohe Lin 2022-03-13 23:43 ` HORIGUCHI NAOYA(堀口 直也) 2022-03-14 17:34 ` Yang Shi
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.